Commits


Nishanth Thimmegowda authored and GitHub committed 46601808486
ARROW-17450: [C++][Parquet] Support RLE decode for boolean datatype (#14147)

Currently, parquet-cpp does not support columns encoded with RLE. RLE has few users: per the [Parquet encodings](https://parquet.apache.org/docs/file-format/data-pages/encodings/) documentation, it is used for only three kinds of data, namely repetition and definition levels, dictionary indices, and boolean values in data pages. Some implementations (Athena on AWS) do apply it directly to boolean columns. Although encoding and decoding are already supported for repetition and definition levels, there is no support for boolean columns with RLE.

This PR integrates RLE decoding into column scanning so that RLE-encoded boolean columns can be read. The first 4 bytes of the data hold the size of the encoded data; this size is parsed first, and the data is then passed to the decoder.

Added two tests that use an RLE boolean encoded Parquet file to validate that values can be parsed individually and in a batch.

Lead-authored-by: Nishanth Thimmegowda <nishanth.thimmegowda@snowflake.com>
Co-authored-by: sfc-gh-nthimmegowda <nishanth.thimmegowda@snowflake.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-29-51-62.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-29-51-79.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-29-51-6.us-west-2.compute.internal>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
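
For illustration, the length-prefixed layout described in this commit message can be sketched as a standalone decoder. This is a minimal sketch, not the code added by the PR: `DecodeRleBooleans` is a hypothetical name and uses no parquet-cpp APIs. It assumes the layout from the Parquet encodings documentation: a 4-byte little-endian length, followed by RLE/bit-packed hybrid runs at bit width 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of parquet-cpp). Decodes a boolean page body
// laid out as: <4-byte little-endian length> <RLE/bit-packed hybrid runs>.
std::vector<bool> DecodeRleBooleans(const std::vector<uint8_t>& page,
                                    std::size_t num_values) {
  if (page.size() < 4) throw std::runtime_error("missing length prefix");

  // Step 1: parse the 4-byte length prefix.
  const uint32_t encoded_len = static_cast<uint32_t>(page[0]) |
                               (static_cast<uint32_t>(page[1]) << 8) |
                               (static_cast<uint32_t>(page[2]) << 16) |
                               (static_cast<uint32_t>(page[3]) << 24);
  if (encoded_len > page.size() - 4) {
    throw std::runtime_error("length prefix exceeds page size");
  }
  std::size_t pos = 4;
  const std::size_t end = 4 + static_cast<std::size_t>(encoded_len);

  // Step 2: hand the encoded bytes to the run decoder.
  std::vector<bool> out;
  out.reserve(num_values);
  while (out.size() < num_values) {
    // Each run starts with a ULEB128 varint header.
    uint64_t header = 0;
    int shift = 0;
    while (true) {
      if (pos >= end || shift > 63) throw std::runtime_error("bad run header");
      const uint8_t byte = page[pos++];
      header |= static_cast<uint64_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) break;
      shift += 7;
    }
    const uint64_t count = header >> 1;

    if (header & 1) {
      // Bit-packed run: `count` groups of 8 values, one byte per group at
      // bit width 1, least significant bit first.
      for (uint64_t g = 0; g < count && out.size() < num_values; ++g) {
        if (pos >= end) throw std::runtime_error("truncated bit-packed run");
        const uint8_t bits = page[pos++];
        for (int i = 0; i < 8 && out.size() < num_values; ++i) {
          out.push_back(((bits >> i) & 1) != 0);
        }
      }
    } else {
      // RLE run: one value byte, repeated `count` times.
      if (pos >= end) throw std::runtime_error("truncated RLE run");
      const bool value = (page[pos++] & 1) != 0;
      for (uint64_t i = 0; i < count && out.size() < num_values; ++i) {
        out.push_back(value);
      }
    }
  }
  assert(out.size() == num_values);
  return out;
}
```

The sketch decodes the whole page in one call. A real reader would more likely keep decoder state between calls so that values can be fetched one at a time or in batches, as the tests in this commit exercise.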