Commits


Micah Kornfield authored and Antoine Pitrou committed 44f3de2c285
ARROW-8494: [C++][Parquet] Full support for reading mixed list and structs

Also: ARROW-9810 (generalize rep/def level conversion to list lengths/bitmaps)

This adds helper methods for reconstructing all necessary metadata for Arrow types. For now this doesn't handle null_slot_usage (i.e. children of FixedSizeList); it throws exceptions when nulls are encountered in this case. These helper methods are used for generic reconstruction.

The unit tests demonstrate how to use the helper methods in combination with LevelInfo (generated from parquet/arrow/schema.h) to reconstruct the metadata. The Arrow reader.cc is now rewritten to use these methods.

- Refactors necessary APIs to use LevelInfo and makes use of them in column_reader.
- Adds implementations for reconstructing list validity bitmaps (uses rep/def levels).
- Adds implementations for reconstructing list lengths (uses rep/def levels).
- Adds dynamic dispatch for level comparison algorithms for AVX2 and BMI2.
- Adds a pextract alternative that uses BitRunReader and can be used as a fallback.
- Fixes some bugs in the detailed reconstruction-to-array tests.

Closes #8177 from emkornfield/rep_def_all

Lead-authored-by: Micah Kornfield <emkornfield@gmail.com>
Co-authored-by: emkornfield <micahk@google.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
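
For readers unfamiliar with the rep/def level conversion this commit generalizes, the following is a minimal sketch of the idea for a single nullable list column. It is not the code from this commit. The names ListReconstruction and ReconstructSingleList are hypothetical, as are the thresholds, which assume the standard three-level Parquet list encoding of an optional list with optional elements (def level 0 = null list, 1 = empty list, 2 = null element, 3 = present element; rep level 0 = start of a new list).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: Arrow-style list offsets plus one validity
// flag per list. Not part of the Arrow or Parquet APIs.
struct ListReconstruction {
  std::vector<int32_t> offsets;  // size is number of lists + 1
  std::vector<bool> validity;    // true when the list slot is non-null
};

// Sketch only: converts rep/def levels of one optional list column with
// optional elements into offsets and validity. The thresholds below are
// assumptions tied to that schema shape.
ListReconstruction ReconstructSingleList(const std::vector<int16_t>& def_levels,
                                         const std::vector<int16_t>& rep_levels) {
  constexpr int16_t kListNonNullDef = 1;  // def >= 1: list itself is not null
  constexpr int16_t kHasElementDef = 2;   // def >= 2: entry occupies a child slot

  ListReconstruction out;
  out.offsets.push_back(0);
  int32_t child_count = 0;

  for (std::size_t i = 0; i < def_levels.size(); ++i) {
    if (rep_levels[i] == 0) {
      // A rep level of 0 starts a new list: close the previous one first.
      if (i != 0) out.offsets.push_back(child_count);
      out.validity.push_back(def_levels[i] >= kListNonNullDef);
    }
    // Null elements still take a slot in the child array, so they count
    // toward the list length.
    if (def_levels[i] >= kHasElementDef) ++child_count;
  }
  // Close the final list, if there was any input at all.
  if (!def_levels.empty()) out.offsets.push_back(child_count);
  return out;
}
```

With def levels {3, 2, 0, 1, 3} and rep levels {0, 1, 0, 0, 0}, which encode [[1, null], null, [], [3]], the sketch produces four list slots, of which only the second is null. The real implementation additionally handles arbitrary nesting, struct ancestors, and SIMD dispatch, none of which is modeled here.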