Commits

Micah Kornfield authored defcf7d188c
ARROW-13151: [C++][Parquet] Propagate schema changes from selection all the way up the stack

Previously, structs would only work for a single level, and other nested types did not do this at all.

This PR propagates types through lists, large_lists, fixed_sized_lists and multiple nesting of structs. Maps had some special handling:

1. If only a partial key is selected it gets converted to list<struct>
2. If only a partial value is selected it maintains the map type.
3. The old behavior of changing to list<struct> is maintained when the entire key or value are removed (but test coverage was added for this)

The example given in the JIRA now works:

```
>>> data = {"root": [[{"addr": {"this": 3, "that": 3}}]]}
>>> table = pa.Table.from_pydict(data)
>>> pq.write_table(table, "/tmp/table.parquet")
>>> file = pq.ParquetFile("/tmp/table.parquet")
>>> array = file.read(["root.list.item.addr.that"])
>>> array
pyarrow.Table
root: list<item: struct<addr: struct<that: int64>>>
  child 0, item: struct<addr: struct<that: int64>>
      child 0, addr: struct<that: int64>
          child 0, that: int64
----
root: [[
  -- is_valid: all not null
  -- child 0 type: struct<that: int64>
    -- is_valid: all not null
    -- child 0 type: int64
[
  3
]]]
```

Closes #11351 from emkornfield/fix_parquet_type_filter

Lead-authored-by: Micah Kornfield <micahk@google.com>
Co-authored-by: emkornfield <emkornfield@gmail.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
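
The propagation and map rules described in the commit message above can be modelled with a small pure-Python sketch. This is a toy model only, not the C++ implementation from the PR: the tuple encoding of types, the names `prune` and `prune_schema`, and the `key_value.key` / `key_value.value` path segments for maps are all assumptions made for illustration. It also reads rule 2 as "the key is fully selected and the value is at least partly selected", which is an interpretation of the commit message rather than something it states outright. Large lists and fixed-size lists are left out; they would follow the same shape as `list`.

```python
# Toy type encoding (hypothetical, for illustration only):
#   "int64"                       primitive leaf
#   ("struct", {name: type})      struct with named children
#   ("list", item_type)           list
#   ("map", key_type, value_type) map


def prune(typ, path, selected):
    """Restrict typ to the selected leaf paths; return None if nothing survives."""
    if not isinstance(typ, tuple):
        # Primitive leaf: kept only if its full dotted path was requested.
        return typ if path in selected else None
    kind = typ[0]
    if kind == "struct":
        kept = {}
        for name, child in typ[1].items():
            pruned = prune(child, f"{path}.{name}", selected)
            if pruned is not None:
                kept[name] = pruned
        return ("struct", kept) if kept else None
    if kind == "list":
        # The narrowed item type is propagated up into the list type.
        item = prune(typ[1], f"{path}.list.item", selected)
        return ("list", item) if item is not None else None
    if kind == "map":
        key = prune(typ[1], f"{path}.key_value.key", selected)
        value = prune(typ[2], f"{path}.key_value.value", selected)
        if key is None and value is None:
            return None
        # Assumed reading of rule 2: full key plus (partial) value stays a map.
        if key == typ[1] and value is not None:
            return ("map", key, value)
        # Rules 1 and 3: partial key, or key/value removed -> list<struct>.
        fields = {}
        if key is not None:
            fields["key"] = key
        if value is not None:
            fields["value"] = value
        return ("list", ("struct", fields))
    raise ValueError(f"unknown type kind: {kind!r}")


def prune_schema(schema, selected):
    """Apply prune() to every top-level field, dropping fields with no selection."""
    result = {}
    for name, typ in schema.items():
        pruned = prune(typ, name, selected)
        if pruned is not None:
            result[name] = pruned
    return result


# Schema of the JIRA example: root: list<struct<addr: struct<this, that>>>
schema = {
    "root": ("list", ("struct", {"addr": ("struct", {"this": "int64", "that": "int64"})})),
}
# A map whose key and value are both structs, to exercise the three map rules.
map_schema = {
    "m": ("map",
          ("struct", {"a": "string", "b": "string"}),
          ("struct", {"x": "int64", "y": "int64"})),
}

narrowed = prune_schema(schema, {"root.list.item.addr.that"})
```

Here `narrowed` keeps only `that` inside `addr`, mirroring the `list<item: struct<addr: struct<that: int64>>>` type printed in the example session. Calling `prune_schema(map_schema, ...)` with different selections shows when the toy model keeps the map type and when it falls back to `list<struct>`.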