Commits

Wes McKinney authored 06fd2da5e8e
ARROW-6077: [C++][Parquet] Build Arrow "schema tree" from Parquet schema to help with nested data implementation

Introduces auxiliary internal `SchemaManifest` and `SchemaField` data structures.

This also permits dictionary-encoded subfields in a slightly more principled way (the dictionary type creation is resolved one time, so this removes the `FixSchema` hacks that were there before).

I rewrote the nested schema conversion logic to hopefully be slightly easier to follow, though it could still use some work. I added comments within to explain the 3 different styles of list encoding.

There are a couple of API changes:

* The `FileReader::GetSchema(indices, &schema)` method has been removed. The way that "projected" schemas were being constructed was pretty hacky, and this function is non-essential to the operation of the class. I had to remove bindings in the GLib and R libraries for this function, but as far as I can tell these bindings were non-essential to operation, and were added only because the function was there to wrap.
* Added `FileWriter::Make` factory method, making constructor private

This patch was pretty unpleasant to do -- it removes some hacky functions used to create Arrow fields with leaf nodes trimmed. There is little functional change; it is an attempt to bring a cleaner structure for full-fledged nested data reading.

I'm going to get on with seeing through user-facing dictionary-encoding functionality in Python.

Closes #4971 from wesm/parquet-arrow-schema-tree and squashes the following commits:

e1f19c06b <Wes McKinney> Code review feedback
e2c117ad1 <Wes McKinney> Factor out list nesting into helper function

Authored-by: Wes McKinney <wesm+git@apache.org>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
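
The "schema tree" idea in this commit can be illustrated with a minimal sketch. This is not the actual `SchemaManifest` or `SchemaField` code from the patch: `FieldNode`, `Leaf`, `Group`, and `CollectLeafColumns` are hypothetical names invented here, and the sketch only assumes that each nested field keeps its children and that leaf fields record the index of the Parquet column they map to.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one node of a schema tree.
// This is NOT Arrow's SchemaField; names and layout are assumptions.
struct FieldNode {
  std::string name;
  std::vector<FieldNode> children;  // empty for leaf (primitive) fields
  int column_index = -1;            // leaf column index; -1 for non-leaf nodes
};

// Build a leaf node that maps to a single physical column.
FieldNode Leaf(std::string name, int column_index) {
  FieldNode node;
  node.name = std::move(name);
  node.column_index = column_index;
  return node;
}

// Build a nested (group) node that owns its child fields.
FieldNode Group(std::string name, std::vector<FieldNode> children) {
  FieldNode node;
  node.name = std::move(name);
  node.children = std::move(children);
  return node;
}

// Depth-first walk that appends the column index of every leaf under `node`.
void CollectLeafColumns(const FieldNode& node, std::vector<int>* out) {
  if (node.children.empty()) {
    if (node.column_index >= 0) out->push_back(node.column_index);
    return;
  }
  for (const FieldNode& child : node.children) {
    CollectLeafColumns(child, out);
  }
}
```

With a tree like this, the leaf columns needed to read a nested field can be found by walking that field's subtree, rather than by trimming leaf nodes out of a flat schema after the fact.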