
Wes McKinney authored f609298f8f0
ARROW-7080: [C++][Parquet] Read and write "field_id" attribute in Parquet files, propagate to Arrow field metadata. Assorted additional changes

The `field_id` is used for schema evolution and other things. It is surfaced in Python in `Field.metadata` under the key `b'PARQUET:field_id'`.

* `ChunkedArray::Equals` would fail if a child field had unequal metadata; it no longer checks the metadata
* Improve the diffing output of `AssertTablesEqual` in `testing/gtest_util.h` (this may need more tests)
* Add a generic binary `ChunkedArray` iterator (see `internal::MultipleChunkIterator`) and a helper applicator, `internal::ApplyToChunkOverlaps`. I retrofitted `ChunkedArray::Equals` to use them (I needed this to improve the diffing output of `AssertTablesEqual`)
* Add `KeyValueMetadata::Merge` method
* Add `Field::WithMergedMetadata` method that calls `KeyValueMetadata::Merge`
* Print metadata in `Field::ToString`
* Add `parquet.ParquetFile.schema_arrow` property to return the effective Arrow schema
* Print field_ids in `parquet::SchemaPrinter`

This also adds a `print_metadata` flag (default `false`) to `Field::ToString` and `Schema::ToString` that controls whether the key-value metadata is printed, per ARROW-7063.
I figure it's OK to merge this change and then decide whether we want to keep it like that before releasing the software.

Closes #6408 from wesm/ARROW-7080 and squashes the following commits:

e0c7396fd <Yosuke Shiro> Fix test cases
239932c30 <Wes McKinney> Remove field metadata outputs from GLib unit test
03f2f185c <Wes McKinney> Add print_metadata option to Field::ToString / Schema::ToString and use expect_equivalent in R unit tests
169f274d2 <Yosuke Shiro> Use check_metadata instead of metadata
7b1f5a929 <Yosuke Shiro> Use true as the default argument
222af577f <Yosuke Shiro> Fix document of garrow_table_equal()
14fde5755 <Yosuke Shiro> Add metadata parameter instead of using true
45f0c7954 <Yosuke Shiro> Fix schema equality check
0ce996ebd <Wes McKinney> export internal::MultipleChunkIterator
2c3f3ac80 <Wes McKinney> Correct inconsistent comments about null field_id's
6e3bdfd1b <Wes McKinney> Fix dataset Parquet unit tests
fd099f961 <Wes McKinney> Code review comments
f22076767 <Wes McKinney> Start working on properly preserving and deserializing field_id in C++. Some field_id round trips working

Lead-authored-by: Wes McKinney <wesm+git@apache.org>
Co-authored-by: Yosuke Shiro <yosuke.shiro615@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
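As a rough illustration of the kind of traversal a chunk-overlap applicator such as `internal::ApplyToChunkOverlaps` performs, here is a minimal standalone C++ sketch. It does not use Arrow and does not reproduce Arrow's actual signatures: `Chunks`, `Overlap`, `ApplyToOverlaps`, and `ChunkedEquals` are hypothetical names, and each chunk is modeled as a `std::vector<int>`. The idea is to walk two equally long sequences whose chunk boundaries differ, in lockstep, and visit each maximal run that lies inside a single chunk on both sides; an element-wise equality check can then be built on top of that walk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a chunked array: a list of contiguous chunks.
using Chunks = std::vector<std::vector<int>>;

// Total number of elements across all chunks.
std::size_t TotalLength(const Chunks& chunks) {
  std::size_t n = 0;
  for (const auto& c : chunks) n += c.size();
  return n;
}

// One aligned segment: `length` elements starting at left_offset in
// left[left_chunk], matched with right_offset in right[right_chunk].
struct Overlap {
  std::size_t left_chunk, left_offset;
  std::size_t right_chunk, right_offset;
  std::size_t length;
};

// Walk two sequences of equal total length in lockstep and call `visit`
// once per maximal segment contained in a single chunk on both sides.
// Returns false as soon as `visit` returns false, true otherwise.
bool ApplyToOverlaps(const Chunks& left, const Chunks& right,
                     const std::function<bool(const Overlap&)>& visit) {
  assert(TotalLength(left) == TotalLength(right));
  std::size_t li = 0, lo = 0, ri = 0, ro = 0;
  while (li < left.size() && ri < right.size()) {
    // Advance past exhausted (or empty) chunks on either side.
    if (lo == left[li].size()) { ++li; lo = 0; continue; }
    if (ro == right[ri].size()) { ++ri; ro = 0; continue; }
    // The segment ends at whichever chunk boundary comes first.
    std::size_t len = std::min(left[li].size() - lo, right[ri].size() - ro);
    if (!visit(Overlap{li, lo, ri, ro, len})) return false;
    lo += len;
    ro += len;
  }
  return true;
}

// Element-wise equality that is indifferent to how the data is chunked.
bool ChunkedEquals(const Chunks& left, const Chunks& right) {
  if (TotalLength(left) != TotalLength(right)) return false;
  return ApplyToOverlaps(left, right, [&](const Overlap& o) {
    for (std::size_t k = 0; k < o.length; ++k) {
      if (left[o.left_chunk][o.left_offset + k] !=
          right[o.right_chunk][o.right_offset + k]) {
        return false;  // stop the walk at the first mismatch
      }
    }
    return true;
  });
}
```

For example, with `{{1, 2, 3}, {4, 5}}` and `{{1, 2}, {3, 4, 5}}` the walk visits three segments of lengths 2, 1, and 2, and `ChunkedEquals` reports the two sequences as equal despite their different chunk layouts.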