Commits

Wes McKinney authored 73f94c93d7e
ARROW-3762: [C++/Python] Support reading Parquet BYTE_ARRAY columns containing over 2GB of data

This patch ended up being a bit more of a bloodbath than I planned: please accept my apologies.

Associated changes in this patch:

* Split up builder.h/builder.cc into a new arrow/array directory. Public arrow/builder.h API preserved. I think this code is going to keep growing more specialized components, so I think we should get out ahead of it by having a subdirectory to contain files related to implementation details
* Implement ChunkedBinaryBuilder, ChunkedStringBuilder classes, add tests and benchmarks
* Deprecate parquet::arrow methods returning Array
* Allow implicit construction of Datum from its variant types (makes for a lot nicer syntax)

As far as what code to review, focus efforts on

* src/parquet/arrow
* src/arrow/array/builder_binary.h/cc, array-binary-test.cc, builder-benchmark
* src/arrow/compute changes
* Python changes

I'm going to tackle ARROW-2970, which should not be complicated after this patch; I will submit that as a PR after this is reviewed and merged.

Author: Wes McKinney <wesm+git@apache.org>

Closes #3171 from wesm/ARROW-3762 and squashes the following commits:

822451280 <Wes McKinney> Fix int conversion warning on Windows
695ffc9df <Wes McKinney> Remove unimplemented and unused ChunkedBinaryBuilder ctor
5a525115c <Wes McKinney> Use strnlen to compute string length. Inline BinaryBuilder::AppendNextOffset
b90eb4b71 <Wes McKinney> Restore sstream include to pretty_print.cc
3669201be <Wes McKinney> Fix deprecated API use
5fdbbb261 <Wes McKinney> Rename columnar/ directory to array/
8ffaec1ef <Wes McKinney> Address preliminary code comments. Check in missing files
81e787c69 <Wes McKinney> Fix up Python bindings, unit test
2efae064c <Wes McKinney> Finish scaffolding. Get fully compiling again and original parquet-arrow test suite passing
3d075e4aa <Wes McKinney> Additional refactoring to make things chunked. Allow implicit construction of arrow::compute::Datum
922811278 <Wes McKinney> More refactoring
716322377 <Wes McKinney> Split up builder.h, builder.cc into smaller headers, compilation units. add failing test case for ARROW-3762. Add ChunkedBinaryBuilder, make BinaryBuilder Append methods inline
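
For readers unfamiliar with why a chunked builder helps here: an Arrow binary array addresses its data with 32-bit offsets, so a single array cannot hold more than about 2GB of bytes. The sketch below illustrates the general idea of starting a new chunk once a size cap would be exceeded. It is not the ChunkedBinaryBuilder from this patch; the names `BinaryChunk` and `SketchChunkedBinaryBuilder`, the constructor parameter, and the flush policy are all hypothetical, and it assumes every individual value fits within the cap.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one binary array: concatenated value bytes plus
// 32-bit offsets marking where each value starts and ends.
struct BinaryChunk {
  std::string data;
  std::vector<int32_t> offsets{0};

  // Number of values stored in this chunk.
  std::size_t length() const { return offsets.size() - 1; }
};

// Hypothetical sketch of a chunked builder (not the Arrow class).
// Assumes max_chunk_bytes <= INT32_MAX and that no single value exceeds it.
class SketchChunkedBinaryBuilder {
 public:
  explicit SketchChunkedBinaryBuilder(std::size_t max_chunk_bytes)
      : max_chunk_bytes_(max_chunk_bytes) {}

  void Append(const std::string& value) {
    // Close the current chunk if this value would push it past the cap.
    if (current_.length() > 0 &&
        current_.data.size() + value.size() > max_chunk_bytes_) {
      Flush();
    }
    current_.data += value;
    current_.offsets.push_back(static_cast<int32_t>(current_.data.size()));
  }

  // Returns all chunks built so far, including a partially filled last one.
  std::vector<BinaryChunk> Finish() {
    if (current_.length() > 0) {
      Flush();
    }
    return std::move(chunks_);
  }

 private:
  void Flush() {
    chunks_.push_back(std::move(current_));
    current_ = BinaryChunk{};  // reset to an empty chunk with offsets {0}
  }

  std::size_t max_chunk_bytes_;
  BinaryChunk current_;
  std::vector<BinaryChunk> chunks_;
};
```

With a cap of 8 bytes, appending "abcd", "efgh", and "ij" would yield two chunks: the first holds the two 4-byte values, and the third value starts a new chunk. In a real reader the cap would sit near the 32-bit offset limit rather than at a toy value.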