Commits


David Li authored and Wes McKinney committed bcd2e946e03
PARQUET-1820: [C++] pre-buffer specified columns of row group

This hooks up Antoine's read coalescing implementation to the Parquet reader; it takes into account row groups and column indices, so it should be good for both scans of the entire file and selecting a few columns out of many. It also exposes some options on the Parquet ReaderProperties to control this. (Is exposing Arrow types like that ok, or should I wrap things?)

I'll have benchmarks later. It seems a clear win locally and against remote S3, but from EC2->S3 when I initially tried it was worse. I believe it's because the "naive" read happened to be near optimal on the particular dataset tested.

Marking this WIP as I'd like to get feedback on the approach. I believe this subsumes PARQUET-1698/#6138.

This is not exposed yet to Python or Datasets.

Closes #6744 from lidavidm/parquet-1698-coalesce-reads

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
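
For readers unfamiliar with read coalescing, the general idea can be sketched as below. This is a minimal illustration, not the implementation this commit wires in: `ByteRange`, `CoalesceRanges`, and the two limit parameters are hypothetical names chosen for the example. It assumes each selected column chunk of a row group maps to one byte range in the file, and merges ranges separated by small gaps so that fewer, larger reads are issued.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical byte range within a file: [offset, offset + length).
struct ByteRange {
  int64_t offset;
  int64_t length;
};

// Illustrative sketch only (not the Arrow implementation): merge ranges whose
// gap is at most `hole_size_limit` bytes, provided the merged range stays
// within `range_size_limit` bytes.
std::vector<ByteRange> CoalesceRanges(std::vector<ByteRange> ranges,
                                      int64_t hole_size_limit,
                                      int64_t range_size_limit) {
  assert(hole_size_limit >= 0 && range_size_limit > 0);

  // Process ranges in file order so only neighbours need to be compared.
  std::sort(ranges.begin(), ranges.end(),
            [](const ByteRange& a, const ByteRange& b) {
              return a.offset < b.offset;
            });

  std::vector<ByteRange> out;
  for (const ByteRange& r : ranges) {
    if (r.length <= 0) continue;  // skip empty ranges
    if (!out.empty()) {
      ByteRange& last = out.back();
      const int64_t last_end = last.offset + last.length;
      const int64_t new_end = std::max(last_end, r.offset + r.length);
      // Merge when the hole is small enough and the result is not too large.
      if (r.offset - last_end <= hole_size_limit &&
          new_end - last.offset <= range_size_limit) {
        last.length = new_end - last.offset;
        continue;
      }
    }
    out.push_back(r);
  }
  return out;
}
```

With a hole limit of 16 bytes, for example, ranges at offsets 0 (100 bytes) and 110 (50 bytes) would be read as a single 160-byte request, while a range at offset 1000 would remain a separate read. Whether this helps depends on the storage backend, which is consistent with the mixed EC2->S3 observation in the commit message.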