Weston Pace authored and David Li committed 270416b0d07 14 Jan 2022
ARROW-15318: [C++][Python] Regression reading partition keys of large batches.

When only partition keys are selected, we end up reading 0 columns from the parquet file. We still have to read the file, because we need the row group sizes (or at least the total number of rows in each file) so that the output accurately reflects the files.

We recently added behavior to the parquet reader to respect a batch size parameter: if a row group is larger than the batch size, we chop the table up into smaller batches using a TableBatchReader with a max chunksize. The TableBatchReader had a bug: if the table had no columns and the max chunksize was smaller than the table size (and did not evenly divide it), we would hit an infinite loop.
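
For illustration only, here is a minimal Python sketch of the chunking arithmetic described above. It is not the Arrow C++ TableBatchReader code; the function name `batch_ranges` is hypothetical. It assumes one reasonable way to make the loop terminate for a zero-column table: track the absolute row position against the table's row count instead of relying on any column data, and clamp the last batch to the rows that remain.

```python
from typing import Iterator, Tuple


def batch_ranges(num_rows: int, max_chunksize: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that split num_rows into batches.

    Hypothetical helper: it only needs the row count, so it behaves the
    same whether the table has many columns or none at all.
    """
    if max_chunksize <= 0:
        raise ValueError("max_chunksize must be positive")

    position = 0
    while position < num_rows:
        # Clamp to the remaining rows so the final, shorter batch is
        # emitted when max_chunksize does not evenly divide num_rows.
        length = min(max_chunksize, num_rows - position)
        yield position, length
        # Always advance by the emitted length; this is what guarantees
        # that the loop terminates.
        position += length


# 10 rows with a max chunksize of 4: the case where the chunksize is
# smaller than the table and does not evenly divide it.
print(list(batch_ranges(10, 4)))  # [(0, 4), (4, 4), (8, 2)]
```

Because the loop condition depends only on `position` and `num_rows`, a table with 0 columns yields the same batch boundaries as any other table with the same row count.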

Closes #12147 from westonpace/bugfix/ARROW-15318--regression-reading-keys-large-batches

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>