Commits


Weston Pace authored and David Li committed 2b2eeeb7ef7
ARROW-12289: [C++] Create basic AsyncScanner implementation

Adds a naive implementation of `AsyncScanner`, which differs from `SyncScanner` in a few ways:

* It does not use `ScanTask` and instead relies on `Fragment::ScanBatchesAsync`, which returns a `RecordBatchGenerator`.
* It does an unordered scan by default (i.e. batches from file N may arrive before all batches from file N-1 have arrived) and can order the scan if asked to.
* It uses the unordered scan for `ToTable`.

It is "naive" because this PR does not add a complete implementation of `FileFragment::ScanBatchesAsync`. That method relies on `FileFormat::ScanBatchesAsync` (in the same way that `FileFragment::Scan` relies on `FileFormat::ScanFile`). `FileFormat::ScanBatchesAsync` _should_ be overridden in each of the formats (to rely on an async reader), but it is not (yet). As a result, the performance of `AsyncScanner` is poor, since it does neither "per-file" nor "per-batch" parallelism.

Follow-up tasks are ARROW-12355 (CSV), ARROW-11772 (IPC), and ARROW-11843 (Parquet).

In addition, this PR is built on top of ARROW-12287, so that will need to be merged first. It will also need to rebase changes from ARROW-12161 and ARROW-11797.

Closes #10008 from westonpace/feature/arrow-12289

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
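
The difference between the unordered and ordered scans described in the commit message can be illustrated with a small standalone sketch. It does not use the Arrow API: `Batch`, `MakeFiles`, `ScanOrdered`, and `ScanUnordered` are hypothetical names invented for this illustration, and a deterministic round-robin interleaving stands in for "batches arrive whenever they are ready", which in a real asynchronous scan would depend on I/O timing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a record batch: the file it came from and its
// position within that file. This is not an Arrow type.
struct Batch {
  int file_index;
  int batch_index;
};

using FileBatches = std::vector<Batch>;

// Builds `num_files` simulated files, each holding `batches_per_file` batches.
std::vector<FileBatches> MakeFiles(int num_files, int batches_per_file) {
  assert(num_files >= 0 && batches_per_file >= 0);
  std::vector<FileBatches> files(static_cast<std::size_t>(num_files));
  for (int f = 0; f < num_files; ++f) {
    for (int b = 0; b < batches_per_file; ++b) {
      files[static_cast<std::size_t>(f)].push_back(Batch{f, b});
    }
  }
  return files;
}

// Ordered scan: every batch of file N-1 is emitted before any batch of file N.
std::vector<Batch> ScanOrdered(const std::vector<FileBatches>& files) {
  std::vector<Batch> out;
  for (const auto& file : files) {
    out.insert(out.end(), file.begin(), file.end());
  }
  return out;
}

// Unordered scan (simulated): files are drained round-robin, so a batch from
// file N can appear before file N-1 has been fully emitted. Batches from the
// same file still keep their relative order.
std::vector<Batch> ScanUnordered(const std::vector<FileBatches>& files) {
  std::size_t max_len = 0;
  for (const auto& file : files) {
    max_len = std::max(max_len, file.size());
  }
  std::vector<Batch> out;
  for (std::size_t i = 0; i < max_len; ++i) {
    for (const auto& file : files) {
      if (i < file.size()) {
        out.push_back(file[i]);
      }
    }
  }
  return out;
}
```

Under these assumptions, a consumer that only accumulates all batches (as `ToTable` does) can accept the `ScanUnordered` sequence and never has to wait for a slow earlier file, whereas a consumer that needs file order would use the `ScanOrdered` sequence.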