Public / arrow / e2f1e0107c5

Commits

Jorge C. Leitao authored and Andrew Lamb committed e2f1e0107c514 Nov 2020
ARROW-10366: [Rust][DataFusion] Do not buffer intermediate results in merge or HashAggregate

This PR is based on https://github.com/apache/arrow/pull/8503 from @jorgecarleitao

This makes `merge` send batches to a receiver stream as they arrive and, thereby removing the need to wait for each thread to finish collecting all its batches. It improves reported performance in my laptop on the aggregate_query_sql microbenchmark by 5-15%.

It also includes a change to the aggregate code that allows the aggregates to produce output even if there are no results on the first call to `poll_next`. The change in GroupBy is needed because when we initially tried to run Merge in parallel, we discovered no output was created if the inputs had not completed producing all their values *before* execution began. There is much more discussion on the topic here: https://github.com/apache/arrow/pull/8503#discussion_r510061337. I will add additional comments on the changes inline below.

# Performance

I measured Aggregate Performance with the following command
```
git checkout master && \
    cargo bench --bench aggregate_query_sql && \
    git checkout alamb/ARROW-10366-collect-in-parallel && \
    cargo bench --bench aggregate_query_sql
```

<details>
  <summary>Click for performance details</summary>

aggregate_query_no_group_by 15 12
                        time:   [715.38 us 717.64 us 720.07 us]
                        change: [-5.6870% -4.7352% -3.6834%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

Benchmarking aggregate_query_group_by 15 12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 13.7s or reduce sample count to 40.
aggregate_query_group_by 15 12
                        time:   [2.4874 ms 2.5029 ms 2.5173 ms]
                        change: [-13.513% -12.381% -11.184%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

Benchmarking aggregate_query_group_by_with_filter 15 12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 10.2s or reduce sample count to 40.
aggregate_query_group_by_with_filter 15 12
                        time:   [1.9724 ms 1.9856 ms 1.9982 ms]
                        change: [-14.759% -10.342% -6.5522%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

</details>

Closes #8553 from alamb/alamb/ARROW-10366-collect-in-parallel

Lead-authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Co-authored-by: alamb <andrew@nerdnetworks.org>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>