

Heres, Daniel authored and Jorge C. Leitao committed 2696951338c
ARROW-11300: [Rust][DataFusion] Further performance improvements on hash aggregation with small groups

Based on https://github.com/apache/arrow/pull/9234, this PR improves the situation described in https://issues.apache.org/jira/browse/ARROW-11300.

Currently we call `take` on the arrays separately for each group. This works, but when each group has only a small number of rows it causes many small `Arrays` to be created and allocated.

This improves the results of the group-by queries in db-benchmark:

This PR:

```
q1 took 32 ms
q2 took 422 ms
q3 took 3468 ms
q4 took 44 ms
q5 took 3166 ms
q7 took 3081 ms
```

https://github.com/apache/arrow/pull/9234 (the numbers differ from that PR's description because partitioning and a custom allocator are now enabled):

```
q1 took 34 ms
q2 took 389 ms
q3 took 4590 ms
q4 took 47 ms
q5 took 5152 ms
q7 took 3941 ms
```

The PR changes the algorithm to:

* Collect the row indices of all keys in the batch into a single indices array, grouped by key, and record the offset at which each key's indices start.
* `take` the arrays using these indices in one go, so each array requires only one larger allocation.
* Use `slice` with the offsets to get each group's values from the taken arrays and pass them to the accumulators.

Closes #9271 from Dandandan/hash_agg_few_rows

Authored-by: Heres, Daniel <danielheres@gmail.com>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
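The three steps of the new algorithm can be illustrated without Arrow. The following is a minimal, std-only sketch under stated assumptions, not the DataFusion implementation: plain `Vec`s stand in for Arrow arrays, `take` and `sum_by_key` are hypothetical helpers, and a per-group sum stands in for a generic accumulator. All rows are gathered with a single `take`, and each group then reads its values through a slice bounded by `offsets`, so there is one gather allocation per column instead of one per group.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for Arrow's `take` kernel: gathers `values[i]` for
/// every index into one newly allocated Vec (a single allocation per call).
fn take(values: &[i64], indices: &[u32]) -> Vec<i64> {
    indices.iter().map(|&i| values[i as usize]).collect()
}

/// Hypothetical helper: sums `values` per key using "one take, then slice".
/// Returns (key, sum) pairs in the order keys first appear in the batch.
fn sum_by_key(keys: &[u32], values: &[i64]) -> Vec<(u32, i64)> {
    // Step 1a: collect the row indices of each key, remembering first-seen order.
    let mut order: Vec<u32> = Vec::new();
    let mut rows_per_key: HashMap<u32, Vec<u32>> = HashMap::new();
    for (row, &key) in keys.iter().enumerate() {
        rows_per_key
            .entry(key)
            .or_insert_with(|| {
                order.push(key);
                Vec::new()
            })
            .push(row as u32);
    }

    // Step 1b: concatenate all indices into one array and record where each
    // key's indices start; offsets[g]..offsets[g + 1] covers group g.
    let mut batch_indices: Vec<u32> = Vec::with_capacity(keys.len());
    let mut offsets: Vec<usize> = vec![0];
    for key in &order {
        batch_indices.extend_from_slice(&rows_per_key[key]);
        offsets.push(batch_indices.len());
    }

    // Step 2: a single `take` for the whole batch instead of one per group.
    let taken = take(values, &batch_indices);

    // Step 3: hand each group a zero-copy slice of the taken values.
    order
        .iter()
        .enumerate()
        .map(|(g, &key)| {
            let group_values = &taken[offsets[g]..offsets[g + 1]];
            let total: i64 = group_values.iter().sum();
            (key, total)
        })
        .collect()
}

fn main() {
    let keys: [u32; 6] = [7, 3, 7, 5, 3, 7];
    let values: [i64; 6] = [1, 10, 2, 100, 20, 3];
    // Prints [(7, 6), (3, 30), (5, 100)]
    println!("{:?}", sum_by_key(&keys, &values));
}
```

In this sketch the number of `take` allocations stays constant per batch no matter how many groups it contains, which is the property that matters when groups are small.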