Commits

Benjamin Kietzman authored e2440a3b1ed
ARROW-11591: [C++][Compute] Grouped aggregation

This patch adds basic building blocks for grouped aggregation:

- `Grouper` for producing integer arrays encoding group id from batches of keys
- `HashAggregateKernel` for consuming batches of arguments and group ids, updating internal sums/counts/...

For testing purposes, a one-shot grouped aggregation function is provided:

```c++
std::shared_ptr<arrow::Array> needs_sum = ...;
std::shared_ptr<arrow::Array> needs_min_max = ...;
std::shared_ptr<arrow::Array> key_0 = ...;
std::shared_ptr<arrow::Array> key_1 = ...;

ARROW_ASSIGN_OR_RAISE(arrow::Datum out,
                      arrow::compute::internal::GroupBy(
                          {
                              needs_sum,
                              needs_min_max,
                          },
                          {
                              key_0,
                              key_1,
                          },
                          {
                              {"sum", nullptr},  // first argument will be summed
                              {"min_max", &min_max_options},  // second argument's extrema will be found
                          }));

// Unpack struct array result (a four-field array)
auto out_array = out.array_as<StructArray>();
std::shared_ptr<arrow::Array> sums = out_array->field(0);
std::shared_ptr<arrow::Array> mins_and_maxes = out_array->field(1);
std::shared_ptr<arrow::Array> group_key_0 = out_array->field(2);
std::shared_ptr<arrow::Array> group_key_1 = out_array->field(3);
```

Closes #9621 from bkietz/groupby1

Lead-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: michalursa <michal@ursacomputing.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
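
The division of labor between the two building blocks named in this commit message can be illustrated with a small standalone sketch. It is not Arrow's implementation: `ToyGrouper` and `ToyGroupedSum` are hypothetical names, and using a single string key column and `int64` values with no nulls is a simplifying assumption. The first class turns batches of keys into dense integer group ids; the second consumes batches of values together with those ids and updates a running sum per group.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a grouper: maps each distinct key to a dense
// integer id, assigned in order of first appearance across all batches.
class ToyGrouper {
 public:
  // Consume one batch of keys and return one group id per row.
  std::vector<std::uint32_t> Consume(const std::vector<std::string>& keys) {
    std::vector<std::uint32_t> ids;
    ids.reserve(keys.size());
    for (const auto& key : keys) {
      auto it = key_to_id_.find(key);
      if (it == key_to_id_.end()) {
        // First time this key is seen: give it the next unused id.
        auto next_id = static_cast<std::uint32_t>(uniques_.size());
        it = key_to_id_.emplace(key, next_id).first;
        uniques_.push_back(key);
      }
      ids.push_back(it->second);
    }
    return ids;
  }

  // Distinct keys seen so far, indexed by group id.
  const std::vector<std::string>& uniques() const { return uniques_; }

 private:
  std::unordered_map<std::string, std::uint32_t> key_to_id_;
  std::vector<std::string> uniques_;
};

// Hypothetical stand-in for a grouped "sum" kernel: keeps one running
// sum per group id and updates it batch by batch.
class ToyGroupedSum {
 public:
  void Consume(const std::vector<std::int64_t>& values,
               const std::vector<std::uint32_t>& group_ids) {
    assert(values.size() == group_ids.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
      // Grow the state when a batch introduces a new group.
      if (group_ids[i] >= sums_.size()) sums_.resize(group_ids[i] + 1, 0);
      sums_[group_ids[i]] += values[i];
    }
  }

  // Running sums, indexed by group id.
  const std::vector<std::int64_t>& sums() const { return sums_; }

 private:
  std::vector<std::int64_t> sums_;
};
```

Feeding the keys `a, b, a` with values `1, 2, 3`, then the keys `c, b` with values `10, 20`, yields group ids `0, 1, 0` and `2, 1`, and final sums `4, 22, 10` for the keys `a, b, c`. Because the kernel state is indexed by group id, the sums line up with `uniques()` position by position.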