Li Jin authored and GitHub committed 8b5919d8861
GH-35515: [C++][Python] Add non decomposable aggregation UDF (#35514)

### Rationale for this change

Non-decomposable aggregation is aggregation that cannot be split into consume/merge/finalize steps. This is often the case when the logic is written with external Python libraries (numpy, pandas, statsmodels, etc.). Such logic either cannot be decomposed or is not worth the effort of decomposing, because these are often one-off functions rather than reusable ones. This PR implements support for non-decomposable aggregation UDFs.

The major issue with a non-decomposable UDF is that it needs to see all the data at once. A scalar UDF, by contrast, only needs to see one batch at a time. On its own, this makes a non-decomposable UDF no more useful than collecting all the data into a pd.DataFrame and applying the function to it.

However, one very useful application of non-decomposable UDFs is segmented aggregation. To recap, segmented aggregation works on ordered data and is passed one logical chunk at a time (e.g., all data with the same date). Combining segmented aggregation with non-decomposable aggregation UDFs lets the user apply any custom aggregation logic over a large stream of ordered data. The memory overhead is only that of a single segment.

### What changes are included in this PR?

This PR is currently WIP and not ready for review. So far I have implemented the minimal amount of code to make a basic test work, but it still needs clean-up, error handling, etc.

* [x] First round of self review
* [x] Second round of self review
* [x] Implement and test unary
* [x] Implement and test varargs
* [x] Implement and test Acero support with segmented aggregation

### Are these changes tested?

Added new tests that call the aggregation through both compute and Acero. The compute tests call the aggregation on the full array. The Acero test calls the aggregation with segmented aggregation.

### Are there any user-facing changes?

* Closes: #35515

Lead-authored-by: Li Jin <ice.xelloss@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Li Jin <ice.xelloss@gmail.com>
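The rationale above pairs a non-decomposable UDF with segmented aggregation. Here is a minimal plain-Python sketch of that idea. It is not the Arrow C++/Acero implementation or the pyarrow API, and the helper `segmented_aggregate` and the sample date rows are hypothetical. It assumes rows arrive already sorted by segment key and uses `itertools.groupby` to cut the stream into contiguous segments. Each complete segment is then handed to `statistics.median`, which stands in for a non-decomposable function: the median of per-chunk medians is generally not the overall median.

```python
from itertools import groupby
from statistics import median
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

K = TypeVar("K")
R = TypeVar("R")


def segmented_aggregate(
    rows: Iterable[Tuple[K, float]],
    udf: Callable[[List[float]], R],
) -> Iterator[Tuple[K, R]]:
    """Hypothetical helper: apply a whole-segment UDF to each segment of ordered rows."""
    # Assumes `rows` is sorted by segment key, so every segment is one
    # contiguous run; groupby starts a new segment whenever the key changes.
    for key, group in groupby(rows, key=lambda row: row[0]):
        # Materialize only the current segment, so memory is bounded by the
        # largest segment rather than by the whole stream.
        values = [value for _, value in group]
        # The UDF sees the complete segment at once, so it needs no
        # consume/merge/finalize decomposition.
        yield key, udf(values)


# Median is non-decomposable: combining per-chunk medians gives the wrong answer.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
chunk_medians = [median(data[:2]), median(data[2:])]  # [1.5, 4.0]
print(median(chunk_medians), median(data))  # prints: 2.75 3.0

# Hypothetical ordered stream keyed by date.
rows = [
    ("2023-05-01", 1.0), ("2023-05-01", 5.0), ("2023-05-01", 3.0),
    ("2023-05-02", 2.0), ("2023-05-02", 8.0),
]
for day, value in segmented_aggregate(rows, median):
    print(day, value)  # prints: 2023-05-01 3.0, then 2023-05-02 5.0
```

Because segment boundaries are detected only when the key changes, unsorted input would split one key into several segments. That is why segmented aggregation requires ordered data.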