Commits


Felipe Oliveira Carvalho authored and GitHub committed 475b5b9463b
GH-35749: [C++] Handle run-end encoded filters in compute kernels (#35750)

### Rationale for this change

Boolean arrays (bitmaps) used to represent filters in Arrow take 1 bit per boolean value. If the filter contains long runs, it can be run-end encoded (REE) to save even more memory. Using POPCNT, a bitmap can be scanned efficiently for runs of <64 logical values, but a run-end encoded array gives the length of each run directly, and a run can go beyond the word size. These two observations make the case that, for the right dataset, REE filters can be processed more efficiently in compute kernels.

### What changes are included in this PR?

- [x] `GetFilterOutputSize` can count the number of emits from a REE filter
- [x] `GetTakeIndices` can produce an array of logical indices from a REE filter
- [x] `"array_filter"` can handle REE filters

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* Closes: #35749

Lead-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Antoine Pitrou <antoine@python.org>
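
To illustrate why a run-end encoded filter can be cheap to process, here is a minimal, self-contained sketch. It is not Arrow's implementation: `ReeFilter`, `CountEmits`, and `TakeIndices` are hypothetical names that only mirror the roles of `GetFilterOutputSize` and `GetTakeIndices` described in the commit message, and the sketch assumes a filter with no nulls, no offset, and 32-bit run ends.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a run-end encoded boolean filter.
// run_ends[i] is the exclusive logical end of run i; values[i] is the
// boolean value shared by every position in that run.
struct ReeFilter {
  std::vector<int32_t> run_ends;
  std::vector<bool> values;
};

// Counts how many logical positions the filter selects. The cost is
// proportional to the number of runs, not the number of logical values.
int64_t CountEmits(const ReeFilter& filter) {
  int64_t count = 0;
  int32_t run_start = 0;
  for (std::size_t i = 0; i < filter.run_ends.size(); ++i) {
    if (filter.values[i]) {
      count += filter.run_ends[i] - run_start;  // whole run is selected
    }
    run_start = filter.run_ends[i];
  }
  return count;
}

// Expands the selected runs into the logical indices they cover.
std::vector<int32_t> TakeIndices(const ReeFilter& filter) {
  std::vector<int32_t> indices;
  indices.reserve(static_cast<std::size_t>(CountEmits(filter)));
  int32_t run_start = 0;
  for (std::size_t i = 0; i < filter.run_ends.size(); ++i) {
    if (filter.values[i]) {
      for (int32_t j = run_start; j < filter.run_ends[i]; ++j) {
        indices.push_back(j);
      }
    }
    run_start = filter.run_ends[i];
  }
  return indices;
}
```

For example, a filter with run ends `{3, 7, 9, 10}` and values `{false, true, false, true}` selects logical positions 3 through 6 and position 9, so the count is obtained from four runs rather than from ten individual bits.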