Commits


Wes McKinney authored and Antoine Pitrou committed c07486c29f9
ARROW-5760: [C++] New compute::Take implementation for better performance, faster dispatch, smaller code size / faster compilation I was going to work on Filter in this PR but it got to be too much to review so I'll tackle that in a separate PR. Summary of what's in this PR: * Uses BitBlockCounter to speed up processing of mostly-not-null indices vectors * Performance of primitive takes improved ~2-3x across the board. Builder classes are no longer used for primitive types. * List-based takes seem about 4x faster based on the (limited) benchmarks * Size of vector_take.cc.o reduced from 5.9 MB to 495 KB on -O3 build with clang-8. Compilation time is correspondingly reduced. The refactor for the new kernels framework I think had increased the code size in this module beyond what was in 0.17.x. * Kernels for primitive types of same size are reused * Signed/unsigned indices are processed using unsigned integer code paths after they've been boundschecked (so we know that the signed ints have no negative values) * Adds new kernel input type checking rules so that the number of registered take kernels has gone from over 300 to only 13. This means faster dispatching, too. * Does vectorized boundschecking * Take no longer uses kernels/vector_selection_internal.h, but it's still used by Filter. I plan to delete it after completing the new Filter implementation * Fixes to the prior benchmarking PR Note: support for doing Take with unions has been temporarily disabled. Since there is so little code in the codebase that deals with unions at the moment I felt it would be better to address this in follow up work. Here are the currently available kernels: ``` [VectorKernel<(array[primitive], array[integer]) -> computed>, VectorKernel<(array[binary-like], array[integer]) -> computed>, VectorKernel<(array[large-binary-like], array[integer]) -> computed>, VectorKernel<(array[Type::FIXED_SIZE_BINARY], array[integer]) -> computed>, VectorKernel<(array[null], array[integer]) -> computed>, VectorKernel<(array[Type::DECIMAL], array[integer]) -> computed>, VectorKernel<(array[Type::DICTIONARY], array[integer]) -> computed>, VectorKernel<(array[Type::EXTENSION], array[integer]) -> computed>, VectorKernel<(array[Type::LIST], array[integer]) -> computed>, VectorKernel<(array[Type::LARGE_LIST], array[integer]) -> computed>, VectorKernel<(array[Type::FIXED_SIZE_LIST], array[integer]) -> computed>, VectorKernel<(array[Type::STRUCT], array[integer]) -> computed>, VectorKernel<(array[Type::MAP], array[integer]) -> computed>] ``` There's some code duplication that can be improved so I think things can be improved in follow up PRs using the benchmarks and code size metrics as a strict guideline for making changes. Closes #7382 from wesm/ARROW-5760 Lead-authored-by: Wes McKinney <wesm@apache.org> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>