Public / arrow / c07486c29f9

Commits

Wes McKinney authored and Antoine Pitrou committed c07486c29f912 Jun 2020
ARROW-5760: [C++] New compute::Take implementation for better performance, faster dispatch, smaller code size / faster compilation

I was going to work on Filter in this PR but it got to be too much to review so I'll tackle that in a separate PR.

Summary of what's in this PR:

* Uses BitBlockCounter to speed up processing of mostly-not-null indices vectors
* Performance of primitive takes improved ~2-3x across the board. Builder classes are no longer used for primitive types.
* List-based takes seem about 4x faster based on the (limited) benchmarks
* Size of vector_take.cc.o reduced from 5.9 MB to 495 KB on -O3 build with clang-8. Compilation time is correspondingly reduced. The refactor for the new kernels framework I think had increased the code size in this module beyond what was in 0.17.x.
* Kernels for primitive types of same size are reused
* Signed/unsigned indices are processed using unsigned integer code paths after they've been boundschecked (so we know that the signed ints have no negative values)
* Adds new kernel input type checking rules so that the number of registered take kernels has gone from over 300 to only 13. This means faster dispatching, too.
* Does vectorized boundschecking
* Take no longer uses kernels/vector_selection_internal.h, but it's still used by Filter. I plan to delete it after completing the new Filter implementation
* Fixes to the prior benchmarking PR

Note: support for doing Take with unions has been temporarily disabled. Since there is so little code in the codebase that deals with unions at the moment I felt it would be better to address this in follow up work. Here are the currently available kernels:

```
[VectorKernel<(array[primitive], array[integer]) -> computed>,
 VectorKernel<(array[binary-like], array[integer]) -> computed>,
 VectorKernel<(array[large-binary-like], array[integer]) -> computed>,
 VectorKernel<(array[Type::FIXED_SIZE_BINARY], array[integer]) -> computed>,
 VectorKernel<(array[null], array[integer]) -> computed>,
 VectorKernel<(array[Type::DECIMAL], array[integer]) -> computed>,
 VectorKernel<(array[Type::DICTIONARY], array[integer]) -> computed>,
 VectorKernel<(array[Type::EXTENSION], array[integer]) -> computed>,
 VectorKernel<(array[Type::LIST], array[integer]) -> computed>,
 VectorKernel<(array[Type::LARGE_LIST], array[integer]) -> computed>,
 VectorKernel<(array[Type::FIXED_SIZE_LIST], array[integer]) -> computed>,
 VectorKernel<(array[Type::STRUCT], array[integer]) -> computed>,
 VectorKernel<(array[Type::MAP], array[integer]) -> computed>]
```

There's some code duplication that can be improved so I think things can be improved in follow up PRs using the benchmarks and code size metrics as a strict guideline for making changes.

Closes #7382 from wesm/ARROW-5760

Lead-authored-by: Wes McKinney <wesm@apache.org>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>