Commits

Wes McKinney authored d0f3b5f3c74
ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

NOTE: the diff is artificially larger due to some code rearranging (that was necessitated because of how some data selection code is shared between the Take and Filter implementations).

Summary:

* Filter is now 1.5-10+x faster across the board, most notably on primitive types with very high selectivity or very low selectivity filters. The BitBlockCounters do a lot of the heavy lifting in that case but even in the worst case scenario when the block counters never encounter a "full" block, this is still consistently faster.
* Total -O3 code size for **both** Take and Filter is now about 600KB. That's down from about 8MB total prior to this patch and ARROW-5760.

Some incidental changes:

* Implemented a fast conversion from boolean filter to take indices (aka "selection vector"), `compute::internal::GetTakeIndices`. I have also altered the implementation of filtering a record batch to use this, which should be faster (it would be good to have some benchmarks to confirm this).
* Various expansions to the BitBlockCounter classes that I needed to support this work.
* Fixed a bug ARROW-9142 with RandomArrayGenerator::Boolean. The probability parameter was being interpreted as the probability of a false value rather than the probability of a true. IIUC with Bernoulli distributions, the probability specified is P(X = 1) not P(X = 0). Please someone confirm this.

Closes #7442 from wesm/ARROW-9075

Authored-by: Wes McKinney <wesm@apache.org>
Signed-off-by: Wes McKinney <wesm@apache.org>
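
For readers unfamiliar with the idea, here is a minimal sketch of converting a boolean filter into take indices one block at a time. It is not the code from this patch: `FilterToTakeIndices` is a hypothetical name, the filter is assumed to be a plain vector of 64-bit words with the least-significant bit first, and nulls are ignored. It only illustrates why all-zero and all-one blocks are cheap, which is the case the commit message credits to the block counters.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not Arrow's API: converts a packed boolean filter
// (bit i is 1 when row i is selected, least-significant bit first within
// each 64-bit word) into the list of selected row indices.
std::vector<int64_t> FilterToTakeIndices(const std::vector<uint64_t>& words,
                                         int64_t length) {
  // The bitmap must hold at least `length` bits.
  assert(static_cast<int64_t>(words.size()) * 64 >= length);
  std::vector<int64_t> indices;
  for (int64_t base = 0; base < length; base += 64) {
    uint64_t word = words[base / 64];
    const int64_t remaining = length - base;
    if (remaining < 64) {
      // Mask off bits past the logical end of the filter.
      word &= (uint64_t{1} << remaining) - 1;
    }
    if (word == 0) {
      continue;  // "Empty" block: skip all 64 rows at once.
    }
    if (word == ~uint64_t{0}) {
      // "Full" block: every row is selected, no per-bit test needed.
      for (int64_t i = 0; i < 64; ++i) indices.push_back(base + i);
      continue;
    }
    // Mixed block: fall back to testing each bit.
    for (int64_t i = 0; i < 64; ++i) {
      if ((word >> i) & 1) indices.push_back(base + i);
    }
  }
  return indices;
}
```

Filters with very high or very low selectivity mostly hit the first two branches, so the per-bit loop runs rarely. In the worst case every block is mixed and the sketch degrades to a plain bit scan.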