Commits


Yordan Pavlov authored and Andy Grove committed 62dfa114d66
ARROW-9523 [Rust] Improve filter kernel performance The filter kernel located here https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/filter.rs currently has the following performance: filter old u8 low selectivity time: [1.7782 ms 1.7790 ms 1.7801 ms] filter old u8 high selectivity time: [815.58 us 816.58 us 817.57 us] filter old u8 w NULLs low selectivity time: [1.8131 ms 1.8231 ms 1.8336 ms] filter old u8 w NULLs high selectivity time: [817.41 us 820.01 us 823.05 us] I have been working on a new implementation which performs between approximately 17 and 550 times faster depending mostly on filter selectivity. Here are the benchmark results: filter u8 low selectivity time: [107.26 us 108.24 us 109.58 us] filter u8 high selectivity time: [4.7854 us 4.8050 us 4.8276 us] filter context u8 low selectivity time: [102.59 us 102.93 us 103.38 us] filter context u8 high selectivity time: [1.4709 us 1.4760 us 1.4823 us] filter context u8 w NULLs low selectivity time: [130.48 us 131.00 us 131.65 us] filter context u8 w NULLs high selectivity time: [2.0520 us 2.0818 us 2.1137 us] filter context f32 low selectivity time: [117.26 us 118.58 us 120.13 us] filter context f32 high selectivity time: [1.7895 us 1.7919 us 1.7942 us] This new implementation is based on a few key ideas: (1) if the data array being filtered doesn't have a null bitmap, no time should be wasted to copy or create a null bitmap in the resulting filtered data array - this is implemented using the CopyNullBit trait which has a no-op implementation and an actual implementation (2) when the filter is highly selective, e.g. only a small number of values from the data array are selected, the filter implementation should efficiently skip entire batches of 0s in the filter array - this is implemented by transmuting the filter array to u64 which allows to quickly check and skip entire batches of 64 bits (3) when an entire record batch is filtered, any computation which only depends on the filter array is done once and then shared for filtering all the data arrays in the record batch - this is implemented using the FilterContext struct This pull request also implements support for filtering dictionary arrays. @paddyhoran @andygrove Closes #7798 from yordan-pavlov/improve_filter_kernel_perf Lead-authored-by: Yordan Pavlov <yordan.pavlov@outlook.com> Co-authored-by: Yordan Pavlov <64363766+yordan-pavlov@users.noreply.github.com> Signed-off-by: Andy Grove <andygrove@nvidia.com>