Commits


Jörn Horstmann authored and Neville Dipale committed 5666dcaf524
ARROW-10079: [Rust] Benchmark and improve count bits This refactors the calculation of null counts by using the bitchunk iterator and `count_ones` intrinsic. The performance of array creation improves by between 3%-5%. The biggest impact is on getting a slice of an existing array, for a slice length of 2048 the performance in a microbenchmark doubles. Benchmark results on a Ryzen 3700U. LLVM seems to be able to vectorize the `count_ones` intrinsic, so performance on machines with better AVX units should be even higher. ``` Running /home/jhorstmann/Source/github/apache/arrow/rust/target/release/deps/array_slice-d438c5aeed9bef19 Gnuplot not found, using plotters backend array_slice 128 time: [150.68 ns 151.74 ns 153.14 ns] change: [-11.525% -10.357% -9.3284%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe array_slice 512 time: [158.77 ns 161.03 ns 163.62 ns] change: [-25.922% -24.956% -23.945%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 5 (5.00%) high mild 8 (8.00%) high severe array_slice 2048 time: [170.17 ns 171.24 ns 172.60 ns] change: [-50.388% -49.865% -49.375%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) high mild 4 (4.00%) high severe ``` ``` Running /home/jhorstmann/Source/github/apache/arrow/rust/target/release/deps/array_from_vec-8a972c208e7a6334 Gnuplot not found, using plotters backend array_from_vec 128 time: [750.62 ns 751.69 ns 752.85 ns] change: [-8.2512% -7.3658% -6.5917%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe array_from_vec 256 time: [1.2501 us 1.2580 us 1.2676 us] change: [-0.9517% +0.1364% +1.1850%] (p = 0.81 > 0.05) No change in performance detected. Found 11 outliers among 100 measurements (11.00%) 7 (7.00%) high mild 4 (4.00%) high severe array_from_vec 512 time: [2.1603 us 2.1643 us 2.1690 us] change: [-2.8122% -2.1984% -1.6379%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 10 (10.00%) high mild 3 (3.00%) high severe array_string_from_vec 128 time: [3.2196 us 3.2288 us 3.2395 us] change: [+1.4107% +1.9839% +2.5419%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 7 (7.00%) high mild 5 (5.00%) high severe array_string_from_vec 256 time: [4.8112 us 4.8352 us 4.8685 us] change: [-5.2564% -3.8312% -2.7672%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 4 (4.00%) high mild 10 (10.00%) high severe array_string_from_vec 512 time: [8.2588 us 8.2802 us 8.3049 us] change: [-0.5974% -0.2991% +0.0027%] (p = 0.05 > 0.05) No change in performance detected. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe struct_array_from_vec 128 time: [4.5771 us 4.5947 us 4.6206 us] change: [-6.6706% -5.3873% -4.2340%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 5 (5.00%) high mild 10 (10.00%) high severe struct_array_from_vec 256 time: [6.7617 us 6.7887 us 6.8182 us] change: [-4.9262% -4.2419% -3.6528%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild struct_array_from_vec 512 time: [9.8926 us 9.9639 us 10.052 us] change: [-5.4057% -3.9331% -2.5242%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe struct_array_from_vec 1024 time: [15.812 us 15.862 us 15.922 us] change: [-6.1966% -5.6634% -5.0338%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 3 (3.00%) high mild 8 (8.00%) high severe ``` ``` Running /home/jhorstmann/Source/github/apache/arrow/rust/target/release/deps/builder-26beb41d8ad91167 Gnuplot not found, using plotters backend bench_primitive time: [4.8277 ms 4.8879 ms 4.9558 ms] thrpt: [807.14 MiB/s 818.35 MiB/s 828.55 MiB/s] change: time: [-5.3415% -3.4401% -1.3973%] (p = 0.00 < 0.05) thrpt: [+1.4171% +3.5626% +5.6429%] Performance has improved. Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) high mild 3 (3.00%) high severe bench_bool time: [2.5176 ms 2.5230 ms 2.5292 ms] thrpt: [197.69 MiB/s 198.18 MiB/s 198.60 MiB/s] change: time: [-19.412% -19.203% -18.988%] (p = 0.00 < 0.05) thrpt: [+23.438% +23.767% +24.088%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe ``` Closes #8663 from jhorstmann/ARROW-10079-benchmark-and-improve-count-bits Authored-by: Jörn Horstmann <git@jhorstmann.net> Signed-off-by: Neville Dipale <nevilledips@gmail.com>