Commits

Jorge C. Leitao authored 2816f37ff01
ARROW-10812: [Rust] Make BooleanArray not a PrimitiveArray This PR creates a new struct `BooleanArray`, that replaces `PrimitiveArray<BooleanType>`, so that we do not have to consider the differences between being bit-packed and non-bit packed. This difference is causing a significant performance degradation described on ARROW-10453 and #8837 . This usage of different logic is already observed in most of our kernels, as the code for byte-width and bit-packed is almost always different, due to how offsets are computed. With this PR, that offset computation no longer depends on bit-packed vs non-bit-packed. IMPORTANT: this removed support from Boolean array to UnionArray, as `UnionArray` currently only supports `PrimitiveType`. Micro benchmarks (worse to best, statistically insignificant ignored): | benchmark | variation | |-------------- | -------------- | | min nulls 512 | 33.7 | | record_batches_to_csv | 23.1 | | array_string_from_vec 256 | 5.6 | | array_string_from_vec 512 | 5.2 | | take bool nulls 512 | 4.9 | | cast int32 to int64 512 | 2.5 | | equal_512 | 2.3 | | filter u8 very low selectivity | 2.2 | | array_slice 512 | 2.1 | | take bool nulls 1024 | 2.0 | | cast int64 to int32 512 | 1.6 | | min 512 | 1.6 | | take i32 512 | 1.1 | | add 512 | 1.1 | | array_slice 2048 | 1.0 | | length | 1.0 | | filter u8 low selectivity | 0.9 | | filter u8 high selectivity | 0.9 | | array_string_from_vec 128 | 0.9 | | cast int32 to float64 512 | 0.9 | | cast timestamp_ms to i64 512 | 0.8 | | take str null indices 512 | 0.6 | | sum 512 | 0.4 | | filter context u8 very low selectivity | -0.7 | | take i32 1024 | -0.9 | | filter context f32 very low selectivity | -0.9 | | cast float64 to float32 512 | -1.0 | | equal_nulls_512 | -1.0 | | cast time32s to time32ms 512 | -1.1 | | sort 2^12 | -1.2 | | struct_array_from_vec 128 | -1.4 | | array_from_vec 256 | -1.4 | | array_from_vec 128 | -1.5 | | filter context u8 high selectivity | -1.6 | | limit 512, 512 | -1.7 | | equal_string_nulls_512 | -1.8 | | take i32 nulls 1024 | -1.8 | | struct_array_from_vec 512 | -1.9 | | filter context f32 high selectivity | -2.0 | | cast timestamp_ms to timestamp_ns 512 | -2.2 | | take i32 nulls 512 | -2.3 | | buffer_bit_ops or | -2.4 | | array_from_vec 512 | -2.6 | | cast float64 to uint64 512 | -2.7 | | take str 512 | -2.8 | | min nulls string 512 | -3.1 | | cast int32 to int32 512 | -3.3 | | array_slice 128 | -3.3 | | filter context u8 w NULLs very low selectivity | -3.3 | | buffer_bit_ops and | -3.4 | | struct_array_from_vec 256 | -4.2 | | cast int32 to uint32 512 | -4.5 | | multiply 512 | -5.2 | | equal_string_512 | -5.5 | | take str null values null indices 1024 | -6.8 | | sum nulls 512 | -13.3 | | add_nulls_512 | -17.6 | | like_utf8 scalar contains | -17.8 | | nlike_utf8 scalar contains | -17.9 | | nlike_utf8 scalar complex | -24.6 | | like_utf8 scalar complex | -25.2 | | cast time64ns to time32s 512 | -42.7 | | cast date64 to date32 512 | -49.1 | | cast date32 to date64 512 | -50.7 | | nlike_utf8 scalar starts with | -51.1 | | nlike_utf8 scalar ends with | -55.1 | | like_utf8 scalar ends with | -55.5 | | like_utf8 scalar starts with | -56.3 | | nlike_utf8 scalar equals | -67.8 | | like_utf8 scalar equals | -74.2 | | eq Float32 | -75.7 | | gt_eq Float32 | -76.1 | | lt_eq Float32 | -76.5 | | not | -77.1 | | and | -78.6 | | or | -78.7 | | lt_eq scalar Float32 | -79.4 | | eq scalar Float32 | -82.1 | | neq Float32 | -82.1 | | lt scalar Float32 | -82.1 | | lt Float32 | -82.3 | | gt Float32 | -82.4 | | gt_eq scalar Float32 | -82.4 | | neq scalar Float32 | -82.6 | | gt scalar Float32 | -84.7 | Closes #8842 from jorgecarleitao/boolean Lead-authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com> Co-authored-by: Jorge Leitao <jorgecarleitao@gmail.com> Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>