Commits


Jorge C. Leitao authored and Andrew Lamb committed 7f3794cb7cb
ARROW-10827: [Rust] Move concat from builders to a compute kernel and make it faster (2-6x)

This PR:

* extends `concat` to all types that `MutableArrayData` supports (i.e. it now supports nested Lists, all primitives, boolean, string and large string, etc.)
* makes `concat` 6x faster for primitive types and 2x faster for string types (and likely also for the other types)
* changes `concat`'s signature to take `&[&Array]` instead of `&Vec<Arc<Array>>`, to avoid an `Arc::clone`.

Since `XBuilder::append_data` was built specifically for this kernel but is not used, and `MutableArrayData` offers a more generic API for it, this PR removes that code.

The overall principle for this removal is that `Builder` is the API to build an arrow array from elements or slices of Rust native types, while `MutableArrayData` (for lack of a better name) is suited to building an arrow array from an existing set of arrow arrays. In the case of `concat`, this corresponds to mem-copies of the individual arrays (taking nulls into account) in sequence. Based on this principle, `Builder` does not need to know how to build an array from existing arrays (the `append_data`).
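The "mem-copies in sequence" idea can be illustrated with a minimal sketch. It does not use the arrow crate: `SimpleArray` and `concat_simple` are hypothetical names invented for this illustration, the validity mask is a plain `Vec<bool>` rather than a packed bitmap, and each mask is assumed to have the same length as its values. The only thing borrowed from the description above is the shape of the argument, a slice of references.

```rust
/// Hypothetical stand-in for a primitive array: values plus an optional
/// validity mask (`None` means "no nulls"). This is NOT the arrow crate's type.
#[derive(Debug, Clone, PartialEq)]
pub struct SimpleArray {
    pub values: Vec<i32>,
    /// Assumed to have the same length as `values` when present.
    pub validity: Option<Vec<bool>>,
}

/// Concatenate by copying each input's buffers one after another.
/// Taking `&[&SimpleArray]` means callers do not have to clone or wrap inputs.
pub fn concat_simple(arrays: &[&SimpleArray]) -> SimpleArray {
    // Size the output once so the copies below never reallocate.
    let total: usize = arrays.iter().map(|a| a.values.len()).sum();
    let any_nulls = arrays.iter().any(|a| a.validity.is_some());

    let mut values = Vec::with_capacity(total);
    // Only materialize a mask if at least one input carries one.
    let mut validity = if any_nulls {
        Some(Vec::with_capacity(total))
    } else {
        None
    };

    for a in arrays {
        // Bulk copy of the value buffer.
        values.extend_from_slice(&a.values);
        if let Some(mask) = validity.as_mut() {
            match &a.validity {
                // Bulk copy of the input's mask.
                Some(v) => mask.extend_from_slice(v),
                // Input has no nulls: mark all of its slots as valid.
                None => mask.extend(std::iter::repeat(true).take(a.values.len())),
            }
        }
    }

    SimpleArray { values, validity }
}
```

Concatenating an array without a mask and one with a mask yields a result whose mask marks every slot of the first input as valid, which is the "taking nulls into account" part of the copy.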
Benchmarks:

| benchmark | variation (%) |
|-------------- | -------------- |
| concat str 1024 | -45.3 |
| concat str nulls 1024 | -61.1 |
| concat i32 1024 | -83.5 |
| concat i32 nulls 1024 | -86.1 |

```
git checkout 66468daf0b3ac3ef08b7c99c690e7b845f23ad2b
cargo bench --bench concatenate_kernel
git checkout concat
cargo bench --bench concatenate_kernel
```

```
Previous HEAD position was 66468daf0 Added concatenate bench
Switched to branch 'concat'
   Compiling arrow v3.0.0-SNAPSHOT (/Users/jorgecarleitao/projects/arrow/rust/arrow)
    Finished bench [optimized] target(s) in 58.72s
     Running /Users/jorgecarleitao/projects/arrow/rust/target/release/deps/concatenate_kernel-94b8f5621cd4f767
Gnuplot not found, using plotters backend
concat i32 1024         time:   [4.2852 us 4.2912 us 4.2973 us]
                        change: [-83.690% -83.469% -83.188%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

concat i32 nulls 1024   time:   [4.8617 us 4.8820 us 4.9080 us]
                        change: [-86.335% -86.101% -85.813%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe

concat str 1024         time:   [19.472 us 19.527 us 19.593 us]
                        change: [-46.212% -45.314% -44.341%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) low mild
  4 (4.00%) high mild
  3 (3.00%) high severe

concat str nulls 1024   time:   [39.447 us 39.525 us 39.613 us]
                        change: [-61.858% -61.091% -60.311%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  5 (5.00%) high severe
```

Closes #8853 from jorgecarleitao/concat

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
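For reading the benchmark figures: a "variation (%)" of -83.5 means the new time is 16.5% of the old one, so the speedup factor is roughly 1 / (1 + variation / 100). The helper below is a small sketch of that arithmetic; `speedup_from_change` is a name chosen for this illustration and is not part of arrow or Criterion.

```rust
/// Convert a relative change in run time, given in percent (negative means
/// faster), into a speedup factor: old_time / new_time.
/// Hypothetical helper for illustration only.
pub fn speedup_from_change(change_percent: f64) -> f64 {
    // new_time = old_time * (1 + change / 100), so
    // old_time / new_time = 1 / (1 + change / 100).
    1.0 / (1.0 + change_percent / 100.0)
}

/// The four variations from the benchmark table, paired with their speedups.
pub fn table_speedups() -> Vec<(&'static str, f64)> {
    vec![
        ("concat str 1024", speedup_from_change(-45.3)),
        ("concat str nulls 1024", speedup_from_change(-61.1)),
        ("concat i32 1024", speedup_from_change(-83.5)),
        ("concat i32 nulls 1024", speedup_from_change(-86.1)),
    ]
}
```

Under this formula the string cases come out at about 1.8x and 2.6x and the `i32` cases at about 6.1x and 7.2x, which is consistent with the rounded "2-6x" in the commit title.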