Commits


Jorge C. Leitao authored and Andrew Lamb committed e2d6c057684
ARROW-11357: [Rust]: Fix out-of-bounds reads in `take` and other undefined behavior This PR fixes two major issues in our `take` kernel for primitive arrays. Background * When using `values()` from an array, it is important to remember that only certain values are not arbitrary (those whose null bit is set / no buffer). * When reading values from an array using `array.value(i)`, it is important to remember that it currently performs no bound checks. * `take` offers an option to deactivate bound checks, by turning an error in a panic. that option defaults to not check (i.e. `panic`) however, `take` kernel respects none of the points above: * it reads and uses arbitrary indices * it accesses out of bound values * it does not panic when `check_bounds = false` (it instead reads out of bounds) Specifically, it is currently doing something like the following (ignoring some details): ```rust let indices = indices.values(); for index in indices { let index = index.to_usize(); let taken_value = values.value(index) } ``` I.e. * there is no check that `index` is a valid slot * there is no check that `index < values.len()`. Independently, each of them is unsound. Combined, they allow for spectacular unbounded memory reads: reading from a pointer offsetted by an arbitrary value. 🤯 This PR fixes this behavior. This PR also improves performance by 20-40% (thanks to @Dandandan and @tyrelr great suggestions). ``` Your branch is ahead of 'origin/take_fix' by 1 commit. (use "git push" to publish your local commits) Compiling arrow v4.0.0-SNAPSHOT (/Users/jorgecarleitao/projects/arrow/rust/arrow) Finished bench [optimized] target(s) in 1m 25s Running /Users/jorgecarleitao/projects/arrow/rust/target/release/deps/take_kernels-683d8fd1eeba5497 Gnuplot not found, using plotters backend take i32 512 time: [1.0477 us 1.0519 us 1.0564 us] change: [-25.007% -23.339% -21.840%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe take i32 1024 time: [1.4949 us 1.4996 us 1.5049 us] change: [-34.066% -31.907% -29.853%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) high mild 3 (3.00%) high severe take i32 nulls 512 time: [976.65 ns 983.09 ns 991.99 ns] change: [-45.076% -39.795% -35.157%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 1 (1.00%) low mild 3 (3.00%) high mild 5 (5.00%) high severe take i32 nulls 1024 time: [1.3249 us 1.3278 us 1.3309 us] change: [-32.887% -31.666% -30.524%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe ``` Closes #9301 from jorgecarleitao/take_fix Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com> Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>