Commits


crystrix authored and Antoine Pitrou committed 63060c8e8ce
ARROW-14898: [C++][Compute] Fix crash of out-of-bounds memory accessing in key_hash if a key is smaller than int64

I encountered a crash when executing GroupBy on specific data. The code and data to reproduce the crash can be found in the related JIRA ticket: https://issues.apache.org/jira/browse/ARROW-14898

I think the root cause is the tail processing in `Hashing::hash_varlen` of `key_hash.cc`. The relevant code path is as follows:

1. `Hashing::hash_varlen` calls `helper_tail` for the tail part of the key
2. `helper_tail` calls `util::SafeLoadAs` to load 8 bytes of data from the key
3. `util::SafeLoadAs` calls `std::memcpy` to copy 8 bytes of data from the key

If the key is shorter than 8 bytes, `std::memcpy` still copies 8 bytes, which may read out-of-bounds memory.

This PR adds a `length` parameter to those functions so that only the bytes belonging to the key are copied for the tail.

I'm not sure how to add a unit test for this, as it only happens on my specific data.

The AVX2 code path has the same crash; the fix for AVX2 is not included in this PR.

Closes #11789 from Crystrix/arrow-14898

Authored-by: crystrix <chenxi.li@live.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
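
The length-bounded tail load described in this commit message can be illustrated with a minimal sketch. The helper name `LoadTailBounded` and its signature are hypothetical and chosen only for illustration; the actual changes to `helper_tail` and `util::SafeLoadAs` in Arrow may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Arrow's actual API): load the tail of a key into
// a 64-bit word without reading past the end of the key.
inline uint64_t LoadTailBounded(const uint8_t* src, std::size_t length) {
  assert(src != nullptr || length == 0);
  uint64_t value = 0;  // bytes beyond `length` stay zero
  // Never copy more than the 8 bytes that fit into the destination word.
  if (length > sizeof(value)) {
    length = sizeof(value);
  }
  // Copy only the bytes that belong to the key. An unconditional 8-byte copy
  // here is the out-of-bounds read the commit message describes.
  if (length > 0) {
    std::memcpy(&value, src, length);
  }
  return value;
}
```

With a 3-byte key, for example, this copies exactly 3 bytes and leaves the remaining 5 bytes of the word zeroed, so the value is deterministic and no memory beyond the key is touched.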