crystrix authored and Benjamin Kietzman committed 0e227c9bfdc
ARROW-12942: [C++][Compute] Fix incorrect result of Arrow compute hash_min_max with a chunked array

If there are new groups in the subsequent chunks of a chunked array, the result of Arrow compute hash_min_max is incorrect. For example, take a table with two chunks where the second chunk has a new group key:

```
First chunk: {"argument": 1, "key": 0}
Second chunk: {"argument": 0, "key": 1}
```

The result of hash_min_max by "key" with such data is

```
[{"min": null, "max": null}, 0],
[{"min": 0, "max": 0}, 1]
```

But it should be

```
[{"min": 1, "max": 1}, 0],
[{"min": 0, "max": 0}, 1]
```

The root cause is that `has_values_` and `has_nulls_` are `BufferBuilder`s, which have no `size_` and `capacity_` property. So the `MakeResizeImpl` function initializes a `TypedBufferBuilder` from the `BufferBuilder` with a `size_` and `capacity_` of 0. After the first chunk is processed, `MakeResizeImpl` is called while consuming the second chunk to reserve enough space for it. Because `size_` and `capacity_` are zero, `Reserve` overwrites the original contents of the `BufferBuilder`, which produces an incorrect result.

This PR changes `has_values_` and `has_nulls_` to `TypedBufferBuilder<bool>`, which keeps its `size_` and `capacity_`. As a result, when the second chunk is consumed, the space for `has_values_` and `has_nulls_` is reserved after the data of the first chunk.

Closes #10443 from Crystrix/arrow-12942

Authored-by: crystrix <chenxi.li@live.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
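To make the failure mode concrete, here is a minimal, self-contained C++ model of it. This is not Arrow code: `Bitmap`, `BoolBuilder`, `GroupedMinMax`, and the `keep_length` switch are hypothetical names, and the buggy path assumes (per the description above) that each resize wraps the existing bytes in a fresh typed builder whose length starts at 0. Only the per-group "has values" flag is modeled; `has_nulls_` would behave the same way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical model: one flag per group, packed 8 per byte like an Arrow bitmap.
using Bitmap = std::vector<uint8_t>;

void SetBit(Bitmap& bm, int64_t i, bool v) {
  if (static_cast<int64_t>(bm.size()) * 8 <= i) bm.resize(i / 8 + 1, 0);
  const auto mask = static_cast<uint8_t>(1u << (i % 8));
  if (v) {
    bm[i / 8] |= mask;
  } else {
    bm[i / 8] &= static_cast<uint8_t>(~mask);
  }
}

bool GetBit(const Bitmap& bm, int64_t i) { return ((bm[i / 8] >> (i % 8)) & 1) != 0; }

// Stand-in for a typed bool builder: it remembers how many bits it already holds.
struct BoolBuilder {
  Bitmap bits;
  int64_t length = 0;
  void Append(int64_t n, bool v) {
    for (int64_t k = 0; k < n; ++k) SetBit(bits, length++, v);
  }
};

using MinMax = std::optional<std::pair<int, int>>;  // nullopt = group saw no values

class GroupedMinMax {
 public:
  // keep_length = true models the fix; false models the buggy resize.
  explicit GroupedMinMax(bool keep_length) : keep_length_(keep_length) {}

  // Consume one chunk of (key, value) rows; keys are dense group ids.
  void Consume(const std::vector<int64_t>& keys, const std::vector<int>& values) {
    for (size_t r = 0; r < keys.size(); ++r) {
      if (keys[r] >= num_groups_) Resize(keys[r] + 1);
      const int64_t g = keys[r];
      mins_[g] = std::min(mins_[g], values[r]);
      maxes_[g] = std::max(maxes_[g], values[r]);
      SetBit(Flags(), g, true);  // mark group g as having a value
    }
  }

  std::vector<MinMax> Finalize() {
    std::vector<MinMax> out;
    for (int64_t g = 0; g < num_groups_; ++g) {
      if (GetBit(Flags(), g)) {
        out.emplace_back(std::make_pair(mins_[g], maxes_[g]));
      } else {
        out.emplace_back(std::nullopt);
      }
    }
    return out;
  }

 private:
  Bitmap& Flags() { return keep_length_ ? has_values_.bits : raw_has_values_; }

  void Resize(int64_t new_num_groups) {
    assert(new_num_groups >= num_groups_);
    const int64_t added = new_num_groups - num_groups_;
    num_groups_ = new_num_groups;
    mins_.resize(num_groups_, std::numeric_limits<int>::max());
    maxes_.resize(num_groups_, std::numeric_limits<int>::min());
    if (keep_length_) {
      // Fixed: the persistent builder appends after the groups it already holds.
      has_values_.Append(added, false);
    } else {
      // Buggy: a fresh wrapper over the raw bytes starts at length 0, so
      // initializing the new groups overwrites the flags of the old groups.
      BoolBuilder wrapper{std::move(raw_has_values_), 0};
      wrapper.Append(added, false);
      raw_has_values_ = std::move(wrapper.bits);
    }
  }

  bool keep_length_;
  int64_t num_groups_ = 0;
  std::vector<int> mins_, maxes_;
  BoolBuilder has_values_;  // used when keep_length_ is true
  Bitmap raw_has_values_;   // used when keep_length_ is false
};
```

Feeding the two chunks from the example (`Consume({0}, {1})`, then `Consume({1}, {0})`) into `GroupedMinMax(false)` reports group 0 as null and group 1 as (0, 0), mirroring the incorrect output above; `GroupedMinMax(true)` reports (1, 1) and (0, 0). The only difference between the two paths is whether the bool builder's length survives across resizes.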