Commits


octalene authored and GitHub committed af4db7731b1
ARROW-16807: [C++][R] count distinct incorrectly merges state (#13583)

This addresses a bug where the `count_distinct` function simply added counts when merging state. The correct logic would be to return the number of distinct elements after both states have been merged.

State for count_distinct is backed by a MemoTable, which is then backed by a HashTable. To properly merge state, this PR adds 2 functions to each MemoTable: `MaybeInsert` and `MergeTable`. The MaybeInsert function handles simplified logic for inserting an element into the MemoTable. The MergeTable function handles iteration over elements in the MemoTable _to be merged_.

This PR also adds an R test and a C++ test. The R test mirrors what was provided in ARROW-16807. The C++ test, `AllChunkedArrayTypesWithNulls`, mirrors another C++ test, `AllArrayTypesWithNulls`, but uses chunked arrays for test data.

Lead-authored-by: Aldrin Montana <octalene.dev@pm.me>
Co-authored-by: Aldrin M <octalene.dev@pm.me>
Co-authored-by: Wes McKinney <wesm@apache.org>
Signed-off-by: Wes McKinney <wesm@apache.org>
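
The difference between adding counts and merging state, as described in the commit message above, can be sketched in a few lines of C++. This is a minimal illustration and not the code from the PR: `ToyMemoTable` is a hypothetical stand-in built on `std::unordered_set`, and the signatures given here to `MaybeInsert` and `MergeTable` are assumptions that only borrow the names from the commit message. `Consume`, `MergeByAddingCounts`, and `MergeByUnion` are likewise invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Toy stand-in for a memo table. NOT Arrow's MemoTable; illustration only.
class ToyMemoTable {
 public:
  // Insert `value` only if it is not already present (assumed semantics).
  void MaybeInsert(int64_t value) { values_.insert(value); }

  // Fold every element of `other` into this table, skipping duplicates.
  void MergeTable(const ToyMemoTable& other) {
    for (int64_t value : other.values_) MaybeInsert(value);
  }

  int64_t size() const { return static_cast<int64_t>(values_.size()); }

 private:
  std::unordered_set<int64_t> values_;
};

// Build per-chunk state, roughly as an aggregation over one chunk would.
ToyMemoTable Consume(const std::vector<int64_t>& chunk) {
  ToyMemoTable table;
  for (int64_t value : chunk) table.MaybeInsert(value);
  return table;
}

// Buggy merge: adds per-chunk counts, so values shared between chunks
// are counted once per chunk.
int64_t MergeByAddingCounts(const ToyMemoTable& a, const ToyMemoTable& b) {
  return a.size() + b.size();
}

// Corrected merge: union the two tables first, then count.
// `a` is taken by value so the caller's state is left untouched.
int64_t MergeByUnion(ToyMemoTable a, const ToyMemoTable& b) {
  a.MergeTable(b);
  return a.size();
}
```

For two chunks holding {1, 2, 3} and {3, 4}, `MergeByAddingCounts` yields 5 because the shared value 3 is counted in both chunks, while `MergeByUnion` yields 4, the number of distinct elements after both states have been merged.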