Commits


Yibo Cai authored and Antoine Pitrou committed 823fe6066f1
ARROW-9873: [C++][Compute] Optimize mode kernel for integers in small value range

For int16/32/64 arrays of reasonable length, scan the array first to find the min/max values. If (max - min) is within some threshold, using a value-indexed count array instead of a general hashmap can improve performance significantly.

To stay compatible with chunked arrays, the value count array is transferred to a hashmap before merging with the others. This adds overhead for short arrays. Finding min/max may also introduce a performance penalty in some cases.

Please note that it is hard to benefit all use cases. With this patch applied:
- about 2x performance uplift for integers in a small value range
- no obvious performance drop for normal cases
- non-trivial performance drop in some cases
  * 40% drop for short int8 arrays (8k length)
  * 10% drop for sparse arrays (few distinct values, big value gap)

Closes #8091 from cyb70289/mode-count

Lead-authored-by: Yibo Cai <yibo.cai@arm.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
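
The approach described in the commit message can be illustrated with a minimal, standalone sketch. This is not the Arrow kernel: the function names `ModeByHashMap` and `ModeSmallRange`, the threshold `kMaxValueRange`, and the tie-breaking rule (smallest value wins) are all assumptions made for illustration, and the chunk-merging step is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical threshold for illustration; not the value used in Arrow.
constexpr uint64_t kMaxValueRange = 1 << 16;

// General path: count occurrences in a hashmap.
// Returns {mode, count}; ties go to the smallest value (an assumption).
std::pair<int64_t, int64_t> ModeByHashMap(const std::vector<int64_t>& values) {
  assert(!values.empty());
  std::unordered_map<int64_t, int64_t> counts;
  for (int64_t v : values) ++counts[v];

  std::pair<int64_t, int64_t> best(0, 0);
  for (const auto& kv : counts) {
    if (kv.second > best.second ||
        (kv.second == best.second && kv.first < best.first)) {
      best = kv;
    }
  }
  return best;
}

// Fast path: if (max - min) is small, count into an array indexed by
// (value - min) instead of hashing every element.
std::pair<int64_t, int64_t> ModeSmallRange(const std::vector<int64_t>& values) {
  assert(!values.empty());
  const auto mm = std::minmax_element(values.begin(), values.end());
  const int64_t min = *mm.first;
  const int64_t max = *mm.second;

  // Subtract as unsigned so that a huge gap cannot overflow int64_t.
  const uint64_t range = static_cast<uint64_t>(max) - static_cast<uint64_t>(min);
  if (range >= kMaxValueRange) return ModeByHashMap(values);

  std::vector<int64_t> counts(range + 1, 0);
  for (int64_t v : values) {
    ++counts[static_cast<uint64_t>(v) - static_cast<uint64_t>(min)];
  }

  // Strict '>' keeps the first (smallest) value on ties.
  std::size_t best = 0;
  for (std::size_t i = 1; i < counts.size(); ++i) {
    if (counts[i] > counts[best]) best = i;
  }
  return {min + static_cast<int64_t>(best), counts[best]};
}
```

The sketch also shows where the costs mentioned in the message come from: the min/max scan is an extra pass over the data, and it is wasted whenever the range turns out to be too large and the hashmap path has to be taken anyway.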