Commits


Yibo Cai authored and Antoine Pitrou committed 22bebf8278c
ARROW-11568: [C++][Compute] Rewrite mode kernel

The Arrow mode kernel performs poorly compared with scipy.stats.mode (which is based on numpy.unique). The Arrow mode kernel stores value:count pairs in a map, while numpy.unique sorts the input array and then counts adjacent equal values. Per my test, the map approach only wins when there are many duplicated values (length / value_range > 100), which does not look very useful in practice.

This patch rewrites the mode kernel to use the sort-and-count approach for floating-point values and for integers with a wide value range. A 2x performance improvement is observed.

Closes #10009 from cyb70289/11568-mode-optimize

Lead-authored-by: Yibo Cai <yibo.cai@arm.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
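
The two strategies contrasted in the commit message can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Arrow C++ kernel or the NumPy implementation: the function names `mode_by_map` and `mode_by_sort` are hypothetical, ties are broken in favour of the smallest value (an assumption), and NaN or null handling is ignored.

```python
def mode_by_map(values):
    """Map approach: accumulate a value -> count table, then pick the max."""
    if not values:
        raise ValueError("mode of empty input")
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    # Highest count wins; ties go to the smallest value (assumed rule).
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def mode_by_sort(values):
    """Sort-and-count approach: sort, then count runs of adjacent equal values."""
    if not values:
        raise ValueError("mode of empty input")
    ordered = sorted(values)
    best_value, best_count = ordered[0], 0
    run_value, run_count = ordered[0], 0
    for v in ordered:
        if v == run_value:
            run_count += 1
        else:
            # A new run starts whenever the sorted value changes.
            run_value, run_count = v, 1
        # Strict '>' keeps the earliest (smallest) value on ties.
        if run_count > best_count:
            best_value, best_count = run_value, run_count
    return best_value, best_count


if __name__ == "__main__":
    data = [3, 1, 2, 3, 1, 3]
    print(mode_by_map(data))   # (value, count) from the map approach
    print(mode_by_sort(data))  # (value, count) from the sort approach
```

Both functions return a `(value, count)` pair and agree on the same input. The map version touches a hash table once per element, while the sort version pays for the sort up front and then makes a single sequential pass, which is the trade-off the commit message describes.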