Commits


Sutou Kouhei authored and Antoine Pitrou committed 5e609623f2a
ARROW-8199: [C++] Add support for multi-column sort indices on Table Summary: * Deprecate SortToIndices() * Add SortIndices() because we use "sort_indices" as kernel name in #7240 * Add support for sort indices in descending order on Array * Rename existing "sort_indices" kernel to "array_sort_indices" and introduce "sort_indices" meta function to support ChunkedArray, RecordBatch and Table * Require benchmark 1.5.2 or later Benchmark: Summary: * No performance regression in existing sort on array * Same performance as sort on array when the number of sort keys is 1 * 1.5x-100x slower than sort on array when the number of sort keys is 2 Before: ---------------------------------------------------------------------------------- Benchmark Time CPU ---------------------------------------------------------------------------------- SortToIndicesInt64Count/32768/10000/min_time:1.000 16057 ns 16054 ns SortToIndicesInt64Count/32768/100/min_time:1.000 16592 ns 16589 ns SortToIndicesInt64Count/32768/10/min_time:1.000 15979 ns 15975 ns SortToIndicesInt64Count/32768/2/min_time:1.000 28379 ns 28372 ns SortToIndicesInt64Count/32768/1/min_time:1.000 5767 ns 5766 ns SortToIndicesInt64Count/32768/0/min_time:1.000 12560 ns 12557 ns SortToIndicesInt64Count/1048576/100/min_time:1.000 729683 ns 729497 ns SortToIndicesInt64Count/8388608/100/min_time:1.000 6696415 ns 6693711 ns SortToIndicesInt64Compare/32768/10000/min_time:1.000 190488 ns 190442 ns SortToIndicesInt64Compare/32768/100/min_time:1.000 190879 ns 190838 ns SortToIndicesInt64Compare/32768/10/min_time:1.000 179156 ns 179115 ns SortToIndicesInt64Compare/32768/2/min_time:1.000 109098 ns 109074 ns SortToIndicesInt64Compare/32768/1/min_time:1.000 5682 ns 5681 ns SortToIndicesInt64Compare/32768/0/min_time:1.000 185022 ns 184984 ns SortToIndicesInt64Compare/1048576/100/min_time:1.000 9275270 ns 9273414 ns SortToIndicesInt64Compare/8388608/100/min_time:1.000 93126006 ns 93095276 ns After: ------------------------------------------------------------------------------------------ Benchmark Time CPU ------------------------------------------------------------------------------------------ ArraySortIndicesInt64Narrow/32768/10000/min_time:1.000 15722 ns 15721 ns ArraySortIndicesInt64Narrow/32768/100/min_time:1.000 16007 ns 16006 ns ArraySortIndicesInt64Narrow/32768/10/min_time:1.000 15178 ns 15177 ns ArraySortIndicesInt64Narrow/32768/2/min_time:1.000 29886 ns 29885 ns ArraySortIndicesInt64Narrow/32768/1/min_time:1.000 5968 ns 5968 ns ArraySortIndicesInt64Narrow/32768/0/min_time:1.000 12681 ns 12681 ns ArraySortIndicesInt64Narrow/1048576/100/min_time:1.000 813376 ns 813331 ns ArraySortIndicesInt64Narrow/8388608/100/min_time:1.000 6543874 ns 6543621 ns ArraySortIndicesInt64Wide/32768/10000/min_time:1.000 189957 ns 189956 ns ArraySortIndicesInt64Wide/32768/100/min_time:1.000 190269 ns 190267 ns ArraySortIndicesInt64Wide/32768/10/min_time:1.000 175425 ns 175422 ns ArraySortIndicesInt64Wide/32768/2/min_time:1.000 107976 ns 107975 ns ArraySortIndicesInt64Wide/32768/1/min_time:1.000 5941 ns 5941 ns ArraySortIndicesInt64Wide/32768/0/min_time:1.000 184858 ns 184857 ns ArraySortIndicesInt64Wide/1048576/100/min_time:1.000 9355194 ns 9354942 ns ArraySortIndicesInt64Wide/8388608/100/min_time:1.000 94101796 ns 94099551 ns TableSortIndicesInt64Narrow/1048576/100/16/32/min_time:1.000 1386231938 ns 1386176541 ns TableSortIndicesInt64Narrow/1048576/0/16/32/min_time:1.000 1350031141 ns 1349982623 ns TableSortIndicesInt64Narrow/1048576/100/8/32/min_time:1.000 1401741018 ns 1401632081 ns TableSortIndicesInt64Narrow/1048576/0/8/32/min_time:1.000 1373174328 ns 1373145968 ns TableSortIndicesInt64Narrow/1048576/100/2/32/min_time:1.000 1035377805 ns 1035362806 ns TableSortIndicesInt64Narrow/1048576/0/2/32/min_time:1.000 1035218085 ns 1035201824 ns TableSortIndicesInt64Narrow/1048576/100/1/32/min_time:1.000 6859319 ns 6859251 ns TableSortIndicesInt64Narrow/1048576/0/1/32/min_time:1.000 6847314 ns 6847048 ns TableSortIndicesInt64Narrow/1048576/100/16/4/min_time:1.000 626909516 ns 626904696 ns TableSortIndicesInt64Narrow/1048576/0/16/4/min_time:1.000 591632035 ns 591602144 ns TableSortIndicesInt64Narrow/1048576/100/8/4/min_time:1.000 613197155 ns 613182727 ns TableSortIndicesInt64Narrow/1048576/0/8/4/min_time:1.000 595568690 ns 595554829 ns TableSortIndicesInt64Narrow/1048576/100/2/4/min_time:1.000 424472984 ns 424453915 ns TableSortIndicesInt64Narrow/1048576/0/2/4/min_time:1.000 397339109 ns 397335604 ns TableSortIndicesInt64Narrow/1048576/100/1/4/min_time:1.000 7027310 ns 7027241 ns TableSortIndicesInt64Narrow/1048576/0/1/4/min_time:1.000 6891364 ns 6891272 ns TableSortIndicesInt64Narrow/1048576/100/16/1/min_time:1.000 516832749 ns 516823293 ns TableSortIndicesInt64Narrow/1048576/0/16/1/min_time:1.000 495273237 ns 495269975 ns TableSortIndicesInt64Narrow/1048576/100/8/1/min_time:1.000 515550360 ns 515531501 ns TableSortIndicesInt64Narrow/1048576/0/8/1/min_time:1.000 496670125 ns 496663084 ns TableSortIndicesInt64Narrow/1048576/100/2/1/min_time:1.000 340060863 ns 340053172 ns TableSortIndicesInt64Narrow/1048576/0/2/1/min_time:1.000 331281499 ns 331277004 ns TableSortIndicesInt64Narrow/1048576/100/1/1/min_time:1.000 6691062 ns 6690924 ns TableSortIndicesInt64Narrow/1048576/0/1/1/min_time:1.000 6384382 ns 6384323 ns TableSortIndicesInt64Wide/1048576/100/16/32/min_time:1.000 622544467 ns 622531557 ns TableSortIndicesInt64Wide/1048576/0/16/32/min_time:1.000 619193843 ns 619155597 ns TableSortIndicesInt64Wide/1048576/100/8/32/min_time:1.000 615885889 ns 615884764 ns TableSortIndicesInt64Wide/1048576/0/8/32/min_time:1.000 589917731 ns 589912005 ns TableSortIndicesInt64Wide/1048576/100/2/32/min_time:1.000 600951149 ns 600947256 ns TableSortIndicesInt64Wide/1048576/0/2/32/min_time:1.000 592304437 ns 592286953 ns TableSortIndicesInt64Wide/1048576/100/1/32/min_time:1.000 98781239 ns 98777123 ns TableSortIndicesInt64Wide/1048576/0/1/32/min_time:1.000 94136230 ns 94085863 ns TableSortIndicesInt64Wide/1048576/100/16/4/min_time:1.000 232323308 ns 232319347 ns TableSortIndicesInt64Wide/1048576/0/16/4/min_time:1.000 223708791 ns 223700834 ns TableSortIndicesInt64Wide/1048576/100/8/4/min_time:1.000 221975360 ns 221968035 ns TableSortIndicesInt64Wide/1048576/0/8/4/min_time:1.000 218214843 ns 218210306 ns TableSortIndicesInt64Wide/1048576/100/2/4/min_time:1.000 224756430 ns 224745751 ns TableSortIndicesInt64Wide/1048576/0/2/4/min_time:1.000 220809969 ns 220809044 ns TableSortIndicesInt64Wide/1048576/100/1/4/min_time:1.000 96159427 ns 96156899 ns TableSortIndicesInt64Wide/1048576/0/1/4/min_time:1.000 92307749 ns 92304526 ns TableSortIndicesInt64Wide/1048576/100/16/1/min_time:1.000 166390841 ns 166382427 ns TableSortIndicesInt64Wide/1048576/0/16/1/min_time:1.000 164576492 ns 164570581 ns TableSortIndicesInt64Wide/1048576/100/8/1/min_time:1.000 165724919 ns 165718584 ns TableSortIndicesInt64Wide/1048576/0/8/1/min_time:1.000 164048003 ns 164046222 ns TableSortIndicesInt64Wide/1048576/100/2/1/min_time:1.000 170131788 ns 170124950 ns TableSortIndicesInt64Wide/1048576/0/2/1/min_time:1.000 170874129 ns 170871245 ns TableSortIndicesInt64Wide/1048576/100/1/1/min_time:1.000 92314674 ns 92312953 ns TableSortIndicesInt64Wide/1048576/0/1/1/min_time:1.000 92023019 ns 92022643 ns Follow-up tasks: * Improve performance when the number of sort keys is 2 or greater * Idea1: Use counting sort for small range data like on array * Idea2: Use threading for radix sort * Add support for multi-column partition Nth indices on Table Closes #8612 from kou/cpp-compute-sort-indices-table Lead-authored-by: Sutou Kouhei <kou@clear-code.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>