Commits


Antoine Pitrou authored and GitHub committed 04249b9137f
GH-45216: [C++][Compute] Refactor Rank implementation (#45217)

### Rationale for this change

The Rank implementation currently mixes ties/duplicates detection and rank computation in a single function `CreateRankings`. This makes it poorly reusable for other Rank-like functions such as the Percentile Rank function proposed in GH-45190.

### What changes are included in this PR?

Split duplicates detection into a dedicated function that sets a marker bit in the sort-indices array (it is private to the Rank implementation, so it is safe to mutate it). The rank computation itself (`CreateRankings`) becomes simpler and, moreover, it does not need to read the input values: it becomes therefore type-agnostic.

This yields a code size reduction (around 45kB saved on the author's machine):

* before:
```console
$ size /build/build-release/relwithdebinfo/libarrow.so
    text    data     bss      dec     hex filename
26072218  353832 2567985 28994035 1ba69f3 /build/build-release/relwithdebinfo/libarrow.so
```
* after:
```console
$ size /build/build-release/relwithdebinfo/libarrow.so
    text    data     bss      dec     hex filename
26028198  353832 2567985 28950015 1b9bdff /build/build-release/relwithdebinfo/libarrow.so
```

Rank benchmark results are mostly neutral, though there are slight improvements on some benchmarks, and slight regressions especially on all-nulls input.

### Are these changes tested?

Yes, by existing tests.

### Are there any user-facing changes?

No.

* GitHub Issue: #45216

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
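
### Illustrative sketch of the split (not the Arrow code)

The split described under "What changes are included in this PR?" can be illustrated with a minimal standalone sketch. Everything here is an assumption made for illustration: the names `kDuplicateMask`, `SortAndMarkDuplicates`, `RankMin` and `RankDense` are hypothetical, the marker is assumed to be the top bit of a 64-bit index, and nulls are ignored. The first pass is the only one that reads the input values; the ranking passes only look at the marked sort indices, which is why they need no knowledge of the value type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical marker: the top bit of a sort index flags "same value as the
// previous element in sorted order". Assumes indices never need that bit.
constexpr std::uint64_t kDuplicateMask = std::uint64_t{1} << 63;

// Pass 1 (type-aware): sort indices by value, then mark duplicates in place.
template <typename T>
std::vector<std::uint64_t> SortAndMarkDuplicates(const std::vector<T>& values) {
  assert(static_cast<std::uint64_t>(values.size()) < kDuplicateMask);
  std::vector<std::uint64_t> indices(values.size());
  std::iota(indices.begin(), indices.end(), std::uint64_t{0});
  std::stable_sort(indices.begin(), indices.end(),
                   [&](std::uint64_t a, std::uint64_t b) {
                     return values[a] < values[b];
                   });
  for (std::size_t i = 1; i < indices.size(); ++i) {
    // The previous index may already carry the marker, so mask it off.
    const std::uint64_t prev = indices[i - 1] & ~kDuplicateMask;
    if (values[indices[i]] == values[prev]) {
      indices[i] |= kDuplicateMask;
    }
  }
  return indices;
}

// Pass 2 (type-agnostic): 1-based ranks where ties share the lowest rank.
std::vector<std::uint64_t> RankMin(const std::vector<std::uint64_t>& marked) {
  std::vector<std::uint64_t> ranks(marked.size());
  std::uint64_t rank = 0;
  for (std::size_t i = 0; i < marked.size(); ++i) {
    if ((marked[i] & kDuplicateMask) == 0) rank = i + 1;  // new distinct value
    ranks[marked[i] & ~kDuplicateMask] = rank;
  }
  return ranks;
}

// Another type-agnostic consumer of the same marked indices: dense ranks.
std::vector<std::uint64_t> RankDense(const std::vector<std::uint64_t>& marked) {
  std::vector<std::uint64_t> ranks(marked.size());
  std::uint64_t rank = 0;
  for (std::size_t i = 0; i < marked.size(); ++i) {
    if ((marked[i] & kDuplicateMask) == 0) ++rank;  // new distinct value
    ranks[marked[i] & ~kDuplicateMask] = rank;
  }
  return ranks;
}
```

For the input `{30, 10, 30, 20, 10}`, `RankMin` yields `{4, 1, 4, 3, 1}` and `RankDense` yields `{3, 1, 3, 2, 1}` from the same marked indices. Only `SortAndMarkDuplicates` is a template in this sketch, which mirrors the reasoning behind the code size reduction: the per-type instantiations shrink to the duplicate-detection step.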