Commits


Rossi Sun authored and GitHub committed 2bd2e35a97c
GH-43693: [C++][Acero] Support AVX2 swiss join decoding (#43832)

### Rationale for this change

You can find the background in #43693. By looking at how the non-SIMD counterparts (`Visit`/`VisitNulls`) of `Visit_avx2`/`VisitNulls_avx2` are used, I found that they are used solely for decoding rows from the build side of the join. So I added AVX2 versions of those decoding methods and wired in `Visit_avx2`/`VisitNulls_avx2`.

### What changes are included in this PR?

1. Split the decoding methods into smaller pieces so that each of them can cooperate with the AVX2 version.
2. Concrete AVX2 specialized functions that use the `Visit*_avx2` functions to decode the fixed-length/offsets/var-length/nulls of the row table.
3. Fix some bugs in the original `Visit*_avx2` functions.
4. Related benchmarks.

### Are these changes tested?

No new tests needed. The benchmarking results are a bit complicated, so I put them in a comment: https://github.com/apache/arrow/pull/43832#issuecomment-2328206421.

### Are there any user-facing changes?

No changes other than improved performance. Users can expect an improvement for hash-join-related workloads. However, the degree of improvement depends heavily on both the workload and the CPU model. For Intel CPUs from Skylake to Icelake/Tigerlake, where AVX2 gather performance is degraded by one of Intel's vulnerability mitigations (detailed in https://github.com/apache/arrow/pull/43832#issuecomment-2326646353), the improvement is less significant: single-digit percent. Other models, e.g. AMD and the most recent Intel CPUs, can see improvements of up to 30%.

* GitHub Issue: #43693

Lead-authored-by: Ruoxi Sun <zanmato1984@gmail.com>
Co-authored-by: Rossi Sun <zanmato1984@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Antoine Pitrou <antoine@python.org>
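
For readers unfamiliar with the technique, here is a minimal sketch of how an AVX2 gather path and a scalar path can cooperate when decoding a fixed-length column out of a row table. It is not the code from this PR: `RowTable`, `DecodeScalar`, `DecodeAvx2`, and `DecodeColumn` are hypothetical names, the layout (fixed-width rows stored back to back, one 32-bit column) is a simplifying assumption, and the 32-bit byte offsets used for the gather only work for small tables. The AVX2 branch is compiled only when `__AVX2__` is defined (e.g. with `-mavx2`); otherwise the scalar path handles every row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical row table: fixed-width rows stored back to back.
struct RowTable {
  std::vector<uint8_t> bytes;
  int row_width;  // bytes per row
};

// Scalar path: copy one 32-bit field per selected row, for rows [begin, end).
void DecodeScalar(const RowTable& t, int col_offset, const uint32_t* row_ids,
                  int begin, int end, uint32_t* out) {
  for (int i = begin; i < end; ++i) {
    const uint8_t* src =
        t.bytes.data() + static_cast<size_t>(row_ids[i]) * t.row_width + col_offset;
    std::memcpy(&out[i], src, sizeof(uint32_t));
  }
}

// AVX2 path: handles rows in groups of 8 and returns how many rows it
// processed (0 when AVX2 is not compiled in). Assumes all byte offsets
// fit in a signed 32-bit integer.
int DecodeAvx2(const RowTable& t, int col_offset, const uint32_t* row_ids,
               int num_rows, uint32_t* out) {
#if defined(__AVX2__)
  const int* base = reinterpret_cast<const int*>(t.bytes.data() + col_offset);
  const __m256i width = _mm256_set1_epi32(t.row_width);
  const int num_simd_rows = num_rows - num_rows % 8;
  for (int i = 0; i < num_simd_rows; i += 8) {
    __m256i ids = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(row_ids + i));
    // Byte offset of each selected row; a scale of 1 treats them as bytes.
    __m256i byte_offsets = _mm256_mullo_epi32(ids, width);
    __m256i values = _mm256_i32gather_epi32(base, byte_offsets, 1);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i), values);
  }
  return num_simd_rows;
#else
  (void)t; (void)col_offset; (void)row_ids; (void)num_rows; (void)out;
  return 0;
#endif
}

// Run the SIMD path first, then let the scalar path finish the tail.
std::vector<uint32_t> DecodeColumn(const RowTable& t, int col_offset,
                                   const std::vector<uint32_t>& row_ids) {
  assert(col_offset >= 0 &&
         col_offset + static_cast<int>(sizeof(uint32_t)) <= t.row_width);
  std::vector<uint32_t> out(row_ids.size());
  const int num_rows = static_cast<int>(row_ids.size());
  const int done = DecodeAvx2(t, col_offset, row_ids.data(), num_rows, out.data());
  DecodeScalar(t, col_offset, row_ids.data(), done, num_rows, out.data());
  return out;
}
```

Having the SIMD function report how many rows it handled keeps the two paths independent: the caller never needs to know the vector width, and the scalar code stays the single source of truth for the tail and for builds without AVX2.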