Commits


Yibo Cai authored and Krisztián Szűcs committed b96cd3604d1
ARROW-8496: [C++] Refine ByteStreamSplitDecodeScalar

I simplified the DecoderScalar code and saw a huge performance boost in the clang-generated code. In my test on an Intel E5-2650 with clang-9, Decode_Float_Scalar jumps from 600M/s to 20G/s, which is even faster than the SSE version (17G/s). Similar behaviour is observed on Arm64.

Some digging shows that clang auto-vectorizes the simplified decoder code, but gcc cannot: https://godbolt.org/z/kq9FAs

Interestingly, gcc is able to auto-vectorize the EncoderFloatScalar code, but clang cannot: https://godbolt.org/z/E3LnZD

NOTE: This scalar code is not exercised by the default x86_64 build, which takes the SSE code path. The Arm64 build takes this scalar code path.

Closes #6962 from cyb70289/bytesplit

Authored-by: Yibo Cai <yibo.cai@arm.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
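For readers unfamiliar with the byte-stream-split layout, the following is a minimal sketch of a scalar encoder and decoder for fixed-width values such as float. It is not the code from this commit: the function names ByteStreamSplitEncodeSketch and ByteStreamSplitDecodeSketch are hypothetical, and the layout assumed here is that byte b of value i lives in stream b at position i. The decoder uses a plain nested loop with no intrinsics, the general shape a compiler may auto-vectorize. This sketch does not claim that any particular compiler vectorizes it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not Arrow's API): split each value's bytes into
// sizeof(T) streams. Byte b of value i is written to out[b * num_values + i].
template <typename T>
void ByteStreamSplitEncodeSketch(const T* values, int64_t num_values, uint8_t* out) {
  constexpr int64_t kNumStreams = static_cast<int64_t>(sizeof(T));
  const uint8_t* raw = reinterpret_cast<const uint8_t*>(values);
  for (int64_t i = 0; i < num_values; ++i) {
    for (int64_t b = 0; b < kNumStreams; ++b) {
      out[b * num_values + i] = raw[i * kNumStreams + b];
    }
  }
}

// Hypothetical helper (not Arrow's API): gather byte b of value i back from
// stream b. `stride` is the length of each stream, so a prefix of
// `num_values` values can be decoded when num_values <= stride.
template <typename T>
void ByteStreamSplitDecodeSketch(const uint8_t* data, int64_t num_values,
                                 int64_t stride, T* out) {
  assert(num_values <= stride);
  constexpr int64_t kNumStreams = static_cast<int64_t>(sizeof(T));
  uint8_t* raw = reinterpret_cast<uint8_t*>(out);
  for (int64_t i = 0; i < num_values; ++i) {
    for (int64_t b = 0; b < kNumStreams; ++b) {
      raw[i * kNumStreams + b] = data[b * stride + i];
    }
  }
}
```

Encoding five floats with the first function produces a 20-byte buffer of four 5-byte streams. Decoding that buffer with stride 5 restores the original values, and a smaller num_values decodes only a prefix.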