
Yibo Cai authored 178c7ddbfa0
ARROW-11758: [C++][Compute] Improve summation kernel precision

Leverage pairwise summation to reduce round-off error from O(n) to O(log n).

**NOTE:** This patch significantly hurts sum kernel performance for floating-point types: a drop of up to 75% is observed. I don't worry too much, as performance remains on par with NumPy. The old code was faster because it manually unrolled loops, which greatly improves performance, but this is something that should be avoided for floating-point sums. Due to limited precision, basic arithmetic rules such as associativity don't hold for floating-point numbers, e.g. `(a+b)+c != a+(b+c)`. Tests show that the SSE4 and AVX2 summation kernels may give different results (both wrong), simply because they use different unroll steps [1]. I guess this is also why compilers only unroll such loops for integers, not for floating points.

[1] https://issues.apache.org/jira/browse/ARROW-11758

Closes #9635 from cyb70289/sum-roundoff

Authored-by: Yibo Cai <yibo.cai@arm.com>
Signed-off-by: Yibo Cai <yibo.cai@arm.com>
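The pairwise idea and the unrolling effect described above can be sketched in C++ as follows. This is a minimal illustration, not Arrow's kernel: the function names `NaiveSum`, `PairwiseSum`, `MultiAccumulatorSum` and `AbsError`, and the block size of 8, are hypothetical choices. `MultiAccumulatorSum` assumes an unrolled SIMD loop behaves roughly like one accumulator per lane; changing `lanes` changes the order of additions and can therefore change the rounded result.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Plain left-to-right summation. Every addition rounds the running total,
// so the worst-case round-off error grows linearly with n.
double NaiveSum(const double* data, std::size_t n) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += data[i];
  }
  return sum;
}

// Pairwise (cascade) summation: split the range into two halves, sum each
// half recursively, then add the two partial sums. Each element passes
// through only O(log n) additions, so the worst-case error grows as O(log n).
// Short ranges fall back to NaiveSum to bound recursion overhead; the block
// size of 8 is a hypothetical illustrative value, not Arrow's.
double PairwiseSum(const double* data, std::size_t n) {
  constexpr std::size_t kBlockSize = 8;
  if (n <= kBlockSize) {
    return NaiveSum(data, n);
  }
  const std::size_t half = n / 2;
  return PairwiseSum(data, half) + PairwiseSum(data + half, n - half);
}

// Hypothetical model of a manually unrolled loop: `lanes` independent
// accumulators, combined at the end. Different lane counts add the same
// values in a different order, which can change the rounded result.
double MultiAccumulatorSum(const double* data, std::size_t n,
                           std::size_t lanes) {
  assert(lanes > 0);
  std::vector<double> acc(lanes, 0.0);
  for (std::size_t i = 0; i < n; ++i) {
    acc[i % lanes] += data[i];
  }
  return NaiveSum(acc.data(), acc.size());
}

// Absolute error of an approximation against a reference value.
double AbsError(double approx, double exact) {
  return std::fabs(approx - exact);
}
```

For example, summing one million copies of `0.1` with `NaiveSum` accumulates a noticeably larger error than `PairwiseSum`, and in IEEE 754 double precision `(0.1 + 0.2) + 0.3` differs from `0.1 + (0.2 + 0.3)`, which is why reordering additions (as unrolling does) is not a free transformation for floating points.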