Yibo Cai authored 178c7ddbfa0 on 12 Mar 2021
ARROW-11758: [C++][Compute] Improve summation kernel precision

Use pairwise summation to reduce round-off error growth from O(n) to O(log n).

**NOTE:** This patch significantly hurts sum kernel performance for
floating-point inputs. I don't worry too much, as performance is on par with NumPy.

For floating point, a drop of up to 75% is observed. This is because the old
code manually unrolls loops, which greatly improves performance. But this is
something that should be avoided. Due to limited precision, basic arithmetic
rules such as associativity do not hold for floating-point numbers, e.g.,
`(a+b)+c != a+(b+c)`. Tests show that the SSE4 and AVX2 summation kernels may
give different results (both wrong), simply because they use different unroll
steps. [1]
I guess this is also the reason why compilers only unroll loops for
integers, but not for floating-point numbers.

[1] https://issues.apache.org/jira/browse/ARROW-11758
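
The effect of the unroll step can be reproduced with a small sketch. This is
not the old kernel code; `SumUnrolled` and its `kLanes` parameter are
hypothetical stand-ins for a manually unrolled loop that keeps one partial sum
per lane and combines them at the end.

```cpp
#include <array>
#include <cstddef>

// Sum n floats with kLanes independent accumulators, mimicking a manually
// unrolled (or SIMD) loop. Changing kLanes changes the order in which the
// additions are performed, and therefore how intermediate results round.
template <std::size_t kLanes>
float SumUnrolled(const float* values, std::size_t n) {
  std::array<float, kLanes> acc{};  // partial sums, zero-initialized
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
      acc[lane] += values[i + lane];
    }
  }
  float total = 0.0f;
  for (float partial : acc) total += partial;  // combine the lanes
  for (; i < n; ++i) total += values[i];       // leftover tail elements
  return total;
}
```

Assuming IEEE 754 single precision and no fast-math style compiler flags, the
input `{1e8f, 1.0f, -1e8f, 1.0f}` sums to `1.0f` with `SumUnrolled<1>` and to
`2.0f` with `SumUnrolled<2>`, because `1e8f + 1.0f` rounds back to `1e8f` in
the single-lane order. The exact sum is 2.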

Closes #9635 from cyb70289/sum-roundoff

Authored-by: Yibo Cai <yibo.cai@arm.com>
Signed-off-by: Yibo Cai <yibo.cai@arm.com>