Commits

Marius Hillenbrand authored 22aa81f3e58
s390x: fix cscal and zscal implementations The implementation of complex scalar * vector multiplication for Z14 makes some LAPACK tests fail because the numerical differences to the reference implementation exceed the threshold (as can be seen by running make lapack-test and replacing kernel/zarch/cscal.c with a generic implementation for comparison). The complex multiplication uses terms of the form a * b + c * d for both real and imaginary parts. The assembly code (and compiler-emitted code as well) uses fused multiply add operations for the second product and sum. The results can be "surprising", for example when both terms in the imaginary part nearly cancel each other out. In that case, the second product contributes more digits to the sum than the first product that has been rounded before. One option is to use separate multiplications (which then round the same way) and a distinct add. Change the code to pursue that path, by (1) requesting the compiler not to contract the operations into FMAs and (2) replacing the assembly kernel with corresponding vectorized C code (where change 1 also applies). Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>