Commits


Yuqi Gu authored and Matthew Topol committed da5b0360aac
ARROW-15172: [Go] Add Arm64 Neon implementation for Arrow-math

Add 'sum_int64_neon', 'sum_uint64_neon' and 'sum_float64_neon' in Arm64 Golang assembly.
C2GOASM doesn't work correctly for Arm64, so `uint64_neon_arm64.s`, `int64_neon_arm64.s` and `float64_neon_arm64.s` were partly generated by asm2plan9s.

- Add ability to enable the Arm64 extension via the environment: `ARM_ENABLE_EXT=NEON`
- Benchmark

With Arm64 Neon disabled:
```
goos: linux
goarch: arm64
pkg: github.com/apache/arrow/go/v7/arrow/math
BenchmarkFloat64Funcs_Sum_256-46         3247237      364.9 ns/op     5611.91 MB/s
BenchmarkFloat64Funcs_Sum_1024-46        1000000       1424 ns/op     5750.99 MB/s
BenchmarkFloat64Funcs_Sum_8192-46         100620      11360 ns/op     5768.89 MB/s
BenchmarkFloat64Funcs_Sum_1000000-46         866    1382561 ns/op     5786.36 MB/s
BenchmarkInt64Funcs_Sum_256-46           5655325      251.3 ns/op     8150.04 MB/s
BenchmarkInt64Funcs_Sum_1024-46          1254841      954.0 ns/op     8586.80 MB/s
BenchmarkInt64Funcs_Sum_8192-46           148898       7515 ns/op     8720.33 MB/s
BenchmarkInt64Funcs_Sum_1000000-46          1299     921258 ns/op     8683.78 MB/s
BenchmarkUint64Funcs_Sum_256-46          4753304      246.4 ns/op     8313.19 MB/s
BenchmarkUint64Funcs_Sum_1024-46         1253706      954.7 ns/op     8580.65 MB/s
BenchmarkUint64Funcs_Sum_8192-46          149168       7561 ns/op     8667.80 MB/s
BenchmarkUint64Funcs_Sum_1000000-46         1304     918844 ns/op     8706.60 MB/s
```

With Arm64 Neon enabled:
```
goos: linux
goarch: arm64
pkg: github.com/apache/arrow/go/v7/arrow/math
BenchmarkFloat64Funcs_Sum_256-46        11145474      102.4 ns/op    19996.84 MB/s
BenchmarkFloat64Funcs_Sum_1024-46        3156472      375.5 ns/op    21816.63 MB/s
BenchmarkFloat64Funcs_Sum_8192-46         351138       2886 ns/op    22707.53 MB/s
BenchmarkFloat64Funcs_Sum_1000000-46        3456     346282 ns/op    23102.55 MB/s
BenchmarkInt64Funcs_Sum_256-46          11427655      101.4 ns/op    20196.81 MB/s
BenchmarkInt64Funcs_Sum_1024-46          3231861      373.0 ns/op    21963.24 MB/s
BenchmarkInt64Funcs_Sum_8192-46           382744       2880 ns/op    22753.01 MB/s
BenchmarkInt64Funcs_Sum_1000000-46          3486     344900 ns/op    23195.16 MB/s
BenchmarkUint64Funcs_Sum_256-46         11319964      100.9 ns/op    20306.60 MB/s
BenchmarkUint64Funcs_Sum_1024-46         3180728      373.8 ns/op    21914.31 MB/s
BenchmarkUint64Funcs_Sum_8192-46          368254       2881 ns/op    22748.54 MB/s
BenchmarkUint64Funcs_Sum_1000000-46         3481     345534 ns/op    23152.55 MB/s
```

This gives roughly a 2.5X performance uplift for Int64, 2.6X for Uint64 and 3.8X for Float64.

Closes #12009 from guyuqi/ARROW-15172

Authored-by: Yuqi Gu <yuqi.gu@arm.com>
Signed-off-by: Matthew Topol <mtopol@factset.com>