

Chen Fu authored and GitHub committed 1c84621020f on 16 Nov 2021

Adding ARM64 depthwise convolution kernel for symmetric quantization (#9655)


Motivation and Context
Two improvements over the current kernel code:

1. Uses signed int8 instructions, so operands no longer need to be widened from 8 bits to 16 bits before multiplication.
2. Unrolls the loop and applies manual software pipelining.
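The two ideas above can be illustrated with a minimal scalar C sketch. It is not the ARM64 kernel from this commit, which is not reproduced here; the function names `mul_s8` and `depthwise_pixel_s8`, the pointer-per-kernel-tap input layout, the tap-major filter layout, and the unroll factor of 4 are all hypothetical choices made for illustration. Zero points are assumed to be 0 (symmetric quantization), so no offset correction appears.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Product of two int8 values. In C both operands are promoted to int, but
 * |a * b| <= 128 * 128 = 16384, so the result always fits in int16. That
 * bound is what lets a SIMD kernel multiply 8-bit lanes straight into
 * 16-bit lanes instead of widening both operands to 16 bits first. */
static inline int16_t mul_s8(int8_t a, int8_t b) {
    return (int16_t)(a * b);
}

/* Hypothetical reference for one output pixel of a depthwise convolution.
 * inputs[k] points at the `channels` int8 values under kernel tap k, and
 * filter holds kernel_size * channels int8 weights, tap-major. Zero points
 * are assumed to be 0 (symmetric), so no offset correction is applied. */
void depthwise_pixel_s8(const int8_t *const *inputs, const int8_t *filter,
                        size_t kernel_size, size_t channels, int32_t *acc) {
    assert(inputs != NULL && filter != NULL && acc != NULL);
    for (size_t c = 0; c < channels; c++) acc[c] = 0;

    for (size_t k = 0; k < kernel_size; k++) {
        const int8_t *in = inputs[k];
        const int8_t *w = filter + k * channels;
        size_t c = 0;
        if (channels >= 4) {
            /* Prologue: load the first group of 4 channels. */
            int8_t x0 = in[0], x1 = in[1], x2 = in[2], x3 = in[3];
            int8_t w0 = w[0], w1 = w[1], w2 = w[2], w3 = w[3];
            /* Steady state, unrolled by 4: issue the loads for the next
             * group before the multiplies for the current one (manual
             * software pipelining), so load latency can overlap arithmetic. */
            for (; c + 8 <= channels; c += 4) {
                int8_t nx0 = in[c + 4], nx1 = in[c + 5], nx2 = in[c + 6], nx3 = in[c + 7];
                int8_t nw0 = w[c + 4], nw1 = w[c + 5], nw2 = w[c + 6], nw3 = w[c + 7];
                acc[c + 0] += mul_s8(x0, w0);
                acc[c + 1] += mul_s8(x1, w1);
                acc[c + 2] += mul_s8(x2, w2);
                acc[c + 3] += mul_s8(x3, w3);
                x0 = nx0; x1 = nx1; x2 = nx2; x3 = nx3;
                w0 = nw0; w1 = nw1; w2 = nw2; w3 = nw3;
            }
            /* Epilogue: consume the group that is already loaded. */
            acc[c + 0] += mul_s8(x0, w0);
            acc[c + 1] += mul_s8(x1, w1);
            acc[c + 2] += mul_s8(x2, w2);
            acc[c + 3] += mul_s8(x3, w3);
            c += 4;
        }
        /* Remainder channels that do not fill a group of 4. */
        for (; c < channels; c++) acc[c] += mul_s8(in[c], w[c]);
    }
}
```

In scalar C a compiler may reorder these loads on its own, so the sketch only shows the loop shape. On ARM64, the int16 bound is what makes instructions such as SMULL, which multiply signed 8-bit lanes into 16-bit results, usable without a separate widening step.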

Co-authored-by: Chen Fu <fuchen@microsoft.com>