Commits


Commit 068b568472b, authored by KeDengMS
Add support for int8 x uint8 for MatMulInteger, and int16 x int16 custom op (#1391)

Description: The change adds the necessary quantization support on CPU for mixed int8/uint8 inputs, as well as int16 inputs, to matrix multiply operations that output int32.

Motivation and Context: Integer operations are critical for the performance of quantized models. The current CPU implementation of MatMulInteger only supports uint8 x uint8, while the spec also allows int8 x uint8; having a default CPU implementation that fully supports the spec helps accuracy verification. In addition, some models may need to quantize to int16, which the MatMulInteger op does not support, so a custom op, MatMulInteger16, is added to satisfy such models.
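As a rough illustration of the mixed-type path this commit enables, the sketch below builds a minimal ONNX graph containing a single MatMulInteger node whose A input is int8 and whose B input is uint8, then runs it on the CPU execution provider. The tensor names, shapes, and opset version here are illustrative assumptions and are not taken from the commit itself.

```python
# Minimal sketch: MatMulInteger with mixed int8 x uint8 inputs on CPU.
# Assumes the onnx and onnxruntime Python packages are installed.
import numpy as np
import onnx
from onnx import helper, TensorProto
import onnxruntime as ort

# Graph: Y = MatMulInteger(A, B), with A int8, B uint8, Y int32.
node = helper.make_node("MatMulInteger", inputs=["A", "B"], outputs=["Y"])
graph = helper.make_graph(
    [node],
    "matmul_integer_mixed",
    [
        helper.make_tensor_value_info("A", TensorProto.INT8, [2, 3]),
        helper.make_tensor_value_info("B", TensorProto.UINT8, [3, 4]),
    ],
    [helper.make_tensor_value_info("Y", TensorProto.INT32, [2, 4])],
)
# MatMulInteger was introduced in opset 10.
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 10)])
onnx.checker.check_model(model)

sess = ort.InferenceSession(model.SerializeToString(),
                            providers=["CPUExecutionProvider"])
a = np.random.randint(-128, 128, size=(2, 3), dtype=np.int8)
b = np.random.randint(0, 256, size=(3, 4), dtype=np.uint8)
(y,) = sess.run(None, {"A": a, "B": b})
assert y.dtype == np.int32  # accumulation into int32, per the MatMulInteger spec
```

The int16 path would be exercised in the same way, but through a node for the MatMulInteger16 custom op declared under onnxruntime's contrib-op domain rather than the standard ONNX domain; the exact domain string and registration details should be checked against the onnxruntime contrib operator documentation.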