Public / onnxruntime / 373f912e513

Commits

Yufeng Li authored and GitHub committed 373f912e51321 Apr 2023

add quantization support for whisper (#15589)

### Description
<!-- Describe your changes. -->
Add dynamic quantization support for whisper model.
There are 3 options to try out:
- quantize_embedding_layer: enable to quantize embedding layer of
decoder model or not
- quantize_per_channel: enable to quantize per channel for Gemm or
MatMul
- quantize_reduce_range: use 7bit to quantize MatMul or Gemm. Use when
hitting accuracy issue on x64 cpus without VNNI.