

Yufeng Li authored and GitHub committed 373f912e513
add quantization support for whisper (#15589)

### Description
Add dynamic quantization support for the Whisper model. There are three options to try:

- quantize_embedding_layer: whether to quantize the embedding layer of the decoder model.
- quantize_per_channel: whether to quantize Gemm and MatMul per channel.
- quantize_reduce_range: whether to use 7-bit quantization for MatMul and Gemm. Use this if you hit accuracy issues on x64 CPUs without VNNI.
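The trade-off behind `quantize_per_channel` and `quantize_reduce_range` can be shown with a small conceptual sketch. It is not ONNX Runtime's implementation. It assumes symmetric signed quantization, where n bits map onto the integer range [-(2^(n-1) - 1), 2^(n-1) - 1], so 8 bits give [-127, 127] and 7 bits give [-63, 63]. The function names (`quantize_matrix`, `max_abs_error`, and so on) and the weight values are hypothetical. Per-channel quantization gives each output channel (row) its own scale, so small-magnitude rows keep their precision. Reduced range halves the integer grid, which costs precision. ONNX Runtime's quantization documentation attributes the non-VNNI accuracy issue to possible saturation in the u8×s8 multiply-add path on AVX2/AVX-512 CPUs, and 7-bit weights avoid that saturation.

```python
# Conceptual sketch only (hypothetical helpers, not ONNX Runtime code):
# per-tensor vs per-channel scales, and 8-bit vs reduced-range 7-bit weights.


def qmax_for_bits(num_bits):
    """Largest magnitude in the assumed symmetric signed range (127 for 8 bits)."""
    return 2 ** (num_bits - 1) - 1


def scale_for(values, qmax):
    """Symmetric scale that maps the largest |value| onto qmax."""
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize_with_scale(values, scale, qmax):
    """Round each value to the nearest integer step and clamp to [-qmax, qmax]."""
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def quantize_matrix(weights, num_bits=8, per_channel=False):
    """Quantize a 2-D weight matrix given as a list of rows (one row per channel).

    Returns (quantized_rows, scales) with one scale entry per row; in the
    per-tensor case every entry is the same shared scale.
    """
    qmax = qmax_for_bits(num_bits)
    if per_channel:
        scales = [scale_for(row, qmax) for row in weights]
    else:
        shared = scale_for([v for row in weights for v in row], qmax)
        scales = [shared] * len(weights)
    quantized = [quantize_with_scale(row, s, qmax) for row, s in zip(weights, scales)]
    return quantized, scales


def max_abs_error(weights, quantized, scales):
    """Largest absolute difference between original and dequantized weights."""
    return max(
        abs(w - q * s)
        for row, qrow, s in zip(weights, quantized, scales)
        for w, q in zip(row, qrow)
    )


# Two hypothetical channels with very different magnitudes.
weights = [[0.01, -0.02, 0.015], [2.0, -1.5, 0.5]]

for bits, per_channel in [(8, False), (8, True), (7, False), (7, True)]:
    q, s = quantize_matrix(weights, num_bits=bits, per_channel=per_channel)
    err = max_abs_error(weights, q, s)
    print(f"{bits}-bit, per_channel={per_channel}: max error {err:.5f}")
```

With these weights, per-channel scales give a smaller maximum reconstruction error than a single shared scale. The 7-bit range gives a larger error than 8 bits under the same scaling scheme. That is why reduced range is worth enabling only when the 8-bit path itself causes accuracy problems.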