Chen Fu authored and GitHub committed d761a7ceb30
Pre-processing of Quantization (#12729)

Shape inference and model optimization before quantization.

Quantizing a model in QDQ format, i.e. inserting QuantizeLinear/DeQuantizeLinear nodes on tensors, needs tensor shape information to perform well. Shape inference, in turn, currently works best on an optimized model, so it is highly recommended to run quantization on an optimized model that carries shape information. This change adds code that prepares a model for quantization in three steps:

1. Symbolic shape inference.
2. Model optimization.
3. ONNX shape inference.

With this pre-processing done up front, model optimization should be turned off during quantization itself: optimization can change the computation graph, making it harder for the QDQ debugger to match tensors between the original and the quantized models.