Chen Fu authored and GitHub committed d761a7ceb30
Pre-processing of Quantization (#12729)

Shape inference and model optimization before quantization.

Quantizing a model in QDQ format, i.e. inserting QuantizeLinear/DeQuantizeLinear nodes on tensors, needs tensor shape information to perform well. Shape inference, in turn, currently works best on an optimized model, so it is highly recommended to run quantization on an optimized model that carries shape information. This change adds code that prepares a model for quantization in three steps:

1. Symbolic shape inference.
2. Model optimization.
3. ONNX shape inference.

With this pre-processing done up front, model optimization should be turned off during quantization itself: optimization can change the computation graph, making it harder for the QDQ debugger to match tensors between the original and the quantized models.