

Adrian Lizarraga authored and GitHub committed b011f6fbf67
[TransposeOptimizer] Support Unsqueeze/Transpose of input consumed by per-axis DQ (#21821)

### Description
Follow-up to: https://github.com/microsoft/onnxruntime/pull/21793

- Support looking past a per-axis DQ to do in-place Unsqueeze/Transpose of initializers.
- Support looking past a per-axis DQ to cancel a Transpose or Squeeze.

### Test models
For all test models, the transpose optimizer pushes a Transpose through a Mul's input[0]. The Mul's input[1] is optionally unsqueezed and then transposed.

### I. Test in-place unsqueeze and transpose of per-axis quantized weight
The original model has input[1] with shape (3,).

<details><summary>click to expand model image</summary>
<img src="https://github.com/user-attachments/assets/37b6f60c-77d2-4bd3-8ca2-58dc7c88a304" />
</details>

The optimized model has input[1] with shape (1, 3, 1, 1). The initializer was unsqueezed and transposed in-place.

<details><summary>click to expand model image</summary>
<img src="https://github.com/user-attachments/assets/adb72757-a164-400c-bfef-2a05f0e35825" />
</details>

### II. Test canceling an existing Squeeze before a per-axis DQ
The original model has an input[1] that is squeezed.

<details><summary>click to expand model image</summary>
<img src="https://github.com/user-attachments/assets/f27e6742-b563-42a9-ad06-bb3178b0ceb8" />
</details>

In the optimized model, input[1] is unsqueezed and transposed. The new Unsqueeze cancels the original Squeeze, so the Squeeze is removed, leaving only the Transpose.

<details><summary>click to expand model image</summary>
<img src="https://github.com/user-attachments/assets/e56261d4-eba6-4a9f-847b-dcd33548dd07" />
</details>

### III. Test canceling an existing Transpose before a per-axis DQ
The original model has an input[1] that is transposed.

<details><summary>click to expand model image</summary>
<img src="https://github.com/user-attachments/assets/f157e04a-572a-479d-8e3b-cf57954df5c0" />
</details>

In the optimized model, input[1] is transposed, which cancels the existing Transpose.
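The shape and `axis` bookkeeping behind sections I to III can be sketched in Python. This is a minimal illustration under stated assumptions, not the onnxruntime implementation: every helper name is hypothetical, axes are assumed non-negative, and the Unsqueeze axes `[0, 1, 2]` and Transpose perm `[0, 3, 1, 2]` in the usage lines are assumed for illustration rather than read from the test models. It follows ONNX semantics, where Transpose output dim `i` is input dim `perm[i]`, so a per-axis DQ's `axis` must follow the quantized dimension wherever it lands.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers (not onnxruntime APIs). Axes are assumed non-negative.


def unsqueeze(shape: Sequence[int], axis: int,
              axes: Sequence[int]) -> Tuple[List[int], int]:
    """Insert size-1 dims at output positions `axes`; return (new_shape, new_axis)."""
    inserted = set(axes)
    new_shape: List[int] = []
    new_axis = -1
    src = 0  # index of the next input dim to copy
    for pos in range(len(shape) + len(inserted)):
        if pos in inserted:
            new_shape.append(1)
        else:
            if src == axis:
                new_axis = pos  # the quantized dim lands here
            new_shape.append(shape[src])
            src += 1
    return new_shape, new_axis


def transpose(shape: Sequence[int], axis: int,
              perm: Sequence[int]) -> Tuple[List[int], int]:
    """ONNX Transpose: output dim i is input dim perm[i]; remap the quantized axis."""
    return [shape[p] for p in perm], list(perm).index(axis)


def compose_perms(first: Sequence[int], second: Sequence[int]) -> List[int]:
    """Single perm equivalent to Transpose(first) followed by Transpose(second)."""
    return [first[p] for p in second]


def transposes_cancel(first: Sequence[int], second: Sequence[int]) -> bool:
    """Two Transposes cancel when their composition is the identity perm."""
    return compose_perms(first, second) == list(range(len(first)))


def unsqueeze_cancels_squeeze(squeeze_axes: Sequence[int],
                              unsqueeze_axes: Sequence[int]) -> bool:
    """Unsqueeze re-inserting exactly the size-1 dims a Squeeze removed restores the shape."""
    return sorted(squeeze_axes) == sorted(unsqueeze_axes)


if __name__ == "__main__":
    # Section I: weight of shape (3,) quantized along axis 0 (assumed axes and perm).
    shape, axis = unsqueeze([3], 0, [0, 1, 2])
    print(shape, axis)  # [1, 1, 1, 3] 3
    shape, axis = transpose(shape, axis, [0, 3, 1, 2])
    print(shape, axis)  # [1, 3, 1, 1] 1
    # Section II: Squeeze(axes) followed by Unsqueeze(same axes) is a no-op.
    print(unsqueeze_cancels_squeeze([0, 2, 3], [3, 0, 2]))  # True
    # Section III: NCHW->NHWC followed by NHWC->NCHW is the identity.
    print(transposes_cancel([0, 2, 3, 1], [0, 3, 1, 2]))  # True
```

Under these assumptions the (3,) weight ends up as (1, 3, 1, 1) with the DQ `axis` moved from 0 to 1, which is why an in-place rewrite of a per-axis quantized initializer must update the DQ attribute as well as the data.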
<details><summary>click to expand model image</summary>
<img src="https://github.com/user-attachments/assets/63d742ce-3762-4ab2-bdb0-1b507886da9d" />
</details>

### IV. Test QDQ fix-up of Transpose/Unsqueeze for per-axis quantization
The original model has an input[1] that can be broadcast.

<details><summary>click to expand model image</summary>
<img src="https://github.com/user-attachments/assets/96c0092c-22ec-486d-882e-e2cb59ffe324" />
</details>

The main transpose optimization loop inserts float32 Unsqueeze and Transpose nodes after the DQ. The QDQ fix-up pass then inserts new per-axis Q/DQ ops after the inserted nodes.

<details><summary>click to expand model image</summary>
<img src="https://github.com/user-attachments/assets/b6f89c11-974d-4b35-922f-11effdf06883" />
</details>

### Motivation and Context
Enables the TransposeOptimizer to support more models with per-axis QDQ nodes. Per-axis quantization can improve model accuracy and is used by EPs such as QNN.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>