Commits


Adrian Lizarraga authored and GitHub committed b84712151c0
QNN EP: Fuse DQ -> Q sequences into a QNN Convert op (#19511) ### Description Fuses DQ -> Q sequences into a QNN Convert operator if: - Converting from one qtype to another. Ex: Dequantize(uint8 to float) -> Quantize(float to uint16) - The DQ and Q operators are not part of another node unit (i.e., standalone) - The Q operator is the only consumer for the DQ operator. ### Motivation and Context Allows faster execution of QDQ models with mixed activation types by leveraging the QNN Convert operator, which converts between quantization types. For certain models, this results in inference latency speed-ups of up to 2x (depends on the number of DQ -> Q sequences). #### Example for Add node unit with 16-bit I/O: Original: ``` u8 ----> DQ ---> Q ---u16--> Add ---u16--> ^ | u16 --------------------------+ ``` After fusing DQ -> Q: ``` u8 ----> Convert ---u16--> Add ---u16--> ^ | u16 ------------------------+ ```