Commits


kunal-vaishnavi authored and GitHub committed 4ac98d6d656
Update replacing MultiHeadAttention with GroupQueryAttention (#19882)

### Description
This PR updates the replacement of MultiHeadAttention (MHA) with GroupQueryAttention (GQA). It is related to the changes in [this PR](https://github.com/microsoft/onnxruntime/pull/18906).

### Motivation and Context
The updated replacement of MHA with GQA includes the following fusion changes.
- Apply sliding window within GQA
- Fuse the rotary embeddings within GQA
- Fuse the 3 MatMuls into 1 packed MatMul if possible
- Fuse the 3 Adds into 1 packed Add if possible
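
The packed MatMul and packed Add fusions listed above rest on a simple identity: concatenating the Q, K, and V weight matrices along the output axis (and the three biases end to end) yields the same projections as three separate MatMul and Add pairs. The sketch below illustrates only that arithmetic in plain Python. It is not the ONNX Runtime fusion code, and every name in it (`matmul`, `add_bias`, `concat_columns`, `unpacked_projection`, `packed_projection`) is a hypothetical helper introduced for illustration. The sample shapes assume K and V are narrower than Q, as is typical when GQA uses fewer key/value heads.

```python
# Illustrative only: plain-Python stand-ins for MatMul/Add nodes.
# None of these helpers exist in ONNX Runtime; they are assumptions for the sketch.

def matmul(x, w):
    """Multiply x (n x k) by w (k x m); both are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def add_bias(x, b):
    """Add bias vector b to every row of x."""
    return [[v + bv for v, bv in zip(row, b)] for row in x]


def concat_columns(*mats):
    """Concatenate matrices side by side (along the output axis)."""
    return [sum((list(row) for row in rows), []) for rows in zip(*mats)]


def unpacked_projection(x, wq, wk, wv, bq, bk, bv):
    """Three separate MatMuls followed by three separate Adds."""
    q = add_bias(matmul(x, wq), bq)
    k = add_bias(matmul(x, wk), bk)
    v = add_bias(matmul(x, wv), bv)
    return q, k, v


def packed_projection(x, wq, wk, wv, bq, bk, bv):
    """One packed MatMul and one packed Add, then a split into Q, K, V."""
    w_packed = concat_columns(wq, wk, wv)  # shape: k x (q_size + k_size + v_size)
    b_packed = bq + bk + bv                # biases joined end to end
    qkv = add_bias(matmul(x, w_packed), b_packed)
    q_size, k_size = len(bq), len(bk)
    q = [row[:q_size] for row in qkv]
    k = [row[q_size:q_size + k_size] for row in qkv]
    v = [row[q_size + k_size:] for row in qkv]
    return q, k, v


# Example data: 2 tokens, hidden size 2; Q is 4 wide, K and V are 2 wide.
x = [[1, 2], [3, 4]]
wq = [[1, 0, 2, 1], [0, 1, 1, 3]]
wk = [[2, 1], [1, 0]]
wv = [[0, 1], [3, 2]]
bq, bk, bv = [1, 1, 1, 1], [2, 2], [3, 3]

same = packed_projection(x, wq, wk, wv, bq, bk, bv) == unpacked_projection(
    x, wq, wk, wv, bq, bk, bv
)
```

Integer inputs keep the comparison exact. With floating-point weights the two paths should agree only up to rounding, so a tolerance-based comparison would be the safer check.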