Commits


Commit 012b34dc4e6, authored by Tianlei Wu, committed via GitHub
Add --use_multi_head_attention in transformers fusion (#14198)

Add an option --use_multi_head_attention to fuse models with the MultiHeadAttention operator instead of the Attention operator, for testing purposes. Note that MultiHeadAttention can be used for both self-attention and cross-attention, while the Attention operator handles self-attention only. The Attention operator contains packed Q/K/V weights for the input projection, whereas that input-projection MatMul is left outside the fused MultiHeadAttention operator.
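
For reference, a minimal sketch of how this option might be enabled through the Python fusion API. The `use_multi_head_attention` attribute on `FusionOptions`, the file paths, and the head/hidden-size values are assumptions for illustration based on this commit's description, not a verified API reference.

```python
# Sketch (assumed API surface): enable MultiHeadAttention fusion
# instead of Attention fusion when optimizing a transformer model.
from onnxruntime.transformers.fusion_options import FusionOptions
from onnxruntime.transformers.optimizer import optimize_model

options = FusionOptions("bert")
# Assumed attribute corresponding to the new --use_multi_head_attention flag.
options.use_multi_head_attention = True

# Fuse the model; MultiHeadAttention covers self- and cross-attention,
# and the input-projection MatMul stays outside the fused operator.
model = optimize_model(
    "model.onnx",              # hypothetical input path
    model_type="bert",
    num_heads=12,              # example values
    hidden_size=768,
    optimization_options=options,
)
model.save_model_to_file("model_mha.onnx")
```

The equivalent command line would presumably be something like `python -m onnxruntime.transformers.optimizer --input model.onnx --output model_mha.onnx --model_type bert --use_multi_head_attention`, with paths and model type adjusted to the model under test.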