Commits


Yufeng Li authored and GitHub committed 902c5f53aeb
add cutlass fmha support in PackedAttention (#15838)

### Description
Support cutlass fMHA in PackedAttention. Although we already have an fMHA TRT kernel, it does not support relative position bias. The cutlass fMHA kernel supports relative position bias and also runs on lower-end GPUs (SM 5.3 and 6.x).
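For context, "relative position bias" here is an additive term applied to the attention scores before the softmax. A minimal NumPy sketch of single-head attention with such a bias (an illustration of the math only, not the cutlass or TRT kernel implementation; all names are hypothetical):

```python
import numpy as np

def attention_with_relative_position_bias(q, k, v, bias):
    """Single-head attention with an additive bias on the scores.

    q, k, v: (seq_len, head_dim) arrays; bias: (seq_len, seq_len).
    The bias is added to the scaled dot-product scores before softmax,
    which is what the fused kernels must support.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores + bias  # relative position bias enters here
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

A fused kernel without bias support (like the TRT fMHA path mentioned above) cannot express the `scores + bias` step, which is why the cutlass path is needed when a relative position bias input is present.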