Commits


mindest authored and GitHub committed 4ffc1ff3b4a
DMMHA: add unit tests; fix CPU, CUDA kernel (#22567)

### Description

Fixes:

(1) CPU kernel: apply scale before bias and mask, like other MHA ops.

(2) CPU kernel: correct the offset when appending past to present.

(3) CUDA kernel: apply mask if provided; fix output_qk offset.

Add DMMHA unit tests.
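
For readers unfamiliar with the ordering issue in fixes (1) and (2), the following is a minimal pure-Python sketch, not the actual kernel code. It assumes the bias and mask are additive terms on the attention scores and that the present cache is a fixed-size buffer; the function names `scaled_scores` and `append_to_present` are hypothetical and do not appear in the PR.

```python
def scaled_scores(q, keys, scale, bias=None, mask=None):
    """Scores for one query vector against a list of key vectors.

    Assumed order (per fix 1): scale the dot product first,
    then add bias and mask. Scaling after adding the bias would
    also scale the bias, which gives a different result.
    """
    scores = []
    for j, k in enumerate(keys):
        s = scale * sum(qi * ki for qi, ki in zip(q, k))  # scale first
        if bias is not None:
            s += bias[j]  # additive bias (assumption)
        if mask is not None:
            s += mask[j]  # additive mask (assumption)
        scores.append(s)
    return scores


def append_to_present(past, new_row, past_len, max_len):
    """Build the present cache: past rows first, new row at offset past_len.

    Illustrates fix 2: the new key/value row must land at index
    past_len, not at index 0 or at max_len - 1.
    """
    width = len(new_row)
    present = [[0.0] * width for _ in range(max_len)]
    for t in range(past_len):
        present[t] = list(past[t])
    present[past_len] = list(new_row)
    return present


if __name__ == "__main__":
    q = [1.0, 2.0]
    keys = [[1.0, 0.0], [0.0, 1.0]]
    print(scaled_scores(q, keys, 0.5, bias=[1.0, 0.0], mask=[0.0, -1.0]))
    print(append_to_present([[1.0, 1.0], [2.0, 2.0]], [3.0, 3.0], 2, 4))
```

With a scale of 0.5 and a bias of 1.0 on the first key, scaling first gives 0.5 * 1.0 + 1.0 = 1.5, whereas scaling after the bias would give 0.5 * (1.0 + 1.0) = 1.0, which is the kind of mismatch with other MHA ops that fix (1) describes.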