Commits


cloudhan authored and GitHub committed 51b67fa15c7
Make ROCm Attention biased+masked and biased+nomask scaling logic consistent (#14976)

The biased+masked and biased+nomask paths use different scaling logic in the current ROCm implementation. Currently:

biased + masked: (QK' + bias) * scale + convert(mask)
biased + nomask: QK' * scale + bias

This is not correct: the masked path scales the bias and the unmasked path does not. What we want is

QK' * scale [+ bias]

That is, the bias should not be scaled.

This effectively follows https://github.com/microsoft/onnxruntime/pull/14517/files?w=1#diff-e4768ce15a73499f584f9cd7d71adcb1ff2ed8d68ad7e496723a4775cbc35e33
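The difference between the formulas can be checked on a single attention score. The following is a minimal Python sketch, not the ROCm kernel code: the function names are hypothetical, `qk` stands for one element of QK', and `mask_term` is an assumed additive stand-in for `convert(mask)` at that position.

```python
# Illustrative scalar model of the scaling formulas from the commit message.
# All names here are hypothetical; this is not the onnxruntime implementation.

def old_biased_masked(qk, bias, scale, mask_term):
    # Previous biased + masked path: the bias is scaled together with QK'.
    return (qk + bias) * scale + mask_term


def old_biased_nomask(qk, bias, scale):
    # Previous biased + nomask path: the bias is added after scaling.
    return qk * scale + bias


def wanted(qk, scale, bias=0.0):
    # Desired form: QK' * scale [+ bias], so the bias is never scaled.
    return qk * scale + bias


qk, bias, scale = 2.0, 0.5, 0.25

# Assumption: a position where the mask contributes nothing (mask_term = 0).
masked = old_biased_masked(qk, bias, scale, mask_term=0.0)
nomask = old_biased_nomask(qk, bias, scale)
target = wanted(qk, scale, bias)

print(masked, nomask, target)
# The old masked path differs from the target by bias * (scale - 1).
print(masked - target == bias * (scale - 1.0))
```

With these inputs the old unmasked path already matches the desired form, while the old masked path differs by `bias * (scale - 1)`, which is the inconsistency the commit removes.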