Commits


Joseph Groenenboom authored and GitHub committed a433f22f17e
Softmax interface update (#12469)

* Template datatype for SoftmaxWithRawMaskSmallKernel in ROCm EP

* Remove valid_items usage from SoftmaxWithRawMaskSmallKernel for ROCm EP

  The kernel already masks off invalid items, and this gives a much faster implementation in hipCUB.

* Update accumulator type in ROCm EP for SoftmaxWithRawMaskSmallKernel

  Hard-code the accumulator to fp32 for hipCUB in the indicated kernel.

* Reset casting to old behavior

* Document steps to optimize SoftMax kernel on ROCm EP

  Usage of the hipCUB valid_items interface on reduction operations has a significant performance impact. Mask all thread data to avoid the need to use the valid_items interface to hipCUB.
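
The masking idea described in this commit can be illustrated with a small host-side sketch. It is not the ROCm EP kernel and does not call hipCUB. It only shows why a full-width reduction gives the same result as one restricted to the valid items, provided the invalid lanes hold the identity element of each reduction. The names `kBlockSize`, `MaskedSoftmax`, and `ReferenceSoftmax` are hypothetical and chosen for illustration; the fp32 accumulator mirrors the commit's note, and `valid_items >= 1` is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical block width standing in for the GPU thread-block size.
constexpr int kBlockSize = 8;

// Masked variant: every lane joins both reductions. Lanes at or beyond
// valid_items are loaded as -infinity (the identity for max); exp() then
// turns them into 0 (the identity for sum), so no valid_items argument is
// needed by the reductions themselves.
std::vector<float> MaskedSoftmax(const std::vector<float>& input, int valid_items) {
  assert(static_cast<int>(input.size()) == kBlockSize);
  assert(valid_items >= 1 && valid_items <= kBlockSize);
  const float neg_inf = -std::numeric_limits<float>::infinity();

  // Mask at load time.
  std::vector<float> lane(kBlockSize);
  for (int i = 0; i < kBlockSize; ++i) lane[i] = (i < valid_items) ? input[i] : neg_inf;

  // Full-width max reduction over all lanes.
  float max_val = neg_inf;
  for (float v : lane) max_val = std::max(max_val, v);

  // Full-width sum reduction with an fp32 accumulator.
  float sum = 0.0f;
  for (float& v : lane) {
    v = std::exp(v - max_val);  // masked lanes become exactly 0
    sum += v;
  }

  for (float& v : lane) v /= sum;
  return lane;
}

// Reference variant: both reductions visit only the first valid_items lanes,
// which is what a valid_items-style reduction computes.
std::vector<float> ReferenceSoftmax(const std::vector<float>& input, int valid_items) {
  assert(valid_items >= 1 && valid_items <= static_cast<int>(input.size()));
  float max_val = input[0];
  for (int i = 1; i < valid_items; ++i) max_val = std::max(max_val, input[i]);

  float sum = 0.0f;
  for (int i = 0; i < valid_items; ++i) sum += std::exp(input[i] - max_val);

  std::vector<float> out(input.size(), 0.0f);
  for (int i = 0; i < valid_items; ++i) out[i] = std::exp(input[i] - max_val) / sum;
  return out;
}
```

In an actual kernel the two loops over `lane` would correspond to block-wide reductions. The sketch only demonstrates that masking at load time makes the partial-block case equivalent to the full-block case.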