Commits


Sherlock authored and GitHub committed 9174cbe3d5e
Optimize CUDA Kernel for 3D and 4D Transpose (#8928)

* Optimize Transpose120 and Transpose102
* Generalize Transpose0123 for more input shapes
* Add Transpose3D test cases
* update rocm kernel