Commits


Tianlei Wu authored and GitHub committed 414b012f42e
Add memory efficient attention from CUTLASS (#14343)

### Description
Add memory efficient attention from CUTLASS.

TODO (in next pull request):
(1) Need performance tests on different GPUs, then add a sequence length threshold (only activate it for long sequence length).
(2) Merge changes from https://github.com/NVIDIA/cutlass/pull/773 when it is in cutlass master.