Tianlei Wu authored and GitHub committed 414b012f42e on 21 Jan 2023

Add memory efficient attention from CUTLASS (#14343)

### Description
Add memory efficient attention from CUTLASS.
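
For readers unfamiliar with the term, here is a minimal sketch of the general idea behind memory efficient attention. It is plain Python for a single query vector and is not the CUTLASS kernel added in this change. It assumes the usual formulation: keys and values are processed in blocks while a running softmax maximum and sum are maintained, so the full score row is never held at once. The names `naive_attention`, `chunked_attention`, and `block_size` are illustrative only and do not appear in the commit.

```python
import math


def naive_attention(q, keys, values):
    """Reference version: computes every score before applying softmax."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def chunked_attention(q, keys, values, block_size=4):
    """Same result, but only `block_size` scores are held at any time."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = -math.inf   # largest score seen so far
    running_sum = 0.0         # softmax denominator, relative to running_max
    acc = [0.0] * dim         # unnormalized output, relative to running_max
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum
        # (this factor is 0.0 on the first block, when running_max is -inf).
        correction = math.exp(running_max - new_max)
        probs = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * correction + sum(probs)
        acc = [a * correction + sum(p * v[i] for p, v in zip(probs, v_blk))
               for i, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Both functions should agree up to floating-point rounding. The chunked version trades a little extra arithmetic (the rescaling step) for lower peak memory, which is why the benefit is expected mainly at long sequence lengths.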

TODO (in next pull request):
(1) Run performance tests on different GPUs, then add a sequence length
threshold so the kernel is activated only for long sequences.
(2) Merge the changes from https://github.com/NVIDIA/cutlass/pull/773
once that pull request is merged into cutlass master.