Commits

a997bb46b6a — authored by cloudhan, committed via GitHub
Refactor ROCm attention (#14688): extract the QKV projection and the attention computation into pipelines (each composed from GEMMs and a kernel launch). This will allow us to introduce CK flash attention in the next PR.
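The shape of this refactor can be sketched abstractly: attention becomes a pipeline of stages, where the first stage runs three GEMMs to produce Q, K, and V, and the second stage runs the attention computation, so a stage (e.g. the attention kernel) can later be swapped for a different implementation such as CK flash attention. The sketch below is a minimal, CPU-only illustration of that composition in plain Python; all class and function names here are hypothetical and do not come from the actual PR.

```python
import math

def matmul(a, b):
    """Naive matrix multiply standing in for a GEMM call."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

class QkvProjection:
    """Stage 1: three GEMMs projecting the input to Q, K, V."""
    def __init__(self, wq, wk, wv):
        self.wq, self.wk, self.wv = wq, wk, wv

    def __call__(self, x):
        return matmul(x, self.wq), matmul(x, self.wk), matmul(x, self.wv)

class AttentionCompute:
    """Stage 2: scaled dot-product attention, softmax(Q K^T / sqrt(d)) V.

    In the real refactor this stage wraps a kernel launch, which is what
    makes it replaceable by a flash-attention kernel later."""
    def __call__(self, qkv):
        q, k, v = qkv
        d = len(q[0])
        scores = matmul(q, transpose(k))
        probs = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
        return matmul(probs, v)

class Pipeline:
    """Compose stages so the whole attention op is invoked as one call."""
    def __init__(self, *stages):
        self.stages = stages

    def __call__(self, x):
        for stage in self.stages:
            x = stage(x)
        return x

# Tiny 2x2 example with identity weights, purely to show the wiring.
identity = [[1.0, 0.0], [0.0, 1.0]]
attention = Pipeline(QkvProjection(identity, identity, identity),
                     AttentionCompute())
out = attention(identity)
```

Because each stage is an independent callable, replacing `AttentionCompute` with a fused flash-attention stage only requires constructing the `Pipeline` with a different second element.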