sumitsays authored and GitHub committed (commit 24818cfd73a)
[DML EP] Attention Kernel (#13371)

### Description
DML EP kernel for the com.microsoft.Attention operator. It has been implemented via DML_Graph. References for this implementation:
1. [Hugging Face Attention for BERT](https://github.com/huggingface/transformers/blob/310340d0d01929715b30863ee6f633974d75da16/src/transformers/models/bert/modeling_bert.py#L245-L284)
2. Chapter 3 of the O'Reilly book *Natural Language Processing with Transformers, Revised Edition*

This PR also:
- includes a very small fix for the QLinearSigmoid kernel, storing the temporary object in a named variable.
- enables 4 L2 transformer operators: LayerNorm, Gelu, MatMulScale, and Attention.

### Motivation and Context
- Why is this change required? What problem does it solve? Attention is one of the main operators used in Transformer-based models, and this kernel contributes to the overall performance of the DML EP for such models.
- If it fixes an open issue, please link to the issue here. N/A

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
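For context, below is a minimal NumPy sketch of the multi-head self-attention computation this kernel targets, following the Hugging Face BERT reference linked above. It is illustrative only: the function names, shapes, and the omission of bias, mask, and past/present state handling are assumptions, and it does not reflect how the DML graph itself is constructed.

```python
# Illustrative sketch of multi-head scaled dot-product self-attention,
# mirroring the Hugging Face BERT reference. Not the DML EP implementation.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, wq, wk, wv, num_heads):
    # x: (batch, seq, hidden); wq/wk/wv: (hidden, hidden)
    batch, seq, hidden = x.shape
    head_size = hidden // num_heads

    def split_heads(t):
        # (batch, seq, hidden) -> (batch, num_heads, seq, head_size)
        return t.reshape(batch, seq, num_heads, head_size).transpose(0, 2, 1, 3)

    q, k, v = (split_heads(x @ w) for w in (wq, wk, wv))
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_size)  # (batch, heads, seq, seq)
    probs = softmax(scores)                                     # attention weights
    ctx = probs @ v                                             # (batch, heads, seq, head_size)
    # Merge heads back into (batch, seq, hidden)
    return ctx.transpose(0, 2, 1, 3).reshape(batch, seq, hidden)

# Example: 2 heads over a tiny random input
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4, 8)).astype(np.float32)
w = [rng.standard_normal((8, 8)).astype(np.float32) for _ in range(3)]
print(attention(x, *w, num_heads=2).shape)  # (1, 4, 8)
```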