Commits


kunal-vaishnavi authored and GitHub committed c8def0cc519
Add LLaMA GQA ragged batching (#18337)

This PR updates the logic for replacing MHA with GQA and updates the LLaMA scripts for the modified GQA op. It is related to the changes in [this PR](https://github.com/microsoft/onnxruntime/pull/18283).

### Motivation and Context

This PR allows us to run LLaMA with the GQA op end-to-end using ragged batching (i.e. batched inputs of different lengths).
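As a rough illustration of what "batched inputs of different lengths" means here, the sketch below packs variable-length sequences into a rectangular batch while tracking each sequence's true length separately so an attention op can ignore padded positions. The helper name `pad_ragged_batch` and the `PAD_ID` constant are illustrative assumptions, not the GQA op's actual input signature.

```python
# Minimal sketch of ragged batching: sequences of different lengths are
# batched together, with the true length of each sequence tracked separately.
# The helper name and PAD_ID constant are illustrative, not part of the
# onnxruntime GQA op's real interface.
import numpy as np

PAD_ID = 0  # hypothetical padding token id

def pad_ragged_batch(token_ids_per_sequence):
    """Right-pad variable-length sequences to a rectangular [batch, max_len] array."""
    seq_lens = np.array([len(s) for s in token_ids_per_sequence], dtype=np.int64)
    max_len = int(seq_lens.max())
    batch = np.full((len(token_ids_per_sequence), max_len), PAD_ID, dtype=np.int64)
    for i, seq in enumerate(token_ids_per_sequence):
        batch[i, : len(seq)] = seq
    return batch, seq_lens

# Three prompts of different lengths in one batch.
ragged = [[101, 7592, 2088], [101, 2129], [101, 2339, 2003, 2023, 2061]]
input_ids, seq_lens = pad_ragged_batch(ragged)
print(input_ids)   # shape (3, 5), right-padded with PAD_ID
print(seq_lens)    # [3 2 5] -- per-sequence lengths the attention op can consult
```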