Public / onnxruntime / dabd395fdfd

Frank Dong authored and GitHub committed dabd395fdfd on 02 Nov 2023

llama 70b model fusion and sharding (#18175)

### Description
Support llama-70b model fusion and sharding.



### Motivation and Context
This change enables sharding the llama-70b model and exporting it to ONNX,
since the model is too large for a single GPU.
It also fuses the llama-70b model, whose repeat_kv pattern differs from
that of llama-7b and llama-13b.
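
For context, here is a minimal sketch of what a repeat_kv step computes. It assumes that repeat_kv refers to the helper of that name in the Hugging Face transformers Llama implementation, which this commit message does not state. The sketch is a NumPy rewrite for illustration, not the code from this change. Llama 2 70B uses grouped-query attention, so each key/value head is shared by several query heads and has to be repeated (`n_rep > 1`). In the 7B and 13B models every query head has its own key/value head, so `n_rep == 1` and the step is a no-op, which is one plausible reason the exported graph pattern differs. The tensor sizes below are illustrative only.

```python
import numpy as np


def repeat_kv(hidden_states: np.ndarray, n_rep: int) -> np.ndarray:
    """Repeat each key/value head n_rep times along the head axis.

    Input shape:  (batch, num_kv_heads, seq_len, head_dim)
    Output shape: (batch, num_kv_heads * n_rep, seq_len, head_dim)
    """
    batch, num_kv_heads, seq_len, head_dim = hidden_states.shape
    if n_rep == 1:
        # Multi-head attention case: nothing to repeat.
        return hidden_states
    # Insert a new axis after the head axis and broadcast it to n_rep copies.
    expanded = np.broadcast_to(
        hidden_states[:, :, None, :, :],
        (batch, num_kv_heads, n_rep, seq_len, head_dim),
    )
    # Merge the (num_kv_heads, n_rep) axes so copies of a head are adjacent.
    return expanded.reshape(batch, num_kv_heads * n_rep, seq_len, head_dim)


# Illustrative sizes only: 2 key/value heads, each shared by 4 query heads.
kv = np.arange(1 * 2 * 3 * 4, dtype=np.float32).reshape(1, 2, 3, 4)
out = repeat_kv(kv, n_rep=4)
print(out.shape)
```

The result matches `np.repeat(kv, n_rep, axis=1)`. The unsqueeze, expand, and reshape sequence is the kind of subgraph an exporter emits for this step and a fusion pass would need to recognize.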