Commits


Frank Dong authored and GitHub committed dabd395fdfd
llama 70b model fusion and sharding (#18175)

### Description
Support llama-70b model fusion and sharding.

### Motivation and Context
This change enables sharding the llama-70b model and exporting it to ONNX, since the model is too large for a single GPU. It also adds fusion for llama-70b, whose repeat_kv pattern differs from that of llama-7b and llama-13b.
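
For context on why the repeat_kv pattern differs, here is a minimal NumPy sketch. It assumes that `repeat_kv` refers to the grouped-query attention helper of that name in the Hugging Face Transformers Llama implementation; the commit itself does not say so. Under that assumption, a 70b-class model shares each key/value head across several query heads (`n_rep > 1`), while 7b and 13b use one key/value head per query head (`n_rep == 1`), so the helper is a no-op and the exported graph looks different. The function below is illustrative only and is not the code from this commit.

```python
import numpy as np


def repeat_kv(x: np.ndarray, n_rep: int) -> np.ndarray:
    """Repeat each key/value head n_rep times along the head axis.

    x has shape (batch, n_kv_heads, seq_len, head_dim); the result has
    shape (batch, n_kv_heads * n_rep, seq_len, head_dim).
    Illustrative sketch, assumed to mirror the grouped-query attention
    helper of the same name; not taken from the commit.
    """
    batch, n_kv_heads, seq_len, head_dim = x.shape
    if n_rep == 1:
        # Multi-head attention case: nothing to repeat.
        return x
    # Insert a new axis after the head axis, broadcast it to n_rep copies,
    # then fold it back into the head axis.
    expanded = np.broadcast_to(
        x[:, :, None, :, :], (batch, n_kv_heads, n_rep, seq_len, head_dim)
    )
    return expanded.reshape(batch, n_kv_heads * n_rep, seq_len, head_dim)


# Toy sizes: 2 key/value heads, each shared by 4 query heads.
kv = np.arange(2 * 2 * 3 * 4, dtype=np.float32).reshape(2, 2, 3, 4)
out = repeat_kv(kv, n_rep=4)
print(out.shape)
```

With `n_rep > 1` the expand-and-reshape sequence shows up as extra nodes in the exported graph, which is presumably why a fusion written for the 7b and 13b models would not match the 70b model as-is.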