onnxruntime / b7ae293be05

JiCheng authored and GitHub committed b7ae293be05 on 22 Oct 2023

Support large model export using multi-gpu (#17990)

### Description

This PR implements an exporter for large language models (LLMs),
such as Llama2-70b or gpt-175.

The main idea is to use multiple GPUs and dispatch different layers
to different GPUs; in short, it simply implements automatic pipeline
parallelism.
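The layer-dispatch idea can be sketched in plain Python. This is a minimal illustration, not the exporter's actual code: the helpers `assign_layers_to_gpus` and `device_boundaries` are hypothetical, and the sketch assumes a simple policy of splitting layers into contiguous chunks of near-equal size by count, which may differ from how the exporter actually balances layers across devices.

```python
from typing import Dict, List


def assign_layers_to_gpus(num_layers: int, num_gpus: int) -> Dict[int, str]:
    """Map each decoder layer index to a CUDA device string.

    Hypothetical helper: layers are split into contiguous chunks of
    near-equal size, so consecutive layers stay on the same GPU and the
    hidden state crosses a device boundary at most num_gpus - 1 times.
    """
    if num_layers <= 0 or num_gpus <= 0:
        raise ValueError("num_layers and num_gpus must be positive")
    base, extra = divmod(num_layers, num_gpus)
    mapping: Dict[int, str] = {}
    layer = 0
    for gpu in range(num_gpus):
        # The first `extra` GPUs take one additional layer each.
        count = base + (1 if gpu < extra else 0)
        for _ in range(count):
            mapping[layer] = f"cuda:{gpu}"
            layer += 1
    return mapping


def device_boundaries(mapping: Dict[int, str]) -> List[int]:
    """Return layer indices whose input must be moved to a different GPU."""
    return [i for i in range(1, len(mapping)) if mapping[i] != mapping[i - 1]]


if __name__ == "__main__":
    # Assumed layer count: 80 decoder layers (as in Llama2-70b), spread over 8 GPUs.
    plan = assign_layers_to_gpus(80, 8)
    print(plan[0], plan[9], plan[10], plan[79])
    print(device_boundaries(plan))
```

Keeping consecutive layers on the same device is what makes this a pipeline: during a forward pass the activations only need to be copied to the next GPU at each chunk boundary, rather than after every layer.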

For example, exporting Llama2-70b requires 8x V100-32GB or 4x A100-80GB
GPUs, or more total GPU memory.
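As a rough sanity check on those configurations, the sketch below compares the aggregate memory of each setup with the size of the model weights alone. It assumes fp16 weights (2 bytes per parameter) and roughly 70B parameters; the PR does not state the export precision, and activations, KV cache, and exporter overhead are ignored, so the real requirement is higher. The helper names are hypothetical.

```python
def total_gpu_memory_gb(num_gpus: int, per_gpu_gb: int) -> int:
    """Aggregate memory across identical GPUs, in GB."""
    return num_gpus * per_gpu_gb


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Size of the model weights alone, in decimal GB (assumes fp16 by default)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    weights = weight_memory_gb(70e9)  # ~140 GB of fp16 weights for a 70B-parameter model
    for name, gpus, per_gpu in [("V100-32GB", 8, 32), ("A100-80GB", 4, 80)]:
        total = total_gpu_memory_gb(gpus, per_gpu)
        share = weights / gpus  # weight share per GPU if layers are split evenly
        print(f"{gpus}x {name}: {total} GB total, ~{share:.1f} GB of weights per GPU")
```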

It is expected to work for decoder-only models. Encoder-decoder
models have not been tested yet.

---------

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>