Patrice Vignola authored and GitHub committed 0ff915eba83 on 17 May 2023

[DML EP] Add frequent upload heap flushing (#15960)

This reduces peak nonlocal memory consumption when uploading large
weights for big models (e.g. LLMs), while at the same time trying to
keep the GPU as busy as possible. The approach could be more
sophisticated, but at this stage it is the smallest, least risky
change required to support LLMs.
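
The effect described above can be illustrated with a small simulation.
This is only a sketch under stated assumptions, not the code from this
commit: `UploadSimulator`, `SimulateUploads`, and the byte threshold are
hypothetical names and values, and no Direct3D 12 calls are made. It
models an upload heap whose staged bytes stay resident until a flush,
and shows that flushing once a threshold is crossed lowers the peak
amount of staged memory compared with a single flush at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an upload heap: bytes staged for copy to the
// GPU remain "pending" (resident in staging memory) until a flush.
struct UploadSimulator {
    std::size_t flush_threshold_bytes;   // assumed tuning knob, not from the commit
    std::size_t pending_bytes = 0;       // bytes staged since the last flush
    std::size_t peak_pending_bytes = 0;  // highest staging usage observed
    std::size_t flush_count = 0;         // number of non-empty flushes

    explicit UploadSimulator(std::size_t threshold)
        : flush_threshold_bytes(threshold) {}

    // Stage one weight tensor and flush once the threshold is reached.
    void Upload(std::size_t bytes) {
        pending_bytes += bytes;
        peak_pending_bytes = std::max(peak_pending_bytes, pending_bytes);
        if (pending_bytes >= flush_threshold_bytes) {
            Flush();
        }
    }

    // Models submitting the queued copies so the staging memory can be reused.
    void Flush() {
        if (pending_bytes == 0) {
            return;  // nothing staged, so no submission is needed
        }
        pending_bytes = 0;
        ++flush_count;
    }
};

// Uploads every tensor in order, then performs a final flush.
inline UploadSimulator SimulateUploads(const std::vector<std::size_t>& sizes,
                                       std::size_t threshold) {
    UploadSimulator sim(threshold);
    for (std::size_t size : sizes) {
        sim.Upload(size);
    }
    sim.Flush();
    return sim;
}
```

With four 100-byte uploads, a 150-byte threshold gives a peak of 200
staged bytes across two flushes, while a threshold that is never reached
gives a peak of 400 bytes and one final flush. The trade-off is more
submissions in exchange for a lower peak, which is why a real threshold
would need to be large enough to keep the GPU busy.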