Commits


zhijiang authored and GitHub committed 8fadc6c913b
Zhijxu/cleanup cached tensors when OOM (#19306)

In PyTorch, when an OOM happens during backprop, the user can decrease the batch size and rerun the step without restarting the process. In ORT, however, the intermediate tensors are kept even after an OOM, so rerunning with a smaller batch size still fails.

In the PyTorch run, we can see that after an OOM failure, PyTorch releases its tensors before the next step. In the ORT run, ORT does not release its tensors after the OOM failure. With this PR, ORT's memory is released; **the 4GB of memory is not owned by ORT and will be released by PyTorch at the end**.
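The recovery pattern this PR enables can be sketched as a retry loop that halves the batch size after each OOM. This is a minimal toy sketch, not ORT code: `FakeAllocator` and `train_with_retry` are hypothetical names standing in for a real training step, and `MemoryError` stands in for a CUDA OOM.

```python
class FakeAllocator:
    """Toy stand-in for a device allocator: a step fits only below a capacity."""

    def __init__(self, capacity):
        self.capacity = capacity

    def run_step(self, batch_size):
        # Simulate the backprop OOM the PR description talks about.
        if batch_size > self.capacity:
            raise MemoryError(f"OOM: batch {batch_size} > capacity {self.capacity}")

def train_with_retry(alloc, batch_size):
    """Halve the batch size after each OOM until a step succeeds."""
    while batch_size > 0:
        try:
            alloc.run_step(batch_size)
            return batch_size
        except MemoryError:
            # With this PR, ORT releases its cached intermediate tensors at
            # this point (as PyTorch already does), so the smaller retry
            # batch can fit without restarting the process.
            batch_size //= 2
    raise RuntimeError("could not fit even batch size 1")

print(train_with_retry(FakeAllocator(capacity=40), batch_size=128))  # → 32
```

Before this PR, the `except` branch would not help in ORT: the failed step's intermediates stayed cached, so even the halved batch would hit OOM again.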