Tianlei Wu authored and GitHub committed 9be133231fe
Fix cuda graph capture (#15005)

Fix two issues related to CUDA graph capture: https://github.com/microsoft/onnxruntime/issues/14942 and https://github.com/microsoft/onnxruntime/issues/15002

Issue 1: Previously, graph capture started at the second run. However, memory pattern optimization allocates memory during the second run, and cudaMalloc is not allowed during graph capture. In this PR, graph capture starts only after 2 regular runs, which avoids the issue.

Issue 2: https://github.com/microsoft/onnxruntime/pull/13495 introduced multiple stream support, but stream cleanup calls cudaStreamSynchronize, which is not allowed during CUDA graph capture. In this PR, stream cleanup is moved to after CUDA graph capture ends.

Update the SqueezeNet test model with a dynamic axis so that it can be tested with a larger batch size. Add a test that reproduces the bug when the minimum number of runs before capture is changed from 2 back to 1.
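The two ordering constraints above can be illustrated with a small Python simulation. This is a toy model under stated assumptions, not ONNX Runtime or CUDA code: `FakeStream`, `FakeAllocator`, `CaptureError`, `run_session` and the `min_runs_before_capture` / `cleanup_after_capture` parameters are all hypothetical, and the allocator assumes memory pattern planning allocates only on runs 1 and 2, as described in Issue 1.

```python
"""Hypothetical toy model of the two CUDA graph capture constraints.

Nothing here is ONNX Runtime or CUDA API; all names are illustrative.
"""


class CaptureError(RuntimeError):
    """Raised for operations that CUDA forbids while a stream is capturing."""


class FakeStream:
    def __init__(self):
        self.capturing = False

    def synchronize(self):
        # Models cudaStreamSynchronize, which is not allowed during capture.
        if self.capturing:
            raise CaptureError("stream synchronize during capture")


class FakeAllocator:
    """Assumption: memory pattern planning allocates on runs 1 and 2 only."""

    def __init__(self, stream):
        self.stream = stream
        self.runs_seen = 0

    def allocate_for_run(self):
        self.runs_seen += 1
        # Run 1: per-tensor allocations while the pattern is recorded.
        # Run 2: one block allocated from the recorded pattern.
        if self.runs_seen <= 2 and self.stream.capturing:
            raise CaptureError("cudaMalloc during capture")


def run_session(num_runs, min_runs_before_capture, cleanup_after_capture=True):
    """Return one label per run: 'regular', 'capture' or 'replay'."""
    stream = FakeStream()
    allocator = FakeAllocator(stream)
    graph_captured = False
    labels = []
    for run in range(1, num_runs + 1):
        if graph_captured:
            # Later runs launch the captured graph; no new allocations.
            labels.append("replay")
            continue
        capture = run == min_runs_before_capture + 1
        stream.capturing = capture  # begin capture on the chosen run
        allocator.allocate_for_run()
        if not cleanup_after_capture:
            stream.synchronize()  # old order: cleanup while still capturing
        stream.capturing = False  # end capture
        if cleanup_after_capture:
            stream.synchronize()  # fixed order: cleanup after capture ends
        if capture:
            graph_captured = True
        labels.append("capture" if capture else "regular")
    return labels
```

With `min_runs_before_capture=2`, `run_session(5, 2)` captures on run 3 and replays afterwards. Passing 1 (the old behavior) or `cleanup_after_capture=False` raises `CaptureError`, mirroring the two bugs respectively.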