Commits


Tianlei Wu authored and GitHub committed a2ffc3740b4
[Cuda] Demo multiple cuda graphs and user compute stream (#19883)

Update stable diffusion demo to add options `--max-cuda-graphs` and `--user-compute-stream`.

* Add Python class GpuBindingManager to manage IO Binding based on input shape and the max number of CUDA graphs setting. The benefit is that one inference session can enable or disable CUDA graph in different runs.
* When `--user-compute-stream` is set, the demo uses a custom compute stream.
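
The shape-keyed binding idea described for GpuBindingManager can be illustrated with a minimal sketch. This is not the class from the commit: `ShapeBindingCache`, `Binding`, and `get_binding` are hypothetical names, no ONNX Runtime calls are made, and the policy shown (the first `max_cuda_graphs` distinct input shapes get a CUDA graph, later shapes fall back to regular execution) is an assumption about how such a limit might be applied.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Shape = Tuple[int, ...]


@dataclass
class Binding:
    """Hypothetical stand-in for an IO binding tied to one input shape."""
    shape: Shape
    use_cuda_graph: bool


class ShapeBindingCache:
    """Illustrative only; not the GpuBindingManager from the commit."""

    def __init__(self, max_cuda_graphs: int) -> None:
        self.max_cuda_graphs = max_cuda_graphs
        self._bindings: Dict[Shape, Binding] = {}
        self._graph_count = 0

    def get_binding(self, shape: Shape) -> Binding:
        # Reuse the binding already created for this input shape, if any.
        binding = self._bindings.get(shape)
        if binding is not None:
            return binding
        # Assumed policy: allow a CUDA graph only while under the limit.
        use_graph = self._graph_count < self.max_cuda_graphs
        if use_graph:
            self._graph_count += 1
        binding = Binding(shape=shape, use_cuda_graph=use_graph)
        self._bindings[shape] = binding
        return binding


cache = ShapeBindingCache(max_cuda_graphs=1)
first = cache.get_binding((1, 4, 64, 64))   # under the limit
second = cache.get_binding((2, 4, 64, 64))  # limit reached, no graph
```

Under this assumed policy, a single cache instance serves some shapes with a CUDA graph and others without, which mirrors the stated benefit that one inference session can enable or disable CUDA graph in different runs.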