Commits


cao lei authored and GitHub committed c2dad6893b9
use cudaStreamNonBlocking flag (#15258)

### Description
This PR uses the cudaStreamNonBlocking flag when creating CUDA streams. A stream created this way runs concurrently with the default stream and does not implicitly synchronize with it.

### Motivation and Context
This PR is needed to address a performance concern.