

cao lei authored and GitHub committed a012d60777a
Make MemcpyToHost to a separate stream for performance gain (#14487)

### Description
Place MemcpyToHost on its own stream in the default DeviceBasedPartitioner to improve performance.

### Motivation and Context
Our experiments show that putting MemcpyToHost on a separate stream lets it run in parallel with other kernels, especially compute-intensive ones.

---------

Co-authored-by: Lei Cao <leca@microsoft.com>
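The idea behind the change can be shown with a toy model. This is a minimal sketch that assumes a simple rule: one stream per device, with every node on a stream executing in order. It is not the actual DeviceBasedPartitioner. The `Node` type, the `partition_by_device` function, its `separate_ops` parameter, and the node names are all hypothetical. In the baseline, a device-to-host copy occupies a slot in the device's single stream, so any kernel queued after it has to wait for the copy to finish. When MemcpyToHost gets a dedicated stream, later kernels on the compute stream can overlap with the copy.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    op_type: str
    device: str  # e.g. "CUDA:0" or "CPU" (hypothetical labels)


def partition_by_device(nodes, separate_ops=("MemcpyToHost",)):
    """Hypothetical toy partitioner: one stream per device, except that
    op types listed in separate_ops get a dedicated stream per device.
    Cross-stream synchronization, which a real runtime must insert
    between producers and consumers, is deliberately ignored here."""
    streams = defaultdict(list)
    for node in nodes:
        if node.op_type in separate_ops:
            key = f"{node.device}/{node.op_type}"  # dedicated copy stream
        else:
            key = node.device  # shared per-device compute stream
        streams[key].append(node.name)
    return dict(streams)


graph = [
    Node("conv1", "Conv", "CUDA:0"),
    Node("copy_out", "MemcpyToHost", "CUDA:0"),
    Node("conv2", "Conv", "CUDA:0"),
    Node("post", "TopK", "CPU"),
]

# Baseline: copy_out sits between conv1 and conv2 on the same stream.
print(partition_by_device(graph, separate_ops=()))
# {'CUDA:0': ['conv1', 'copy_out', 'conv2'], 'CPU': ['post']}

# With a separate copy stream, conv2 no longer queues behind copy_out.
print(partition_by_device(graph))
# {'CUDA:0': ['conv1', 'conv2'], 'CUDA:0/MemcpyToHost': ['copy_out'], 'CPU': ['post']}
```

In a real GPU runtime, the copy can only overlap with compute when it is issued asynchronously on a different stream (typically into pinned host memory). The copy stream must also still wait on the kernel that produces the data being copied.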