Commits


Tang, Cheng authored and GitHub committed 8f34c8c8eda
Introduce collective ops to ort inference build (#14399)

### Description
Introduce collective ops into the onnxruntime inference build, including:
1) AllReduce and AllGather schemas as contrib ops, controlled by the USE_MPI flag
2) AllReduce and AllGather kernels in the CUDA EP, controlled by the ORT_USE_NCCL flag

### Motivation and Context
Enable collective ops in the onnxruntime inference build so that distributed inference can run across multiple GPUs. The original ncclAllReduce ops in the training build require fairly complex configuration, which is not suitable for the inference case, and they are already broken, so we introduce a new implementation.

---------

Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
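Below is a minimal sketch of how these ops might be exercised from Python once a build with both flags enabled is available. The op domain (`com.microsoft`), the absence of extra node attributes, the session-option names, and the use of one MPI process per GPU are assumptions made for illustration; only the op names (AllReduce, AllGather) and the USE_MPI / ORT_USE_NCCL flags come from the description above.

```python
# Illustrative sketch only: assumes an onnxruntime build with USE_MPI and
# ORT_USE_NCCL enabled, and assumes the new contrib op is exposed as
# "AllReduce" in the "com.microsoft" domain (names inferred from this PR).
import numpy as np
import onnx
from onnx import TensorProto, helper
import onnxruntime as ort
from mpi4py import MPI  # one process per GPU, launched e.g. via mpirun

rank = MPI.COMM_WORLD.Get_rank()

# A one-node graph: y = AllReduce(x), reduced across all ranks.
node = helper.make_node("AllReduce", ["x"], ["y"], domain="com.microsoft")
graph = helper.make_graph(
    [node],
    "allreduce_demo",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [4])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, [4])],
)
model = helper.make_model(
    graph,
    opset_imports=[
        helper.make_opsetid("", 17),
        helper.make_opsetid("com.microsoft", 1),
    ],
)

# Pin each rank to its own GPU so the NCCL communicator can be formed.
sess = ort.InferenceSession(
    model.SerializeToString(),
    providers=[("CUDAExecutionProvider", {"device_id": rank})],
)

x = np.full((4,), float(rank), dtype=np.float32)
y = sess.run(["y"], {"x": x})[0]
print(f"rank {rank}: {y}")  # expected: the reduction of all ranks' inputs
```

Such a script would be launched with an MPI launcher, e.g. `mpirun -np <num_gpus> python allreduce_demo.py`, so that each rank owns one GPU and the collective runs across all of them.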