Public / onnxruntime / c012e41f938

Ye Wang authored and GitHub committed c012e41f938 on 06 Dec 2023

MoE with Expert Slicing (#18565)

### Description
<!-- Describe your changes. -->

Registers the Sharded MoE op under contrib_op/cuda/collective with expert
slicing. The broadcast happens just before the second bias (if present) is
added and the permutation is undone. Tensor slicing is planned but not
included in this PR.
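
The ordering described above can be illustrated with a small single-process simulation. This is only a sketch, not the CUDA kernel from this PR: the function names (`expert_ffn`, `moe_reference`, `moe_expert_sliced`), the scalar toy experts, top-1 routing, and the equal contiguous split of experts across ranks are all assumptions made for illustration, and the cross-rank broadcast is modeled by writing into a shared list.

```python
def expert_ffn(x, w):
    # Toy stand-in for one expert's FC layers (hypothetical): ReLU, then scale.
    return max(x, 0.0) * w


def moe_reference(tokens, expert_ids, weights, bias2):
    # Unsharded baseline: every expert is available locally.
    return [expert_ffn(x, weights[e]) + bias2[e]
            for x, e in zip(tokens, expert_ids)]


def moe_expert_sliced(tokens, expert_ids, weights, bias2, world_size):
    num_experts = len(weights)
    # Assumption: experts divide evenly into contiguous slices, one per rank.
    assert num_experts % world_size == 0
    local = num_experts // world_size

    # Permute: group token rows by the expert they were routed to.
    perm = sorted(range(len(tokens)), key=lambda i: expert_ids[i])
    rows = [tokens[i] for i in perm]
    row_experts = [expert_ids[i] for i in perm]

    # Each simulated rank fills in only the rows owned by its expert slice.
    # Writing into `gathered` models the broadcast that gives every rank
    # the complete set of expert outputs.
    gathered = [None] * len(rows)
    for rank in range(world_size):
        lo, hi = rank * local, (rank + 1) * local
        for r, (x, e) in enumerate(zip(rows, row_experts)):
            if lo <= e < hi:
                gathered[r] = expert_ffn(x, weights[e])

    # After the broadcast: add the second bias, then undo the permutation.
    biased = [y + bias2[e] for y, e in zip(gathered, row_experts)]
    out = [0.0] * len(tokens)
    for r, i in enumerate(perm):
        out[i] = biased[r]
    return out
```

With these toy definitions, the sliced path should match the unsharded baseline for any `world_size` that divides the number of experts, which is a convenient sanity check for the ordering of broadcast, bias, and un-permutation.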

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->