Public / onnxruntime / 225439193e4

Suffian Khan authored and GitHub committed 225439193e4 on 02 Sep 2021

Optimize Concat and Split on CUDA to eliminate host-to-device copies when sizes are all the same (#8833)

* special case concat and split when sizes are equal

* add tests for 16 and 32 inputs with same dim

* add tests for 16/64 inputs on concat or 16/64 outputs on split

* try to eliminate Windows warning

* outter => outer
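
A rough illustration of why equal sizes help, written as a host-side C++ sketch rather than the actual CUDA kernels from this commit. The function names `concat_general` and `concat_same_size` are hypothetical, and the example is restricted to 1-D inputs. In the general case each output element needs a table of per-input offsets to find its source, and on a GPU that table would have to reach the device somehow. When every input has the same length `n`, the source can be computed with a division and a remainder, so no table is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// General path (hypothetical sketch): input sizes may differ, so a table of
// cumulative offsets is built and consulted for every output element.
std::vector<float> concat_general(const std::vector<std::vector<float>>& inputs) {
  std::vector<std::size_t> offsets{0};
  for (const auto& in : inputs) offsets.push_back(offsets.back() + in.size());

  std::vector<float> out(offsets.back());
  for (std::size_t i = 0; i < out.size(); ++i) {
    std::size_t k = 0;
    while (offsets[k + 1] <= i) ++k;      // find the input that owns element i
    out[i] = inputs[k][i - offsets[k]];   // position inside that input
  }
  return out;
}

// Same-size path (hypothetical sketch): every input is assumed to hold exactly
// n elements, so the owning input and the position inside it follow from
// arithmetic on the output index alone.
std::vector<float> concat_same_size(const std::vector<std::vector<float>>& inputs,
                                    std::size_t n) {
  std::vector<float> out(inputs.size() * n);
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = inputs[i / n][i % n];
  }
  return out;
}
```

Both functions return the same result whenever all inputs share one length. Split with equal output sizes is the mirror image: output `i / n` receives element `i % n` from input position `i`.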