Commits


Suffian Khan authored and GitHub committed 225439193e4
Optimize Concat and Split on CUDA to eliminate host-to-device copies when sizes are all the same (#8833)

* special case concat and split when sizes are equal
* add tests for 16 and 32 inputs with same dim
* add tests for 16/64 inputs on concat or 16/64 outputs on split
* try eliminate windows warning
* outter => outer
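
The commit message does not say how the equal-size special case works, so the following is only a rough sketch of one plausible reading, written in plain Python rather than CUDA. It assumes that the general path needs a per-input offset table (which a GPU kernel would have to receive through a host-to-device copy), while the equal-size path can locate the source input with one division by a scalar. The names `locate_general`, `locate_equal`, and `concat_equal` are hypothetical and do not come from the commit.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_general(pos, sizes):
    """General case: map an output position to (input index, offset).

    Needs a prefix-sum table with one entry per input. On a GPU this
    table would have to live in device memory (assumption for this sketch).
    """
    ends = list(accumulate(sizes))      # cumulative end position of each input
    idx = bisect_right(ends, pos)       # first input whose end is past pos
    start = ends[idx - 1] if idx > 0 else 0
    return idx, pos - start


def locate_equal(pos, size):
    """Equal-size case: one scalar is enough, no table is required."""
    return divmod(pos, size)            # (input index, offset within input)


def concat_equal(inputs):
    """Concatenate equally sized 1-D lists using only the scalar size."""
    size = len(inputs[0])
    assert all(len(x) == size for x in inputs), "inputs must share one size"
    out = []
    for pos in range(size * len(inputs)):
        idx, off = locate_equal(pos, size)
        out.append(inputs[idx][off])
    return out
```

Split with equal output sizes would be the mirror image under the same assumption: each input position maps to an output index and offset through the same division.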