Commits


Zhang Lei authored and GitHub committed 910fc09de2a
Use the standard layernorm CUDA kernel for SkipLayerNorm. (#15076)

* The current SkipLayerNorm did not use a numerically stable algorithm, which caused correctness issues.
* Extend the existing layernorm kernel to accept bias and residual inputs.
* Tune the standard layernorm kernel's threads.y according to the number of elements and the device properties.
* Remove the existing SkipLayerNorm CUDA implementation.
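
For readers unfamiliar with the operation, here is a minimal CPU sketch in Python of what a skip layer normalization with bias and residual computes. It is not the CUDA kernel from this commit. The function names (`welford_mean_var`, `skip_layer_norm`), the parameter names, and the `epsilon` default are hypothetical, and Welford's single-pass update is used only as one common example of a numerically stable variance computation; the commit does not say which algorithm the kernel uses.

```python
import math


def welford_mean_var(values):
    """Return (mean, population variance) using Welford's single-pass update.

    Assumption: `values` is non-empty. This stands in for "a stable
    algorithm"; the actual kernel may use a different scheme.
    """
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for count, x in enumerate(values, start=1):
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return mean, m2 / len(values)


def skip_layer_norm(x, skip, gamma, beta, bias=None, epsilon=1e-5):
    """Hypothetical reference: normalize (x + skip + bias) over one row.

    x, skip   : input row and residual (skip) row, same length
    gamma/beta: per-element scale and shift applied after normalization
    bias      : optional per-element bias added before normalization
    """
    if bias is None:
        bias = [0.0] * len(x)
    # Fuse the residual and bias additions ahead of the normalization.
    summed = [xi + si + bi for xi, si, bi in zip(x, skip, bias)]
    mean, var = welford_mean_var(summed)
    inv_std = 1.0 / math.sqrt(var + epsilon)
    return [(s - mean) * inv_std * g + b for s, g, b in zip(summed, gamma, beta)]


if __name__ == "__main__":
    row = [1.0, 2.0, 3.0, 4.0]
    residual = [0.5, 0.5, 0.5, 0.5]
    ones = [1.0] * 4
    zeros = [0.0] * 4
    print(skip_layer_norm(row, residual, ones, zeros))
```

The single-pass update avoids computing the variance as the difference of two large, nearly equal sums, which is where a naive formulation tends to lose precision when the inputs share a large offset.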