Commits
Authored by Ye Wang, committed via GitHub — 4e670f7ab1b
Support larger hidden size in Attention Cuda kernel (#7002)

* Support larger hidden size in Attention Cuda kernel
* Update attention_transpose.cu
* Address review comments
* Fix typo and add check in quantization
* Update readme