Commits


Chen Fu authored and GitHub committed 90142899bdd
Supporting Intel AMX instructions in quantized GEMM (#14042)

### Description

Using Intel AMX int8 instructions to accelerate quantized GEMM.

### Motivation and Context

AMX instructions accelerate quantized GEMM significantly.

Prepacked B perf numbers (latency in ns):

| GEMM Config | AVX512Vnni | AMX |
| -- | --: | --: |
| M:384/N:1024/K:1024/Batch:1/Threads:4 | 1057511 | 285393 |
| M:384/N:1024/K:3072/Batch:1/Threads:4 | 2643929 | 700397 |
| M:384/N:1024/K:4096/Batch:1/Threads:4 | 3784750 | 890701 |
| M:384/N:4096/K:1024/Batch:1/Threads:4 | 2378139 | 887251 |
| M:384/N:1024/K:1024/Batch:1/Threads:16 | 307137 | 138481 |
| M:384/N:1024/K:3072/Batch:1/Threads:16 | 855730 | 295027 |
| M:384/N:1024/K:4096/Batch:1/Threads:16 | 1126878 | 317395 |
| M:384/N:4096/K:1024/Batch:1/Threads:16 | 781963 | 237014 |
| M:1536/N:1024/K:1024/Batch:1/Threads:16 | 538864 | 181459 |
| M:1536/N:1024/K:3072/Batch:1/Threads:16 | 1681002 | 561600 |
| M:1536/N:1024/K:4096/Batch:1/Threads:16 | 2158127 | 717470 |
| M:1536/N:4096/K:1024/Batch:1/Threads:16 | 2428622 | 896140 |
| M:3072/N:1024/K:1024/Batch:1/Threads:16 | 1058029 | 357031 |
| M:3072/N:1024/K:3072/Batch:1/Threads:16 | 3138504 | 1095857 |
| M:3072/N:1024/K:4096/Batch:1/Threads:16 | 4155640 | 1386183 |
| M:3072/N:4096/K:1024/Batch:1/Threads:16 | 4679030 | 1778624 |

Co-authored-by: Yi-Hong Lyu <yilyu@microsoft.com>
Co-authored-by: Chen Fu <fuchen@microsoft.com>
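
For readers unfamiliar with the operation being benchmarked, the following is a minimal scalar sketch of what a quantized GEMM with shape M/N/K computes. It is not the kernel from this commit: the function name `QGemmReference`, the uint8 A / int8 B signedness, the row-major layout, and the zero-point parameters are all assumptions made for illustration. An AMX or AVX512-VNNI kernel would produce the same int32 results, but would process many 8-bit products per instruction instead of one at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference implementation (not part of the commit).
// Computes C = (A - zero_a) * (B - zero_b) with 32-bit accumulation.
//   A: M x K, uint8, row-major (assumed layout and signedness)
//   B: K x N, int8,  row-major (assumed layout and signedness)
//   C: M x N, int32, row-major
std::vector<int32_t> QGemmReference(const std::vector<uint8_t>& a,
                                    const std::vector<int8_t>& b,
                                    std::size_t m, std::size_t n, std::size_t k,
                                    int32_t zero_a = 0, int32_t zero_b = 0) {
    // Guard against mismatched buffer sizes.
    assert(a.size() == m * k);
    assert(b.size() == k * n);

    std::vector<int32_t> c(m * n, 0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            int32_t acc = 0;
            for (std::size_t p = 0; p < k; ++p) {
                // Widen each 8-bit operand to 32 bits before multiplying.
                const int32_t av = static_cast<int32_t>(a[i * k + p]) - zero_a;
                const int32_t bv = static_cast<int32_t>(b[p * n + j]) - zero_b;
                acc += av * bv;
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}
```

A scalar loop like this is mainly useful as a correctness oracle: an optimized kernel's output for a given M/N/K can be compared element by element against it.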