

Jing Fang authored and GitHub committed 02dc270242c
[ARM CPU] add notrans hgemm mlas kernel (#23668)

### Description

Add a notrans HGEMM MLAS kernel for ARM CPUs, optimized for large K and small N.

| Test | M    | N     | K     | HGEMM time (ns) | SGEMM time (ns) | HGEMM time saved vs. SGEMM (%) |
|------|------|-------|-------|-----------------|-----------------|--------------------------------|
| LLM  | 1    | 4096  | 4096  | 446793          | 1579150         | 71.71                          |
| LLM  | 1024 | 4096  | 4096  | 100206500       | 115864382       | 13.51                          |
| LLM  | 2048 | 4096  | 4096  | 201124807       | 257143151       | 21.78                          |
| LLM  | 1    | 11008 | 4096  | 1270891         | 4310119         | 70.51                          |
| LLM  | 1024 | 11008 | 4096  | 267071834       | 320892617       | 16.77                          |
| LLM  | 2048 | 11008 | 4096  | 537345913       | 755739716       | 28.90                          |
| LLM  | 1    | 4096  | 11008 | 1452455         | 3632642         | 60.02                          |
| LLM  | 1024 | 4096  | 11008 | 281601378       | 326769587       | 13.82                          |
| LLM  | 2048 | 4096  | 11008 | 562710674       | 704394097       | 20.11                          |
| LLM  | 1    | 11008 | 11008 | 3695318         | 9442217         | 60.86                          |
| LLM  | 1024 | 11008 | 11008 | 756445906       | 872947830       | 13.35                          |
| LLM  | 2048 | 11008 | 11008 | 1521540547      | 1871241874      | 18.69                          |

### Motivation and Context

Used in the GQA (grouped-query attention) value calculation.
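The percentage column in the benchmark table is not a speed-up ratio. Its values match the share of SGEMM time saved by HGEMM, (SGEMM - HGEMM) / SGEMM * 100; this formula is inferred from the numbers, not stated in the commit. The minimal Python check below (the helper names `time_saved_pct` and `speedup_factor` are hypothetical) reproduces every reported value and also computes the plain speed-up factor, SGEMM / HGEMM.

```python
# Benchmark rows from the commit's table:
# (M, N, K, hgemm_ns, sgemm_ns, reported_pct)
ROWS = [
    (1, 4096, 4096, 446793, 1579150, 71.71),
    (1024, 4096, 4096, 100206500, 115864382, 13.51),
    (2048, 4096, 4096, 201124807, 257143151, 21.78),
    (1, 11008, 4096, 1270891, 4310119, 70.51),
    (1024, 11008, 4096, 267071834, 320892617, 16.77),
    (2048, 11008, 4096, 537345913, 755739716, 28.90),
    (1, 4096, 11008, 1452455, 3632642, 60.02),
    (1024, 4096, 11008, 281601378, 326769587, 13.82),
    (2048, 4096, 11008, 562710674, 704394097, 20.11),
    (1, 11008, 11008, 3695318, 9442217, 60.86),
    (1024, 11008, 11008, 756445906, 872947830, 13.35),
    (2048, 11008, 11008, 1521540547, 1871241874, 18.69),
]


def time_saved_pct(hgemm_ns: int, sgemm_ns: int) -> float:
    """Percentage of the SGEMM run time saved by HGEMM (assumed formula)."""
    return (sgemm_ns - hgemm_ns) / sgemm_ns * 100.0


def speedup_factor(hgemm_ns: int, sgemm_ns: int) -> float:
    """How many times faster HGEMM is than SGEMM (SGEMM time / HGEMM time)."""
    return sgemm_ns / hgemm_ns


if __name__ == "__main__":
    for m, n, k, h, s, reported in ROWS:
        pct = time_saved_pct(h, s)
        print(
            f"M={m:5d} N={n:6d} K={k:6d}  saved={pct:6.2f}% "
            f"(reported {reported:.2f}%)  speed-up={speedup_factor(h, s):.2f}x"
        )
```

For example, the M=1, N=4096, K=4096 row saves 71.71% of the SGEMM time, which corresponds to HGEMM running about 3.5x faster. The single-row (M=1) cases show the largest gains, consistent with the kernel being tuned for small N and large K.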