Commits


Jiajia Qin authored and GitHub committed 7c782f67417
[webgpu] Always use tile matmulnbits for block_size = 32 (#23140)

### Description
After the prefill-time optimization in #23102, always using the tile matmulnbits with block_size = 32 appears to give better performance even on discrete GPUs for the Phi3 model. Phi3 improves from 32.82 tokens/sec to 42.64 tokens/sec in easy mode on my NV RTX 2000 GPU.
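To put the two throughput figures in perspective, the relative gain can be worked out directly. The snippet below is a minimal sketch of that arithmetic only; the helper name `speedup` and the constant names are illustrative assumptions, not part of the commit or of ONNX Runtime.

```python
def speedup(before_tps: float, after_tps: float) -> float:
    """Return the throughput ratio after/before (hypothetical helper)."""
    if before_tps <= 0:
        raise ValueError("before_tps must be positive")
    return after_tps / before_tps


# Figures quoted in the commit description (Phi3, NV RTX 2000 GPU).
BEFORE_TPS = 32.82
AFTER_TPS = 42.64

ratio = speedup(BEFORE_TPS, AFTER_TPS)
percent_gain = (ratio - 1.0) * 100.0

print(f"speedup: {ratio:.2f}x, gain: {percent_gain:.1f}%")
# prints "speedup: 1.30x, gain: 29.9%"
```

That is roughly a 30% increase in decode throughput for this one model and GPU; the commit reports no other configurations.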