Public / onnxruntime / 7c782f67417

Jiajia Qin authored and GitHub committed 7c782f67417 on 20 Dec 2024

[webgpu] Always use tile matmulnbits for block_size = 32 (#23140)

### Description
After the prefill-time optimization in #23102, always using the tile
matmulnbits with block_size = 32 appears to give better performance for
the Phi3 model, even on discrete GPUs.
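
The change in kernel selection can be pictured with the minimal sketch below. It is not the onnxruntime code: `DeviceInfo`, `UseTiledKernelBefore`, and `UseTiledKernelAfter` are hypothetical names, and the earlier condition (tile path skipped on discrete GPUs) is an assumption inferred from the wording "even for discrete gpu", not something this description states.

```cpp
#include <cstdint>

// Hypothetical device description used only for this sketch;
// it is not an onnxruntime type.
struct DeviceInfo {
  bool is_discrete_gpu;
};

// ASSUMED earlier heuristic: take the tiled path for block_size == 32
// only when the device is not a discrete GPU.
constexpr bool UseTiledKernelBefore(std::uint32_t block_size, const DeviceInfo& device) {
  return block_size == 32 && !device.is_discrete_gpu;
}

// Heuristic as described by this commit: block_size == 32 always takes
// the tiled path, regardless of the device type.
constexpr bool UseTiledKernelAfter(std::uint32_t block_size, const DeviceInfo& /*device*/) {
  return block_size == 32;
}
```

Under these assumptions, the only case whose outcome changes is block_size = 32 on a discrete GPU; other block sizes are unaffected.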

Phi3 improves from 32.82 tokens/sec to 42.64 tokens/sec (roughly a 30%
speedup) in easy mode on my NV RTX 2000 GPU.