Jiajia Qin authored and GitHub committed 8159723ba7e on 15 Oct 2024

[js/webgpu] Optimize matmulnbits (#22360)

### Description
This PR further optimizes matmulnbits, specifically for integrated GPUs
(iGPUs). On iGPUs, the phi3 demo improves from ~8 tokens/second to
~12 tokens/second.

Some TODOs:
1. Make the optimization more general by removing the blockSize = 32
limitation.
2. Tune parameters such as workgroupSize and the components size
(currently only components = 1 is supported) and measure the effect on
performance.
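
For readers unfamiliar with what blockSize controls, the following is a minimal CPU reference sketch of block-wise 4-bit dequantization followed by a matrix-vector product. It is not the WebGPU shader from this PR. The function names (`dequantizeRow`, `matVecNBits`), the packing order (two weights per byte, low nibble first), and the fixed zero point of 8 are assumptions made for illustration only.

```typescript
// Hypothetical reference sketch, NOT the onnxruntime WebGPU kernel.
// Assumed layout: 4-bit weights packed two per byte (low nibble first),
// one float scale per block of `blockSize` weights, fixed zero point of 8.
const BLOCK_SIZE = 32;

// Dequantize one row of K packed 4-bit weights into floats.
function dequantizeRow(
  packed: Uint8Array,
  scales: Float32Array,
  k: number,
  blockSize: number = BLOCK_SIZE,
): Float32Array {
  const out = new Float32Array(k);
  for (let i = 0; i < k; i++) {
    const byte = packed[i >> 1];
    // Even indices read the low nibble, odd indices the high nibble.
    const nibble = (i & 1) === 0 ? byte & 0x0f : byte >> 4;
    // Every weight in the same block shares one scale.
    const scale = scales[Math.floor(i / blockSize)];
    out[i] = (nibble - 8) * scale;
  }
  return out;
}

// Compute y[n] = sum over k of a[k] * dequantized B[n][k].
function matVecNBits(
  a: Float32Array,
  packedB: Uint8Array[],
  scalesB: Float32Array[],
  k: number,
  blockSize: number = BLOCK_SIZE,
): Float32Array {
  const y = new Float32Array(packedB.length);
  for (let n = 0; n < packedB.length; n++) {
    const w = dequantizeRow(packedB[n], scalesB[n], k, blockSize);
    let acc = 0;
    for (let i = 0; i < k; i++) {
      acc += a[i] * w[i];
    }
    y[n] = acc;
  }
  return y;
}
```

In this sketch `blockSize` is an ordinary parameter, so any block size works. A kernel that hard-codes blockSize = 32 can assume how many packed bytes and scales each invocation reads, which is presumably what makes lifting the limitation in TODO 1 non-trivial.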