Commits


Jiajia Qin authored and GitHub committed 87165b92e94
[js/webgpu] optimize MatmulNBits (#21747)

### Description
This optimization gives a 2x speedup for phi3 on an integrated Intel GPU. It stores input A's data in local variables instead of loading it from global memory each time it is multiplied with B's data.
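
The effect of keeping A's data in local variables can be illustrated with a small CPU-side sketch. This is not the shader code from the commit: `CountedBuffer`, `load`, `matVecNaive`, and `matVecCached` are hypothetical names, B is treated as plain unquantized numbers, and the read counter only stands in for global memory loads.

```typescript
// Hypothetical stand-in for a global-memory buffer: counts every element read.
interface CountedBuffer {
  data: number[];
  reads: number;
}

function load(buf: CountedBuffer, i: number): number {
  buf.reads++;
  return buf.data[i];
}

// Baseline: A is re-read from the "global" buffer for every multiply.
// a is one row of length k; b is a k x n matrix in row-major order.
function matVecNaive(a: CountedBuffer, b: number[], k: number, n: number): number[] {
  const out: number[] = new Array<number>(n).fill(0);
  for (let col = 0; col < n; col++) {
    for (let i = 0; i < k; i++) {
      out[col] += load(a, i) * b[i * n + col];
    }
  }
  return out;
}

// Cached: A is loaded once into a local array, then reused for every column of B.
function matVecCached(a: CountedBuffer, b: number[], k: number, n: number): number[] {
  const aLocal: number[] = [];
  for (let i = 0; i < k; i++) {
    aLocal.push(load(a, i));
  }
  const out: number[] = new Array<number>(n).fill(0);
  for (let col = 0; col < n; col++) {
    for (let i = 0; i < k; i++) {
      out[col] += aLocal[i] * b[i * n + col];
    }
  }
  return out;
}
```

In this sketch the baseline performs `k * n` counted reads of A, while the cached version performs `k`. Both return the same result.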