Commits


Jiajia Qin authored and GitHub committed 80d8931f1d8
[webgpu] Use subgroup for matmulnbits (#23224)

### Description

This PR uses subgroup operations to implement matmulnbits when tile_m > 1 on Intel devices. With this change, prefill for a 500-token prompt on phi3 drops from 8.5 s to 3.5 s on Intel Meteor Lake.
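For context on what the optimized kernel computes, here is a minimal CPU reference sketch of an N-bit quantized matmul, assuming 4-bit weights packed two per byte (low nibble first), one float scale per block of `block_size` weights, and a default zero point of 8. These layout details and the names `unpack_int4` and `matmul_nbits_ref` are illustrative assumptions, not the WebGPU shader from this PR or the exact ONNX Runtime implementation. Prefill is the M > 1 case, where each dequantized weight row is reused across several rows of the activation matrix.

```python
def unpack_int4(packed: bytes) -> list[int]:
    """Unpack each byte into two unsigned 4-bit values, low nibble first (assumed layout)."""
    out = []
    for b in packed:
        out.append(b & 0x0F)
        out.append((b >> 4) & 0x0F)
    return out


def matmul_nbits_ref(a, b_packed, scales, block_size, zero_point=8):
    """Reference (slow) quantized matmul; a hypothetical helper for illustration.

    a:        M x K activations (list of float rows)
    b_packed: N entries, each K/2 bytes of 4-bit weights for one output column
    scales:   N x (K // block_size) per-block scales
    Returns the M x N product of a and the dequantized weights.
    """
    k = len(a[0])
    weights = []
    for col, packed in enumerate(b_packed):
        q = unpack_int4(packed)[:k]
        # Dequantize: (q - zero_point) * scale of the block this weight belongs to.
        weights.append([(q[i] - zero_point) * scales[col][i // block_size] for i in range(k)])
    # Each dequantized weight row is reused for every row of `a` (the M > 1 / prefill case).
    return [[sum(row[i] * w[i] for i in range(k)) for w in weights] for row in a]


a = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
b_packed = [bytes([0x98, 0xA8]), bytes([0x77, 0x77])]  # column 0 -> [0, 1, 0, 2], column 1 -> [-1] * 4
scales = [[1.0], [0.5]]
print(matmul_nbits_ref(a, b_packed, scales, block_size=4))  # [[10.0, -5.0], [1.5, -1.0]]
```

A GPU kernel avoids dequantizing the whole weight matrix up front; the reference version does so only to keep the semantics easy to check.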