Commits


Yufeng Li authored and GitHub committed 8824f812e07
optimize topk for greedysearch (#14271)

Optimize top 1 computation in greedysearch. For vocabulary size 50k on A100:
- batch size 1: from 220us to 10.4us.
- batch size 4: from 230us to 11.5us.

For generation of 50 tokens, for example, it saves about 50 * 0.2ms = 10ms.
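
As a rough illustration of why a dedicated top 1 path can be cheaper than a general top-k, greedy search only needs the single highest-scoring token per batch row, which is one linear scan over the vocabulary. The sketch below is plain Python and is not the kernel from this commit; the names `greedy_top1` and `estimated_saving_ms` are hypothetical helpers introduced here for illustration. The second helper simply redoes the savings arithmetic with the measured batch size 1 timings.

```python
def greedy_top1(logits_batch):
    """Return (token_id, score) per batch row using one linear scan per row.

    Hypothetical helper: illustrates that top 1 needs no sorting or heap,
    unlike a general top-k selection. Ties keep the lowest token id.
    """
    results = []
    for row in logits_batch:
        best_id, best_score = 0, row[0]
        for token_id, score in enumerate(row):
            if score > best_score:
                best_id, best_score = token_id, score
        results.append((best_id, best_score))
    return results


def estimated_saving_ms(num_tokens, before_us, after_us):
    """Total time saved in milliseconds over num_tokens decoding steps."""
    return num_tokens * (before_us - after_us) / 1000.0


if __name__ == "__main__":
    # Two rows of toy logits standing in for a batch of size 2.
    print(greedy_top1([[0.1, 2.5, -1.0], [3.0, 3.0, 1.0]]))
    # Batch size 1 timings quoted in the commit message, 50 generated tokens.
    print(estimated_saving_ms(50, 220.0, 10.4))
```

With the batch size 1 figures, the per-step saving is 220us - 10.4us = 209.6us, which the commit message rounds to 0.2ms per token.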