Public / onnxruntime / 8824f812e07

Yufeng Li authored and GitHub committed 8824f812e07 on 14 Jan 2023

optimize topk for greedysearch (#14271)

Optimize the top-1 computation in greedy search.
For a vocabulary size of 50k on an A100:
- batch size 1: from 220us to 10.4us.
- batch size 4: from 230us to 11.5us.
This saves roughly 0.2ms per decoding step, so generating 50 tokens, for example, saves about 50*0.2ms = 10ms.
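
For illustration only, here is a minimal Python sketch of why top-1 is cheaper than a general top-k: greedy search needs only the single largest logit per batch row, which a one-pass argmax finds without sorting or keeping k candidates. This is not the CUDA kernel from this commit; the names `top1`, `greedy_next_tokens`, and `saved_ms` are hypothetical, and `saved_ms` simply redoes the arithmetic above with the unrounded timings.

```python
from typing import List, Sequence, Tuple


def top1(logits: Sequence[float]) -> Tuple[int, float]:
    """Return (index, value) of the largest logit in one pass.

    Ties resolve to the lowest index. Hypothetical helper, not ONNX Runtime API.
    """
    best_idx, best_val = 0, logits[0]
    for i in range(1, len(logits)):
        if logits[i] > best_val:
            best_idx, best_val = i, logits[i]
    return best_idx, best_val


def greedy_next_tokens(batch_logits: Sequence[Sequence[float]]) -> List[int]:
    """Pick the next token id for every batch row, as greedy search does."""
    return [top1(row)[0] for row in batch_logits]


def saved_ms(steps: int, before_us: float, after_us: float) -> float:
    """Total time saved over `steps` decoding steps, in milliseconds."""
    return steps * (before_us - after_us) / 1000.0


if __name__ == "__main__":
    # Toy vocabulary of 3 tokens, batch size 2.
    print(greedy_next_tokens([[0.1, 2.0, -1.0], [3.0, 3.0, 0.5]]))
    # Timings quoted above for batch size 1 and batch size 4.
    print(saved_ms(50, 220.0, 10.4))
    print(saved_ms(50, 230.0, 11.5))
```

With the quoted timings, the per-step saving is a little over 0.2ms for both batch sizes, which is where the "about 10ms for 50 tokens" estimate comes from.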