Commits


Yufeng Li authored and GitHub committed 8de885fdb1b
Reduce CUDA library binary size (#14555)

### Description
Reduce the CUDA library size by:
1. Refactoring beam_search_top_k to reduce template instantiation, which saves ~56 MB.
2. Opting TopK out of the types uint*, int8_t, and int16_t, which saves ~50 MB.
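
The commit message does not show the code, so the following is only a host-side C++17 sketch of the two general techniques it names, not the actual change. All names here (`TopKStatic`, `TopKDynamic`, `kTopKSupported`, `TryTopK`) are hypothetical, and reading `uint*` as "every unsigned integer type" is an assumption. The first pair of functions illustrates how moving a compile-time parameter to a runtime argument can cut the number of template instantiations; the trait illustrates how a type can be opted out so that no kernel body is generated for it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <type_traits>
#include <vector>

// Hypothetical "before" shape: K is a template parameter, so every distinct
// (T, K) pair used in the program emits its own copy of the function body.
template <typename T, int K>
std::vector<T> TopKStatic(std::vector<T> values) {
  static_assert(K > 0, "K must be positive");
  assert(values.size() >= static_cast<std::size_t>(K));
  std::partial_sort(values.begin(), values.begin() + K, values.end(),
                    std::greater<T>());
  values.resize(K);
  return values;
}

// Hypothetical "after" shape: k is a runtime argument, so only one copy of
// the body is emitted per element type T, regardless of how many k are used.
template <typename T>
std::vector<T> TopKDynamic(std::vector<T> values, std::size_t k) {
  assert(k <= values.size());
  std::partial_sort(values.begin(),
                    values.begin() + static_cast<std::ptrdiff_t>(k),
                    values.end(), std::greater<T>());
  values.resize(k);
  return values;
}

// Assumed opt-out list: int8_t, int16_t, and all unsigned integer types.
template <typename T>
inline constexpr bool kTopKSupported =
    !std::is_same_v<T, std::int8_t> && !std::is_same_v<T, std::int16_t> &&
    !(std::is_integral_v<T> && std::is_unsigned_v<T>);

// Returns false for opted-out types. Because the call sits in a discarded
// `if constexpr` branch, TopKDynamic<T> is never instantiated for them.
template <typename T>
bool TryTopK(const std::vector<T>& values, std::size_t k, std::vector<T>& out) {
  if constexpr (kTopKSupported<T>) {
    out = TopKDynamic<T>(values, k);
    return true;
  } else {
    return false;
  }
}
```

With this sketch, `TryTopK<float>` fills `out` with the `k` largest values in descending order, while `TryTopK<std::int8_t>` returns `false` and leaves `out` untouched. The trade-off of the runtime-`k` form is that the compiler can no longer specialize or unroll on `K`, which is presumably why such parameters are made compile-time constants in the first place.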