onnxruntime / 8de885fdb1b

Yufeng Li authored and GitHub committed 8de885fdb1b on 08 Feb 2023

reduce cuda library binary size (#14555)

### Description
Reduce the CUDA library binary size by:
1. Refactoring `beam_search_top_k` to reduce template instantiations, which saves ~56 MB.
2. Opting out of `TopK` support for the `uint*`, `int8_t`, and `int16_t` types, which saves ~50 MB.
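Both changes above work because every distinct combination of template arguments is compiled into its own copy of the code. The following is a minimal C++ sketch of that idea, assuming a CPU stand-in for a CUDA top-k kernel. `TopKImpl`, `TopK`, the bucket sizes 4/16/64, and the list of instantiated element types are all hypothetical and are not taken from the onnxruntime sources. The sketch shows two levers: making `k` a runtime argument bounded by a small set of compile-time `MaxK` buckets (instead of one instantiation per exact `k`), and explicitly instantiating only the element types the library chooses to support.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical CPU stand-in for a top-k kernel. MaxK is a compile-time bound
// (on a GPU it would typically size a fixed register/shared-memory buffer);
// the actual k is a runtime argument, so one instantiation serves every k <= MaxK.
template <typename T, int MaxK>
std::vector<T> TopKImpl(const std::vector<T>& input, int k) {
  static_assert(MaxK > 0, "MaxK must be positive");
  T buffer[MaxK] = {};  // holds the current top-k, sorted in descending order
  int count = 0;
  for (const T& v : input) {
    if (count < k) {
      buffer[count++] = v;         // buffer not full yet: append
    } else if (v > buffer[k - 1]) {
      buffer[k - 1] = v;           // replace the current smallest of the top-k
    } else {
      continue;                    // v cannot enter the top-k
    }
    // Bubble the new element up to keep the buffer sorted descending.
    for (int i = count - 1; i > 0 && buffer[i] > buffer[i - 1]; --i) {
      std::swap(buffer[i], buffer[i - 1]);
    }
  }
  return std::vector<T>(buffer, buffer + count);
}

// Hypothetical dispatch: round k up to one of a few buckets, so each element
// type gets three TopKImpl instantiations instead of one per possible k.
template <typename T>
std::vector<T> TopK(const std::vector<T>& input, int k) {
  if (k <= 0) throw std::invalid_argument("k must be positive");
  if (k <= 4) return TopKImpl<T, 4>(input, k);
  if (k <= 16) return TopKImpl<T, 16>(input, k);
  if (k <= 64) return TopKImpl<T, 64>(input, k);
  throw std::invalid_argument("k exceeds the largest bucket in this sketch");
}

// When the template definition lives in a library .cpp/.cu file, only these
// explicit instantiations are emitted. Leaving types such as int8_t, int16_t,
// or unsigned integers off this list removes all of their TopKImpl variants.
template std::vector<float> TopK<float>(const std::vector<float>&, int);
template std::vector<double> TopK<double>(const std::vector<double>&, int);
template std::vector<int32_t> TopK<int32_t>(const std::vector<int32_t>&, int);
template std::vector<int64_t> TopK<int64_t>(const std::vector<int64_t>&, int);

// Usage: TopK(std::vector<float>{3, 1, 4, 1, 5}, 2) returns {5, 4}.
```

In this sketch the number of compiled top-k bodies is (number of supported types) × (number of buckets), which is why trimming either factor shrinks the binary. The trade-off is that a larger bucket may reserve more buffer space than a given `k` strictly needs.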
