Commits


Ryan Hill authored and GitHub committed 310273cbe4e
BeamScorer to use contiguous arrays for BeamHypotheses (#15923)

### Description
Change BeamHypotheses to stop using a std::priority_queue. Instead, all BeamHypotheses share a single buffer, and each one gets a small slice of it. Because the beam count is small (typically 4 or 8, with a maximum of 32) and the array size is fixed, each BeamHypotheses simply does a sorted insert into its array. This also allows the BeamHypotheses inside the BeamSearchScorer to be a single fixed allocation instead of an onnxruntime::FastAllocVector.

### Motivation and Context
The goal is to simplify memory usage and make the code easier to port to CUDA.
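
The layout described above can be illustrated with a minimal sketch. This is not the code from the commit: `Hypothesis`, `BeamSlice`, `SharedBeams`, and `Add` are hypothetical names chosen for illustration, and the sketch assumes that higher scores are better and that a full slice discards its lowest-scoring entry.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record; the real struct in the commit may hold different fields.
struct Hypothesis {
  int id;       // stand-in for the generated token sequence
  float score;  // assumption: higher is better
};

// A view over one batch entry's slice of the shared buffer (does not own memory).
struct BeamSlice {
  Hypothesis* data;
  std::size_t capacity;  // number of beams, e.g. 4 or 8
  std::size_t used;

  BeamSlice(Hypothesis* d, std::size_t cap) : data(d), capacity(cap), used(0) {}

  // Sorted insert that keeps the slice ordered by descending score.
  // Returns false if the slice is full and the candidate is not better
  // than the current worst entry.
  bool Add(const Hypothesis& h) {
    if (capacity == 0) return false;
    if (used == capacity && h.score <= data[used - 1].score) return false;

    // Either grow into a free slot or overwrite the worst entry.
    std::size_t pos = (used < capacity) ? used++ : capacity - 1;

    // Shift lower-scoring entries one slot to the right.
    while (pos > 0 && data[pos - 1].score < h.score) {
      data[pos] = data[pos - 1];
      --pos;
    }
    data[pos] = h;
    return true;
  }
};

// One fixed allocation shared by every batch entry.
struct SharedBeams {
  std::vector<Hypothesis> buffer;
  std::vector<BeamSlice> slices;

  SharedBeams(std::size_t batch_size, std::size_t num_beams)
      : buffer(batch_size * num_beams) {
    slices.reserve(batch_size);
    for (std::size_t b = 0; b < batch_size; ++b) {
      slices.emplace_back(buffer.data() + b * num_beams, num_beams);
    }
  }
};
```

With at most 32 beams per slice, the linear shift touches only a handful of elements, and a flat buffer with per-entry offsets is the kind of layout that maps directly onto device memory.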