Commits


Zhang Lei authored and GitHub committed 0f8e66d905e
optimization for whisper model with decoder masked multihead attention (#15827)

* graph tools update
* cuda kernel update
* operator spec update and implementation update
* greedy search bug fix on wrong assumption for cross/self attention input length
* avoid use of "" name in value info when loading graph which historically in many model