Commits


Dmitri Smirnov authored and GitHub committed bbedf2c4c5a
Improve cache locality and perf of DeepGru on CPU (#13582)

### Description
Introduce Gemm weights pre-pack.

### Motivation and Context
A 1-P customer requested a performance improvement for DeepGru, which consumes a large share of CPU time in their model. This change provides measurable performance improvements.

Customer model numbers:

- gru: mean = 356 us; 1ms = 99.8 percentile; 99th percentile = 665 ms (yuslepukhin/deep_gru_opt)
- main: mean = 375 us; 1ms = 99.8 percentile; 99th percentile = 695 ms (where yuslepukhin/deep_gru_opt branched off main)
- 1.13.1: mean = 391 us; 1ms = 99.6 percentile; 99th percentile = 744 ms