Sherlock authored and GitHub committed b03fb82ab77
Transformer layer-wise Recompute (#4526)

* Build Recomputation Graph
* Make topological sort to run FW nodes first
* Pattern match start and end of transformer layer
* Topological sort with Priority
* Add logger to Gradient Graph Builder
* Use Logger
* Introduce Execution Order
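The bullets "Make topological sort to run FW nodes first" and "Topological sort with Priority" describe a scheduling idea that is easier to see in code. Below is a minimal sketch, not the code from this commit: a Kahn-style topological sort that picks the ready node with the lowest priority value, so forward (FW) nodes get priority 0 and backward or recompute nodes get priority 1. The function name `priority_topological_sort`, the node names, and the priority scheme are hypothetical and chosen for illustration only.

```python
import heapq
from collections import defaultdict


def priority_topological_sort(nodes, edges, priority):
    """Topologically sort `nodes` (hypothetical helper).

    nodes: list of node names; edges: list of (src, dst) pairs;
    priority: dict mapping node -> int (lower runs earlier when ready).
    Ties are broken by each node's position in `nodes`, so the
    result is deterministic.
    """
    position = {n: i for i, n in enumerate(nodes)}
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Min-heap of nodes whose inputs are all available.
    ready = [(priority[n], position[n], n) for n in nodes if indegree[n] == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, _, node = heapq.heappop(ready)
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, (priority[nxt], position[nxt], nxt))

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


# Toy graph: a recomputed copy of fw1 is only consumed by the backward pass.
NODES = ["input", "recompute_fw1", "fw1", "fw2", "loss", "bw2", "bw1"]
EDGES = [
    ("input", "fw1"), ("fw1", "fw2"), ("fw2", "loss"),
    ("loss", "bw2"), ("bw2", "bw1"),
    ("input", "recompute_fw1"), ("recompute_fw1", "bw1"),
]
FW = {"input", "fw1", "fw2", "loss"}
WITH_PRIORITY = {n: 0 if n in FW else 1 for n in NODES}
UNIFORM = {n: 0 for n in NODES}

print(priority_topological_sort(NODES, EDGES, WITH_PRIORITY))
print(priority_topological_sort(NODES, EDGES, UNIFORM))
```

With uniform priorities, the recompute node becomes ready right after `input` and is scheduled before the forward pass, so its output would be held in memory through the whole forward pass and the recomputation would save nothing. Giving forward nodes priority 0 defers it until every forward node has run.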