Commits


Sheil Kumar authored and GitHub committed ef0b71308c0
Optimize KahnsTopologicalSort and PriorityNodeCompare (#19475) **Description** 1) During SessionInitialization, KahnsTopologicalSort is a major cause of perf degradation. The main cause of slow down is that the TopologicalSort needs to keep track of nodes to visit in order, and reorder them based on priority (as informed by a comparator). The existing implementation uses a priority_queue that is backed by a std::vector container. However, vectors are not good for insertion and reordering. The appropriate data type for this operation is a linked list. However, linked lists like std::list are not usable as a container for std::priority_queue. This is because std::priority_queue requires random access, which linked lists do not have. However, for this simple implementation, we can leverage a std::list under the hood and perform insertions manually using std::upper_bound. This drastically reduces the time taken by the method, which currently instead causes numerous recopies and a lot of movement inside the graph nodes to visit list. 2) In the comparator, I hide forward and backward attribute checking behind the #ifdef ENABLE_TRAINING macro, as I believe it should only be valid in the training scenario. 3) In noopelimination transformer, I prevent the creation of Initializer (which unpacks tensorproto data) in every node and only create initializers when Add/Sub/Mul/Div op nodes are detected. **Motivation and Context** Session creation time of many models is quite slow. --------- Co-authored-by: Sheil Kumar <sheilk@microsoft.com>