Commits


Vincent Wang authored and GitHub committed 6900109ee88
Bugfix for GetCpuPreferredNodes (#13590)

GetCpuPreferredNodes finds the nodes in a graph that should run on the CPU EP rather than on the target EP (such as CUDA). Starting from the CPU outputs of target-EP nodes, it traverses the graph and tries to fall back tentative nodes from the target EP to the CPU EP.

For example, in Shape->Gather->Concat->Reshape, all four nodes are initially tentative. Because the output of Shape is a CPU output, the traversal starts from that output and falls back Gather and Concat to the CPU EP. Reshape cannot fall back because its other input is not a CPU input.

Now consider Shape->Gather->ReduceProd->Concat->Reshape. ReduceProd has no int64_t kernel in the target EP (CUDA here), so it is not a tentative node. The traversal still starts from Shape's output, but with the current logic it stops when it reaches ReduceProd. As a result, Concat does not fall back and is assigned to the target EP, and Memcpy nodes are then added before and after Concat because both its input and its output are CPU tensors.

This PR fixes the issue. In the case above, ReduceProd not being tentative means that either it already has an EP assigned or no target-EP kernel was found for it, so the traversal can continue through it, treating it as a CPU node and all of its outputs as CPU outputs.
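
The traversal described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the onnxruntime implementation: the graph representation, the function name cpu_preferred_nodes, the flag continue_past_non_tentative, and the value names (X, shape_out, and so on) are all hypothetical, constant inputs such as Gather indices are omitted, and a tentative node is assumed to fall back only when every modeled input is a CPU value.

```python
from collections import deque

def cpu_preferred_nodes(nodes, tentative, cpu_outputs,
                        continue_past_non_tentative=True):
    """Return the tentative nodes that fall back to CPU (simplified model).

    nodes: dict mapping node name -> (input value names, output value names)
    tentative: names of nodes that could run on the target EP
    cpu_outputs: value names known to be CPU outputs of target-EP nodes
    continue_past_non_tentative: True models the fixed behaviour,
        False models the behaviour before the fix (hypothetical flag).
    """
    # Map each value to the nodes that consume it.
    consumers = {}
    for name, (inputs, _outputs) in nodes.items():
        for value in inputs:
            consumers.setdefault(value, []).append(name)

    cpu_values = set(cpu_outputs)
    fallback = set()
    visited = set()
    queue = deque(cpu_outputs)

    while queue:
        value = queue.popleft()
        for name in consumers.get(value, []):
            if name in visited:
                continue
            inputs, outputs = nodes[name]
            if name in tentative:
                # A tentative node falls back only if all inputs are on CPU.
                if not all(i in cpu_values for i in inputs):
                    continue
                fallback.add(name)
            elif not continue_past_non_tentative:
                # Old behaviour: stop at a non-tentative node.
                continue
            # Fixed behaviour reaches here for non-tentative nodes too:
            # treat the node as a CPU node and its outputs as CPU outputs.
            visited.add(name)
            for out in outputs:
                if out not in cpu_values:
                    cpu_values.add(out)
                    queue.append(out)
    return fallback

# Shape->Gather->ReduceProd->Concat->Reshape, with X as the GPU data input.
nodes = {
    "Shape": (["X"], ["shape_out"]),
    "Gather": (["shape_out"], ["gather_out"]),
    "ReduceProd": (["gather_out"], ["prod_out"]),
    "Concat": (["prod_out"], ["concat_out"]),
    "Reshape": (["X", "concat_out"], ["Y"]),
}
tentative = {"Shape", "Gather", "Concat", "Reshape"}  # ReduceProd excluded

before_fix = cpu_preferred_nodes(nodes, tentative, ["shape_out"],
                                 continue_past_non_tentative=False)
after_fix = cpu_preferred_nodes(nodes, tentative, ["shape_out"])
```

In this model, stopping at ReduceProd leaves Concat on the target EP, while continuing through it lets Concat fall back together with Gather. Reshape stays on the target EP in both cases because its X input is not a CPU value.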