Commits


Sheil Kumar authored and GitHub committed 1b7f65437ef
Enable Opset11 Sequence Ops on DirectML, and make the CPU implementations agnostic to backend EP (#14442) Enable Opset11 Sequence Ops on DirectML, and make the CPU implementations agnostic to backend EP Opset 11 introduced the following sequence related operators: - SequenceAt - SequenceConstruct - SequenceEmpty - SequenceLength - SequenceErase - SequenceInsert - ConcatFromSequence With the exception of ConcatFromSequence, all of the above operators were implemented with CPU kernels that a) required all of the contained tensors to also be on CPU, and b) would clone each tensor into a new sequence as a side effect of each operator. The implementation of sequences are backend agnostic, as they dont affect actual tensor layout or manipulate the contents of the tensors. In addition, with the exception of SequenceAt, the other operators need not make copies of the underlying referenced tensors. Consequently, this change does the following: 1) Sequence* operators (except SequenceAt) no longer copies the contents of a sequence of tensors on every kernel execution. 2) SequenceAt uses the DataTransferManager to copy tensors agnostic to backend. 3) The internal container implemented by TensorSeq has changed from onnxruntime::Tensor to OrtValue. This is because onnxruntime::Tensor does not support copy or assignment construction, so it must have a singular owner. However, is same tensor participates in multiple containers it would have multiple container "owners" and this would not be possible. 4) Other code that accessed values from TensorSeq have associated changes to extract Tensors from OrtValues now. In addition, DirectML execution was very slow when the above Sequence operators were added to a graph, as this caused MemcpyToHost and MemcpyFromHost kernels to be inserted between the graph and the sequence operators. To optimize DirectML, 1) The CPU implementations for the Sequence* ops were registered as DML implementations. Since the above changes also includes making the CPU kernel implementations EP agnostic, the CPU kernels can be added as is. 2) The ConcatFromSequence operator needed to be implemented on DirectML. However, there was little DirectML EP operator framework support for operators that accept/output sequences of tensors. This change has modified the internal COM interfaces to include new apis to interrogate for sequence shapes, and extract the needed tensors from TensorSeq. --------- Co-authored-by: Patrice Vignola <vignola.patrice@gmail.com>