Public / onnxruntime

Author	Commit	Message	Commit date
Alessio Soldano	30c682547bd	[OpenVINO] Fix a build warning (#23877) ### Description Fix a warning with std::move usage ### Motivation and Context Possibly allow building without the --compile_no_warning_as_error flag	04 Mar 2025
Jiajia Qin	325ee30916f	[js/webgpu] Reland the optimization of ConvTranspose (#23858) This PR fixes the errors in the ConvTranspose optimization and adds tests to ensure the correctness of the implementation.	04 Mar 2025
Yulong Wang	18725277e3a	[js/common] allows using Uint16Array as data for float16 tensor (#23827) ### Description Resolve #23817	04 Mar 2025
Jian Chen	7f0c2c644c8	Make Nuget QNN package pipeline 1ES compliant (#23805) ### Description Make [QNN_Nuget_Windows](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1234) 1ES compliant	04 Mar 2025
Hector Li	99c51a326e0	Change the logic to generate the default ep context file name (#23788) ### Description Applies to all EPs: replace the .onnx extension with _ctx.onnx instead of directly appending the extra string _ctx.onnx to the existing model path. In QNN EP, also make the context binary .bin file name shorter by removing QNNExecutionProvider_ from it.	03 Mar 2025
Jambay Kinley	daf9565d1b5	Quant tool: Consistent `get_qdq_config` and `get_qnn_qdq_config` behavior (#23856)	02 Mar 2025
co63oc	0a6b05fb2dd	[doc] Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/ (#23848) ### Description Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/	01 Mar 2025
Seungtaek Kim	1ffe793a834	Fix typo: change `Upample` to `Upsample`. (#23838) ### Description Fixed a typo in function names related to the Upsample CUDA kernel. Changed the incorrect spelling Upample to Upsample across relevant functions. ### Motivation and Context This change is necessary to maintain consistency a...	01 Mar 2025
Scott McKay	1088a1edfec	Model Builder API (#23223) ### Description Supports creating a model programmatically using the ORT C or C++ API. Supports augmenting an existing model to add nodes.	01 Mar 2025
Sushanth Rajasankar	1be64f88319	Fix flash attention for GQA (Phi4) (#23850) ### Description This change fixes GQA for Flash Attention on Nvidia GPUs. The root cause appears to be the `k_start + capped_sg_id < seq_causal_length` check. This is either because (a) seq_causal_length varies per lane, so the check becomes non-uniform control flow, which interacts with subgroupShuffle, or (b) the check itself is incorrect and is wiping out values of v based on the s...	01 Mar 2025
Jian Chen	2a4cfab46a8	Revert changes on mac-react-native-ci-pipeline.yml (#23845)	28 Feb 2025
Jing Fang	c61a4b115ea	[Mlas] Unblock hardcoded matmul blocking size (#23815) ### Description In GemmBatch, the target matrix is cut into blocks to dispatch to multiple threads for intra-op parallelism. Currently the block size is hard-coded to 16. If the CPU has more than 16 cores, the cores are not fully utilized in one op. This change unblocks the number of blocks in various MatMul. __Benchmark results__ Model: llmlingua-2-bert-base-multilingual-cased-meetingbank--add-force-token-1...	28 Feb 2025
Jian Chen	a189bfca4e7	Increase npm package pipeline ReactNative_CI_iOS timeout to 120 mins (#23825) ### Description Increase [npm package pipeline](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1080&_a=summary) ReactNative_CI_iOS timeout to 120 mins	28 Feb 2025
Karim Vadsariya	05642657161	[ORT/CI_Pipeline] Use --enable_generic_interface in ORT builds for EP testing (#23801) Summary of changes: - Changed openVINO test case to use --enable_generic_interface - Changed tensorRT test case to use --enable_generic_interface - Fixed ORT builds to USE_FULL_PROTOBUF as openVINO/TensorRT requires them - Fixed pre-processor macro definition which accidentally got removed when ORT is built w/o EP ### Motivation and Context ...	28 Feb 2025
Jambay Kinley	5ab953cb8c4	Quant tool: Add `nodes_to_exclude` in `get_qnn_qdq_config` (#23779)	28 Feb 2025
Changming Sun	b1f2a3f5f3a	Update onnxruntime_external_deps.cmake: add missing EXCLUDE_FROM_ALL (#23829) ### Description To resolve #23821 ### Motivation and Context Similar to #23641.	27 Feb 2025
Ankit Maheshkar	17f39475536	[OVEP] Update support for Contrib Ops (#23789) ### Description This PR enables support in OVEP for the following Contrib Ops: DynamicQuantizeMatMul, FusedMatMul, QuickGelu, SkipSimplifiedLayerNormalization. Co-authored-by: n1harika <niharika.sathish@intel.com>	27 Feb 2025
Yulong Wang	6df0973e58b	upgrade emsdk to 4.0.4 (#23819) ### Description Upgrade EMSDK to 4.0.4 ### Motivation and Context Emscripten v4.0.4 brings 2 useful changes that are helpful for webgpu: - https://github.com/emscripten-core/emscripten/pull/23678 - https://github.com/emscripten-core/emscripten/pull/23631	27 Feb 2025
Jianhui Dai	c6664e20522	[webgpu] Fix alignment issues in shader code (#23776) ### Description This commit fixes alignment issues in shader code. ### Motivation and Context See above.	27 Feb 2025
Yifan Li	000f2c9f17a	[TensorRT EP] update oss parser to latest (#23710) ### Description * Update oss parser version to latest commit of 10.8-GA branch ### Motivation and Context * Action needed to adapt latest onnx-tensorrt 10.8-GA branch to fix scatterND attribute issue and `plugin.h` not found issue	27 Feb 2025
Jing Fang	7a3810d31e6	[ARM CPU] Fix flaky hgemmb ut (#23814) ### Description The original UT uses a random seed. Change to a fixed seed. ### Motivation and Context Fix flaky UT.	27 Feb 2025
Jian Chen	d5742708600	Make Nuget CUDA package pipeline 1ES compliant (#23804) ### Description Make [Nuget CUDA 12 Publish Pipeline](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1312&_a=summary) 1ES compliant	27 Feb 2025
Jian Chen	40c329ef59f	Upgrade React Native to 0.73 (#23575) ### Description Upgrading RN to 0.73.11, including Android and iOS changes. This PR also includes the E2E test changes. Used the React-Native upgrade [helper](https://react-native-community.github.io/upgrade-helper/?from=0.72.11&to=0.73.11&package=onnxruntime-android&name=onnxruntime) as the reference. ### Motivation and Context Need a newer RN version to fix S360 work items.	27 Feb 2025
xhcao	cc3f4120402	[webgpu] support resize operator (#23780)	27 Feb 2025
Jian Chen	9a2e00906a2	Converting npm packaging pipeline to 1ES (#23767)	27 Feb 2025
Jian Chen	839d9dcd284	Make Nuget package pipeline 1ES compliant (#23803) ### Description Make [Nuget Publishing](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1313&_a=summary) 1ES compliant	27 Feb 2025
kuanyul-quic	5be82eb7b2b	[QNN EP] Re-enable several disabled QNN-EP UTs (#23799) ### Description 1. Re-enable UTs which passed 2.30 2. Update resize UT because "round_prefer_floor" is no longer supported in QNN SDK since 2.21. ### Motivation and Context 1. Make the UT of QNN EP pass as much as possible to improve the test coverage. Co-authored-by: Kuan-Yu Lin <kuanyul@qti.qualcomm.com>	27 Feb 2025
genmingz@AMD	5abb75ff191	[VitisAI] add new interface (#23777) ### Description A new interface for interaction between ONNX Runtime and Vitis AI has been added, which uses `std::filesystem::path` to pass paths. ### Motivation and Context Vitis AI uses `std::string` to pass paths, which causes errors on Windows when the model name contains Chinese characters. Therefore, this PR adds an interface that uses `std::filesystem::path` to pass paths, ensuring tha...	26 Feb 2025
Edward Chen	e46c0d86b9c	[QNN EP] Use absolute path of libcdsprpc.dll on Windows so it doesn't need to be copied anywhere. (#23791) ### Description Look up and use absolute path of libcdsprpc.dll on Windows. ### Motivation and Context The QNN EP's HTP shared memory allocator requires use of the libcdsprpc shared library. On Windows, this previously required copying libcdsprpc.dll from some driver-specific path to somewhere the running code could find it. After this change, libcdsprpc.dll no longer needs to be copied.	24 Feb 2025
amarin16	7864192c9c5	Bump version from 1.21 to 1.22 (#23787) The [1.21 release branch](https://github.com/microsoft/onnxruntime/tree/rel-1.21.0) has been cut, so we need to update the version in main from `1.21.0` to `1.22.0`.	23 Feb 2025
Jiajia Qin	9799c3fbd26	[webgpu] Enable FlashAttention for GQA (#23761)	22 Feb 2025
Wanming Lin	50c835d98c1	[WebNN] Fix missing parameter (#23778) Missing first parameter when invoking `jsepEnsureTensor`.	22 Feb 2025
Changming Sun	98511b0fe80	Set build user's uid when creating Migraphx/ROCM docker images (#23657) ### Description Set build user's uid when creating Migraphx/ROCM docker images	22 Feb 2025
Chi Lo	23f787ea756	[TensorRT EP] Add new provider option to exclude ops from running on TRT (#23705) This PR removes the implicit filtering-out of DDS ops from running on TRT. In other words, by default, DDS nodes will be run by TRT if it supports them. Moreover, it adds a new provider option `trt_op_types_to_exclude`: - User can provide a list of op types to be excluded from running on TRT - e.g. `trt_op_types_to_exclude="NonMaxSuppression,NonZero,RoiAlign"` (This PR basically adds back [feature](https:/...	22 Feb 2025
Yifan Li	1b0a2ba431e	Update cmake_cuda_architecture to control package size (#23671) ### Description Action item: * ~~Add LTO support when cuda 12.8 & Relocatable Device Code (RDC)/separate_compilation are enabled, to reduce potential perf regression~~ LTO needs further testing * Reduce nuget/whl package size by selecting devices & their cuda binary/PTX assembly during ORT build; * make sure ORT nuget package < 250 MB, python wheel < 300 MB ...	22 Feb 2025
Sushanth Rajasankar	8eb5513be6d	[webgpu] Implement SubGroupMatrix based MatMulNBits for Metal (#23729) ### Description Recent progress with the SubGroupMatrix prototype in Dawn (https://issues.chromium.org/issues/348702031) exposes SIMD-Group Matrix Functions to webgpu. This shader implements a matmulnbits using that primitive. Observed perf gains in terms of LLM inference speed: prefill perf for Phi 3.5 for a 1K token prefill sees a 3x improvement, 5.4s from 15s. With Changes ``` ./model_benchmark -...	22 Feb 2025
Adrian Lizarraga	d82604e802a	[Optimizer] Fix exception for Q -> DQ sequence with different scale types (#23771) ### Description Fixes a bug in the IsQDQPairSupported utility function, which is used by various QDQ optimizers (e.g., DoubleQDQPairsRemover, QDQFinalCleanup, etc.). The bug causes an exception when IsQDQPairSupported() is called with a `Q(scale_f32) -> DQ(scale_f16)` sequence that uses different scale types. ### Motivation and Context Fix bug that prevents creating QDQ models that use scale...	22 Feb 2025
saurabh	754ee21f835	OVEP: Bug Fixes, Refactoring, and Contrib Ops Update (#23742) ### Description This pull request combines multiple improvements and bug fixes for the OpenVINO Execution Provider (OVEP). The changes are summarized as follows: 1. Support for various contrib Ops in OVEP. 2. Dimension Check Fixes for Greater, Pad, and MAX Ops: Fixed dimension check failures for the Greater, Pad, and MAX ops in OVEP, ensuring they now pass validation for all supported models. 3...	21 Feb 2025
Jambay Kinley	6715d4ca35e	Shape inference: GatherBlockQuantized dispatcher (#23748) ### Description Add shape infer dispatcher for `GatherBlockQuantized` contrib op. It reuses the dispatcher for `Gather` op since the first two inputs have the same specs. The output elem type comes from input 2 (scales) for `GatherBlockQuantized`. ### Motivation and Context Support shape inference for models with `GatherBlockQuantized` op.	21 Feb 2025
Jon Campbell	75cf166b25b	[QNN EP] Passthrough EP Parameters in Node (#23468) ### Description The existing implementation of session options for the QNN EP does not honor the various bindings available. As such, even if set at runtime, they are ignored. The fix is to follow the pattern of the `webgpu` provider and parse/populate the options accordingly. Existing defaults are preserved, such that if options are not set the prior behavior will persist. ### Motivation and Cont...	21 Feb 2025
Prathik Rao	eadd29e64bb	[JSEP] fix scatter-nd jsep kernel (#23755) Adjusts scatter-nd kernel implementation for the case when reduction=none and there are duplicate values in the indices input tensor. If duplicates are detected, a single thread processes all indices to ensure correct results.	21 Feb 2025
Karim Vadsariya	0babb10a277	[onnxruntime/build] Add CI testing for ORT build with generic interface (#23530) Summary: - Remove unused cmake variables - Add target specific logic when generic interface is used. - Add QNN EP test case that uses ORT generic interface build	21 Feb 2025
liqun Fu	af04b202baf	Rope embedding kernel to use avx2 (#23694) ### Description Credit to [chethanpk](https://github.com/chethanpk), who provided the Rope Embedding in a patch. The patch is in the first commit of this PR. I have been confirming perf improvement with this code change. My analysis is based on phi-3-mini-4k-instruct-int4-int8-blklen32. Benchmark from onnxruntime-genai does not show clear improvement. This is bec...	21 Feb 2025
Changming Sun	3df43a247ff	Add a new build flag to build.py for using with vcpkg (#23723) 1. Add new flag to build.py: Introduced a `--use_vcpkg_ms_internal_asset_cache` flag to `build.py`. The flag is intended for internal use only. 2. Reduce excessive logs: Removed some excessive logs from `vcpkg_helper.py`.	21 Feb 2025
Dmitri Smirnov	b230c7bc101	Capacity aware partitioning (#22766) ### Description Allow users to specify per-EP resource constraints. Currently, models that do not fit into device memory error out. This PR lays groundwork for EP-specific resource-constrained graph partitioning, subject to incremental feature additions. Partitioning in this context means to assign graph nodes to a specific device (Execution Provider) up to a certain limit that is ev...	21 Feb 2025
Jing Fang	2d33ee91556	[ARM CPU] Enable FP16 kernels for GQA op (#23746) ### Description - Enable hgemm and softmax fp16 kernels for GQA - add intra-loop parallelism to RoPE fp16 kernel __Benchmarking models__ - float32: [phi-3 cpu accuracy level 0](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx/tree/main/cpu_and_mobile/cpu-int4-rtn-block-32) - float16: [phi-3 gpu accuracy level 0](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx/tree/ma...	21 Feb 2025
Jiajia Qin	9b2b2ee8269	[webgpu] Use components for VxAttentionScore (#23726) For phi3.5-gqa-static sum_long (>1000 tokens) on Meteor Lake. Before: 300 tokens in 27.0sec, e2e:11.1 tps, prompt: 212.4 tps, gen: 14.2 tps, ttft: 5.85 sec After: 300 tokens in 23.0sec, e2e:13.0 tps, prompt: 248.9 tps, gen: 16.6 tps, ttft: 4.99 sec	21 Feb 2025
Ranjit Ranjan	60362106a08	[AIX] eigen update fix and test failures fix (#23751) ### Description Changes in this PR are for: - Clean up the patch for Eigen on AIX; it is not needed anymore. - Fix recent test failures ``` 1: [----------] Global test environment tear-down 1: [==========] 4737 tests from 310 test suites ran. (94682 ms total) 1: [ PASSED ] 4733 tests. 1: [ SKIPPED ] 2 tests, listed below: 1: [ SKIPPED ] MatMulFpQ4.MatMul2DSym 1: [ SKIPPED ] MatMulFpQ4.MatMu...	20 Feb 2025
Yifan Li	ec3f8718ad3	Add condition to gpu wheel build flag (#23760)	20 Feb 2025
kunal-vaishnavi	6d2b36e18fe	Fix security vulnerability with Whisper export (#23743) ### Description This PR reverts changes from [this PR](https://github.com/microsoft/onnxruntime/pull/15759/files). ### Motivation and Context This fixes a security vulnerability that was raised internally.	20 Feb 2025
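
Note on 99c51a326e0 (#23788): the default EP context file naming change described in that commit can be illustrated with a minimal Python sketch. This is not ONNX Runtime code. The function names are hypothetical, and the fallback for a model path that does not end in .onnx is an assumption, since the commit message does not describe that case.

```python
def default_ctx_name_before(model_path: str) -> str:
    """Earlier behavior as described in the commit message:
    append the extra string directly to the existing model path."""
    return model_path + "_ctx.onnx"


def default_ctx_name_after(model_path: str) -> str:
    """New behavior as described: replace the trailing .onnx with _ctx.onnx."""
    suffix = ".onnx"
    if model_path.endswith(suffix):
        return model_path[: -len(suffix)] + "_ctx.onnx"
    # Assumption (not stated in the commit): with no .onnx extension,
    # fall back to appending the suffix.
    return model_path + "_ctx.onnx"


if __name__ == "__main__":
    path = "models/phi.onnx"
    print(default_ctx_name_before(path))  # models/phi.onnx_ctx.onnx
    print(default_ctx_name_after(path))   # models/phi_ctx.onnx
```

The only difference between the two helpers is whether the existing extension is stripped before the suffix is added, which is what removes the doubled .onnx from the generated name.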
