onnxruntime

Author | Commit | Message | Commit date | Issues
Alessio Soldano
30c682547bd [OpenVINO] Fix a build warning (#23877) ### Description Fix a warning with std::move usage ### Motivation and Context Possibly allow building without --compile_no_warning_as_error flag
Jiajia Qin
325ee30916f [js/webgpu] Reland the optimization of ConvTranspose (#23858) This PR fixes the errors in the ConvTranspose optimization and adds tests to ensure the correctness of the implementation.
Yulong Wang
18725277e3a [js/common] allows using Uint16Array as data for float16 tensor (#23827) ### Description Resolve #23817
Jian Chen
7f0c2c644c8 Make Nuget QNN package pipeline 1ES compliant (#23805) ### Description Make [QNN_Nuget_Windows](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1234) 1ES compliant
Hector Li
99c51a326e0 Change the logic to generate the default ep context file name (#23788) ### Description Applies to all EPs: replace the `.onnx` suffix with `_ctx.onnx` instead of appending the extra string `_ctx.onnx` to the existing model path. In QNN EP, also shorten the context binary `.bin` file name by removing `QNNExecutionProvider_` from it.
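The naming change described in this commit can be sketched as follows. This is an illustration only: both helper names are hypothetical and are not ORT functions, and the fallback for paths that do not end in `.onnx` is an assumption.

```python
def old_default_ep_context_path(model_path: str) -> str:
    """Hypothetical helper: the previous behaviour, appending `_ctx.onnx`."""
    return model_path + "_ctx.onnx"


def default_ep_context_path(model_path: str) -> str:
    """Hypothetical helper: replace a trailing `.onnx` with `_ctx.onnx`."""
    suffix = ".onnx"
    if model_path.endswith(suffix):
        return model_path[: -len(suffix)] + "_ctx.onnx"
    # Assumption: without a `.onnx` suffix, fall back to appending.
    return model_path + "_ctx.onnx"


old_name = old_default_ep_context_path("models/net.onnx")
new_name = default_ep_context_path("models/net.onnx")
```

With these helpers, `old_name` keeps the original extension in the middle of the file name, while `new_name` does not.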
Jambay Kinley
daf9565d1b5 Quant tool: Consistent `get_qdq_config` and `get_qnn_qdq_config` behavior (#23856)
co63oc
0a6b05fb2dd [doc] Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/ (#23848) ### Description Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/
Seungtaek Kim
1ffe793a834 Fix typo: change `Upample` to `Upsample`. (#23838) ### Description Fixed a typo in function names related to the Upsample CUDA kernel. Changed incorrect spelling Upample to Upsample across relevant functions. ### Motivation and Context This change is necessary to maintain consistency a...
Scott McKay
1088a1edfec Model Builder API (#23223) ### Description Supports creating a model programmatically using the ORT C or C++ API. Supports augmenting an existing model to add nodes.
Sushanth Rajasankar
1be64f88319 Fix flash attention for GQA (Phi4) (#23850) ### Description This change fixes GQA for Flash Attention on Nvidia GPUs. The root cause appears to be the `k_start + capped_sg_id < seq_causal_length` check. Either (a) seq_causal_length varies per lane, so the check becomes non-uniform control flow, which interacts with subgroupShuffle, or (b) the check itself is incorrect and is wiping out values of v based on the s...
Jian Chen
2a4cfab46a8 Revert changes on mac-react-native-ci-pipeline.yml (#23845)
Jing Fang
c61a4b115ea [Mlas] Unblock hardcoded matmul blocking size (#23815) ### Description In GemmBatch, the target matrix is cut into blocks to dispatch to multiple threads for intra-op parallelism. Currently the block size is hard-coded to 16. If the CPU has more than 16 cores, the cores are not fully utilized in one op. This change unblocks the number of blocks in various MatMul. __Benchmark results__ Model: llmlingua-2-bert-base-multilingual-cased-meetingbank--add-force-token-1...
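The effect of a fixed blocking limit on intra-op parallelism can be illustrated with a small sketch. It is not the MLAS code: `partition_rows` is a hypothetical helper, and reading the hard-coded 16 as a cap on the number of blocks is an assumption made for the illustration.

```python
from typing import List, Tuple


def partition_rows(m: int, max_blocks: int) -> List[Tuple[int, int]]:
    """Split m rows into at most max_blocks contiguous (start, end) ranges."""
    blocks = max(1, min(m, max_blocks))
    base, extra = divmod(m, blocks)
    ranges = []
    start = 0
    for i in range(blocks):
        # The first `extra` blocks take one additional row each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


threads = 32  # assumed thread-pool size on a CPU with more than 16 cores
capped = partition_rows(1024, min(threads, 16))  # fixed limit of 16
uncapped = partition_rows(1024, threads)         # limit follows the thread count
```

Under the fixed limit only 16 of the 32 threads receive a block; once the limit follows the thread count, every thread gets work.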
Jian Chen
a189bfca4e7 Increase npm package pipeline ReactNative_CI_iOS timeout to 120 mins (#23825) ### Description Increase [npm package pipeline](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1080&_a=summary) ReactNative_CI_iOS timeout to 120 mins
Karim Vadsariya
05642657161 [ORT/CI_Pipeline] Use --enable_generic_interface in ORT builds for EP testing (#23801) Summary of changes: - Changed OpenVINO test case to use --enable_generic_interface - Changed TensorRT test case to use --enable_generic_interface - Fixed ORT builds to USE_FULL_PROTOBUF as OpenVINO/TensorRT require it - Fixed pre-processor macro definition which accidentally got removed when ORT is built without an EP ...
Jambay Kinley
5ab953cb8c4 Quant tool: Add `nodes_to_exclude` in `get_qnn_qdq_config` (#23779)
Changming Sun
b1f2a3f5f3a Update onnxruntime_external_deps.cmake: add missing EXCLUDE_FROM_ALL (#23829) ### Description To resolve #23821 ### Motivation and Context Similar to #23641.
Ankit Maheshkar
17f39475536 [OVEP] Update support for Contrib Ops (#23789) ### Description This PR enables support in OVEP for the following Contrib Ops: DynamicQuantizeMatMul, FusedMatMul, QuickGelu, SkipSimplifiedLayerNormalization. Co-authored-by: n1harika <niharika.sathish@intel.com>
Yulong Wang
6df0973e58b upgrade emsdk to 4.0.4 (#23819) ### Description Upgrade EMSDK to 4.0.4 ### Motivation and Context Emscripten v4.0.4 brings 2 changes that are helpful for webgpu: - https://github.com/emscripten-core/emscripten/pull/23678 - https://github.com/emscripten-core/emscripten/pull/23631
Jianhui Dai
c6664e20522 [webgpu] Fix alignment issues in shader code (#23776) ### Description This commit fixes alignment issues in shader code. ### Motivation and Context See above.
Yifan Li
000f2c9f17a [TensorRT EP] update oss parser to latest (#23710) ### Description * Update oss parser version to latest commit of 10.8-GA branch ### Motivation and Context * Action needed to adapt latest onnx-tensorrt 10.8-GA branch to fix scatterND attribute issue and `plugin.h` not found issue
Jing Fang
7a3810d31e6 [ARM CPU] Fix flaky hgemmb ut (#23814) ### Description The original UT uses a random seed. Change to a fixed seed. ### Motivation and Context Fix flaky UT.
Jian Chen
d5742708600 Make Nuget CUDA package pipeline 1ES compliant (#23804) ### Description Make [Nuget CUDA 12 Publish Pipeline](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1312&_a=summary) 1ES compliant
Jian Chen
40c329ef59f Upgrade React Native to 0.73 (#23575) ### Description Upgrading RN to 0.73.11, including Android and iOS changes. This PR also includes the E2E test changes. Used the React-Native upgrade [helper](https://react-native-community.github.io/upgrade-helper/?from=0.72.11&to=0.73.11&package=onnxruntime-android&name=onnxruntime) as the reference. ### Motivation and Context Need newer RN version to fix S360 work items.
xhcao
cc3f4120402 [webgpu] support resize operator (#23780)
Jian Chen
9a2e00906a2 Converting npm packaging pipeline to 1ES (#23767)
Jian Chen
839d9dcd284 Make Nuget package pipeline 1ES compliant (#23803) ### Description Make [Nuget Publishing](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1313&_a=summary) 1ES compliant
kuanyul-quic
5be82eb7b2b [QNN EP] Re-enable several disabled QNN-EP UTs (#23799) ### Description 1. Re-enable UTs which passed 2.30 2. Update resize UT because "round_prefer_floor" is no longer supported in QNN SDK since 2.21. ### Motivation and Context 1. Make the UT of QNN EP pass as much as possible to improve the test coverage. --------- Co-authored-by: Kuan-Yu Lin <kuanyul@qti.qualcomm.com>
genmingz@AMD
5abb75ff191 [VitisAI] add new interface (#23777) ### Description A new interface for interaction between ONNX Runtime and Vitis AI has been added, which uses `std::filesystem::path` to pass paths. ### Motivation and Context Vitis AI uses `std::string` to pass paths, which causes errors on Windows when the model name contains Chinese characters. Therefore, this PR adds an interface that uses `std::filesystem::path` to pass paths, ensuring tha...
Edward Chen
e46c0d86b9c [QNN EP] Use absolute path of libcdsprpc.dll on Windows so it doesn't need to be copied anywhere. (#23791) ### Description Look up and use absolute path of libcdsprpc.dll on Windows. ### Motivation and Context The QNN EP's HTP shared memory allocator requires use of the libcdsprpc shared library. On Windows, this previously required copying libcdsprpc.dll from some driver-specific path to somewhere the running code could find it. After this change, libcdsprpc.dll no longer needs to be copied.
amarin16
7864192c9c5 Bump version from 1.21 to 1.22 (#23787) The [1.21 release branch](https://github.com/microsoft/onnxruntime/tree/rel-1.21.0) has been cut, so we need to update the version in main from `1.21.0` to `1.22.0`.
Jiajia Qin
9799c3fbd26 [webgpu] Enable FlashAttention for GQA (#23761)
Wanming Lin
50c835d98c1 [WebNN] Fix missing parameter (#23778) Missing first parameter when invoking `jsepEnsureTensor`.
Changming Sun
98511b0fe80 Set build user's uid when creating Migraphx/ROCM docker images (#23657) ### Description Set build user's uid when creating Migraphx/ROCM docker images
Chi Lo
23f787ea756 [TensorRT EP] Add new provider option to exclude ops from running on TRT (#23705) This PR removes the implicit filtering-out of DDS ops from running on TRT. In other words, by default, DDS nodes will be run by TRT if it supports them. Moreover, it adds a new provider option `trt_op_types_to_exclude`: - Users can provide a list of op types to be excluded from running on TRT - e.g. `trt_op_types_to_exclude="NonMaxSuppression,NonZero,RoiAlign"` (This PR basically adds back [feature](https:/...
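A minimal sketch of passing this option from Python, assuming the comma-separated string format shown in the commit message. The fallback provider list and the `create_session` helper are illustrative choices, not part of the PR.

```python
# Op types to keep off TensorRT, joined into the comma-separated option value.
excluded_ops = ["NonMaxSuppression", "NonZero"]

providers = [
    (
        "TensorrtExecutionProvider",
        {"trt_op_types_to_exclude": ",".join(excluded_ops)},
    ),
    # Assumed fallback for the excluded nodes.
    "CUDAExecutionProvider",
]


def create_session(model_path: str):
    """Hypothetical helper: build a session with the option applied."""
    # Imported lazily so the option layout above can be inspected
    # on machines without onnxruntime installed.
    import onnxruntime as ort

    return ort.InferenceSession(model_path, providers=providers)
```

Calling `create_session` with a model path needs an ONNX Runtime build that includes the TensorRT EP; nodes of the listed op types should then be assigned to a later provider in the list.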
Yifan Li
1b0a2ba431e Update cmake_cuda_architecture to control package size (#23671) ### Description Action item: * ~~Add LTO support when cuda 12.8 & Relocatable Device Code (RDC)/separate_compilation are enabled, to reduce potential perf regression~~ LTO needs further testing * Reduce nuget/whl package size by selecting devices & their cuda binary/PTX assembly during ORT build; * make sure ORT nuget package < 250 MB, python wheel < 300 MB ...
Sushanth Rajasankar
8eb5513be6d [webgpu] Implement SubGroupMatrix based MatMulNBits for Metal (#23729) ### Description Recent progress with the SubGroupMatrix prototype in Dawn (https://issues.chromium.org/issues/348702031) exposes SIMD-Group Matrix Functions to webgpu. This shader implements a matmulnbits using that primitive. Observed perf gains in terms of LLM inference speed: prefill for Phi 3.5 with a 1K token prefill sees a 3x improvement, 5.4s down from 15s. With Changes ``` ./model_benchmark -...
Adrian Lizarraga
d82604e802a [Optimizer] Fix exception for Q -> DQ sequence with different scale types (#23771) ### Description Fixes bug in the IsQDQPairSupported utility function, which is used by various QDQ optimizers (e.g., DoubleQDQPairsRemover, QDQFinalCleanup, etc.). The bug causes an exception when IsQDQPairSupported() is called with a `Q(scale_f32) -> DQ(scale_f16)` sequence that uses different scale types. ### Motivation and Context Fix bug that prevents creating QDQ models that use scale...
saurabh
754ee21f835 OVEP: Bug Fixes, Refactoring, and Contrib Ops Update (#23742) ### Description This pull request combines multiple improvements and bug fixes for the OpenVINO Execution Provider (OVEP). The changes are summarized as follows: 1. Support for various contrib Ops in OVEP. 2. Dimension Check Fixes for Greater, Pad, and MAX Ops: Fixed dimension check failures for the Greater, Pad, and MAX ops in OVEP, ensuring they now pass validation for all supported models. 3...
Jambay Kinley
6715d4ca35e Shape inference: GatherBlockQuantized dispatcher (#23748) ### Description Add shape infer dispatcher for `GatherBlockQuantized` contrib op. It reuses the dispatcher for `Gather` op since the first two inputs have the same specs. The output elem type comes from input 2 (scales) for `GatherBlockQuantized`. ### Motivation and Context Support shape inference for models with `GatherBlockQuantized` op.
Jon Campbell
75cf166b25b [QNN EP] Passthrough EP Parameters in Node (#23468) ### Description The existing implementation of session options for the QNN EP does not honor the various bindings available. As such, even if options are set at runtime, they are ignored. The fix is to follow the pattern of the `webgpu` provider and parse/populate the options accordingly. Existing defaults are preserved, such that if options are not set the prior behavior will persist. ### Motivation and Cont...
Prathik Rao
eadd29e64bb [JSEP] fix scatter-nd jsep kernel (#23755) Adjusts scatter-nd kernel implementation for the case when reduction=none and there are duplicate values in the indices input tensor. If duplicates are detected, a single thread processes all indices to ensure correct results.
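The duplicate-index case can be illustrated with a sequential reference loop. This is a sketch in plain Python on nested lists, not the JSEP shader: both function names are hypothetical, each index is assumed to address a single element, and applying updates in index order (so the last write wins) is an assumption about what "correct results" means here.

```python
import copy


def has_duplicate_indices(indices):
    """Return True if any index tuple appears more than once."""
    seen = set()
    for idx in indices:
        key = tuple(idx)
        if key in seen:
            return True
        seen.add(key)
    return False


def scatter_nd_sequential(data, indices, updates):
    """ScatterND with reduction=none, applied one update at a time in order."""
    out = copy.deepcopy(data)
    for idx, value in zip(indices, updates):
        target = out
        for i in idx[:-1]:
            target = target[i]  # walk down to the innermost list
        target[idx[-1]] = value  # later duplicates overwrite earlier ones
    return out


data = [[0, 0], [0, 0]]
indices = [[0, 1], [1, 0], [0, 1]]  # [0, 1] appears twice
updates = [5, 6, 7]
result = scatter_nd_sequential(data, indices, updates)
```

When the indices are unique, each update touches a different element and the writes can safely run in parallel. With duplicates, parallel writes race on the same element, which is why a serial pass is the safe path.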
Karim Vadsariya
0babb10a277 [onnxruntime/build] Add CI testing for ORT build with generic interface (#23530) Summary: - Remove unused cmake variables - Add target specific logic when generic interface is used. - Add QNN EP test case that uses ORT generic interface build
liqun Fu
af04b202baf Rope embedding kernel to use avx2 (#23694) ### Description Credit to [chethanpk](https://github.com/chethanpk) who provided the Rope Embedding in a patch. The patch is in the first commit of this PR. I have been confirming perf improvement with this code change. My analysis is based on phi-3-mini-4k-instruct-int4-int8-blklen32. Benchmark from onnxruntime-genai does not show clear improvement. This is bec...
Changming Sun
3df43a247ff Add a new build flag to build.py for using with vcpkg (#23723) 1. **Add new flag to build.py**: Introduced a `--use_vcpkg_ms_internal_asset_cache` flag to `build.py`. The flag is intended for internal use only. 2. **Reduce excessive logs**: Removed some excessive logs from `vcpkg_helper.py`.
Dmitri Smirnov
b230c7bc101 Capacity aware partitioning (#22766) ### Description Allow users to specify EP-specific resource constraints. Currently, models that do not fit into device memory error out. This PR lays the groundwork for EP-specific resource-constrained graph partitioning, subject to incremental feature additions. Partitioning in this context means assigning graph nodes to a specific device (Execution Provider) up to a certain limit that is ev...
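The idea of assigning nodes to an EP up to a limit can be sketched with a simple first-fit loop. Everything here is assumed for illustration: the helper name, the per-node cost numbers, and the rule that a node which does not fit falls back while later, smaller nodes may still be assigned. The truncated description above does not specify how the limit is evaluated.

```python
def partition_by_capacity(node_costs, capacity):
    """Hypothetical first-fit split of (name, cost) pairs under a budget."""
    assigned, fallback = [], []
    used = 0
    for name, cost in node_costs:
        if used + cost <= capacity:
            assigned.append(name)  # node fits on the constrained EP
            used += cost
        else:
            fallback.append(name)  # node is left for a fallback EP
    return assigned, fallback, used


# Assumed memory costs in arbitrary units.
nodes = [("a", 4), ("b", 5), ("c", 3), ("d", 1)]
assigned, fallback, used = partition_by_capacity(nodes, capacity=10)
```

In this example node `c` does not fit after `a` and `b`, so it falls back, while `d` still fits within the remaining budget.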
Jing Fang
2d33ee91556 [ARM CPU] Enable FP16 kernels for GQA op (#23746) ### Description - Enable hgemm and softmax fp16 kernels for GQA - add intra-loop parallelism to RoPE fp16 kernel __Benchmarking models__ - float32: [phi-3 cpu accuracy level 0](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx/tree/main/cpu_and_mobile/cpu-int4-rtn-block-32) - float16: [phi-3 gpu accuracy level 0](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx/tree/ma...
Jiajia Qin
9b2b2ee8269 [webgpu] Use components for VxAttentionScore (#23726) For phi3.5-gqa-static sum_long (>1000 tokens) on meteor lake. Before: 300 tokens in 27.0 sec, e2e: 11.1 tps, prompt: 212.4 tps, gen: 14.2 tps, ttft: 5.85 sec. After: 300 tokens in 23.0 sec, e2e: 13.0 tps, prompt: 248.9 tps, gen: 16.6 tps, ttft: 4.99 sec.
Ranjit Ranjan
60362106a08 [AIX] eigen update fix and test failures fix (#23751) ### Description Changes in this PR are for: - Cleanup the patch for Eigen on AIX. Not needed anymore. - Fix to recent test failures ``` 1: [----------] Global test environment tear-down 1: [==========] 4737 tests from 310 test suites ran. (94682 ms total) 1: [ PASSED ] 4733 tests. 1: [ SKIPPED ] 2 tests, listed below: 1: [ SKIPPED ] MatMulFpQ4.MatMul2DSym 1: [ SKIPPED ] MatMulFpQ4.MatMu...
Yifan Li
ec3f8718ad3 Add condition to gpu wheel build flag (#23760)
kunal-vaishnavi
6d2b36e18fe Fix security vulnerability with Whisper export (#23743) ### Description This PR reverts changes from [this PR](https://github.com/microsoft/onnxruntime/pull/15759/files). ### Motivation and Context This fixes a security vulnerability that was raised internally.