[OpenVINO] Fix a build warning (#23877)
### Description
Fix a warning with std::move usage
### Motivation and Context
Possibly allow building without --compile_no_warning_as_error flag
[js/webgpu] Reland the optimization of ConvTranspose (#23858)
This PR fixes the errors in the ConvTranspose optimization and adds
tests to ensure the correctness of the implementation.
[js/common] allows using Uint16Array as data for float16 tensor (#23827)
### Description
Resolve #23817
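For context, a `Uint16Array` can carry float16 data because each half-precision value is a 16-bit pattern. The sketch below shows that correspondence in Python (not JavaScript, and not the onnxruntime-common code); the helper names `float16_to_bits` and `bits_to_float16` are hypothetical.

```python
import struct


def float16_to_bits(value: float) -> int:
    """Return the IEEE 754 binary16 bit pattern of `value` as an unsigned 16-bit int."""
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    return bits


def bits_to_float16(bits: int) -> float:
    """Interpret an unsigned 16-bit int as an IEEE 754 binary16 value."""
    (value,) = struct.unpack("<e", struct.pack("<H", bits))
    return value


# The kind of raw 16-bit values a Uint16Array would hold for these floats.
raw = [float16_to_bits(v) for v in (1.0, -2.0, 0.5)]
decoded = [bits_to_float16(b) for b in raw]
```

Values that are exactly representable in half precision, such as the three above, round-trip without loss.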
### Motivation and Context
Make Nuget QNN package pipeline 1ES compliant (#23805)
### Description
Make [QNN_Nuget_Windows](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1234) 1ES compliant
### Motivation and Context
Change the logic to generate the default ep context file name (#23788)
### Description
Applies to all EPs: replace the `.onnx` extension with `_ctx.onnx`, instead of appending the extra string `_ctx.onnx` directly to the existing model path. In the QNN EP, also shorten the context binary `.bin` file name by removing `QNNExecutionProvider_` from it.
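The naming rule can be sketched as follows. This is an illustration in Python, not the ORT implementation; the function name `default_ep_context_path` is hypothetical, and the fallback for paths that do not end in `.onnx` is an assumption.

```python
from pathlib import PurePosixPath


def default_ep_context_path(model_path: str) -> str:
    """Derive a default EP context file name from a model path (hypothetical helper)."""
    path = PurePosixPath(model_path)
    if path.suffix == ".onnx":
        # New behavior described above: swap the extension for "_ctx.onnx".
        return str(path.with_name(path.stem + "_ctx.onnx"))
    # Assumption: with no ".onnx" extension to replace, append the suffix.
    return str(path.with_name(path.name + "_ctx.onnx"))


new_name = default_ep_context_path("models/resnet.onnx")
# Previous behavior, for comparison: the string was appended to the full path.
old_name = "models/resnet.onnx" + "_ctx.onnx"
```

Here `new_name` is `models/resnet_ctx.onnx`, whereas the old scheme produced `models/resnet.onnx_ctx.onnx`.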
[doc] Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/ (#23848)
### Description
Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/
### Motivation and Context
Fix typo: change `Upample` to `Upsample`. (#23838)
### Description
Fixed a typo in function names related to the Upsample CUDA kernel.
Changed incorrect spelling Upample to Upsample across relevant
functions.
### Motivation and Context
This change is necessary to maintain consistency a...
Model Builder API (#23223)
### Description
Supports creating a model programmatically using the ORT C or C++ API.
Supports augmenting an existing model to add nodes.
### Motivation and Context
Fix flash attention for GQA (Phi4) (#23850)
### Description
This change fixes GQA for Flash Attention on Nvidia GPUs. The root cause
appears to be the
`k_start + capped_sg_id < seq_causal_length`
check. This is either because:
a. seq_causal_length varies per lane, so the check becomes non-uniform
control flow, which interacts with subgroupShuffle,
or
b. The check itself is incorrect and is wiping out values of v based on
the s...
Revert changes on mac-react-native-ci-pipeline.yml (#23845)
### Description
### Motivation and Context
[Mlas] Unblock hardcoded matmul blocking size (#23815)
### Description
In GemmBatch, the target matrix is cut into blocks that are dispatched to multiple
threads for intra-op parallelism.
Currently the block size is hard-coded to 16. If the CPU has more than 16 cores,
the cores are not fully utilized in one op.
This change removes the hard-coded limit on the number of blocks in various MatMul variants.
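The effect of such a cap can be illustrated with a small sketch. This is not the MLAS code; it assumes a simple row-wise split, and the function `partition_rows` and its `max_blocks` parameter are hypothetical.

```python
def partition_rows(m, num_threads, max_blocks=None):
    """Split m rows (m > 0) into contiguous, near-equal blocks, one per worker."""
    blocks = min(m, num_threads)
    if max_blocks is not None:
        # Hypothetical hard cap, standing in for the hard-coded value of 16.
        blocks = min(blocks, max_blocks)
    base, extra = divmod(m, blocks)
    ranges = []
    start = 0
    for i in range(blocks):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first blocks
        ranges.append((start, start + size))
        start += size
    return ranges


capped = partition_rows(1024, num_threads=32, max_blocks=16)
uncapped = partition_rows(1024, num_threads=32)
```

With 32 threads, the capped plan yields 16 blocks, so half the threads have no work; the uncapped plan yields 32.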
__Benchmark results__
Model:
llmlingua-2-bert-base-multilingual-cased-meetingbank--add-force-token-1...
Increase npm package pipeline ReactNative_CI_iOS timeout to 120 mins (#23825)
### Description
Increase [npm package
pipeline](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1080&_a=summary)
ReactNative_CI_iOS timeout to 120 mins
### Motivation and Context
[ORT/CI_Pipeline] Use --enable_generic_interface in ORT builds for EP testing (#23801)
Summary of changes:
- Changed the OpenVINO test case to use --enable_generic_interface
- Changed the TensorRT test case to use --enable_generic_interface
- Fixed ORT builds to use USE_FULL_PROTOBUF, as OpenVINO/TensorRT require
it
- Fixed a pre-processor macro definition that was accidentally removed when
ORT is built w/o an EP
### Description
### Motivation and Context
Update onnxruntime_external_deps.cmake: add missing EXCLUDE_FROM_ALL (#23829)
### Description
To resolve #23821
### Motivation and Context
Similar to #23641.
upgrade emsdk to 4.0.4 (#23819)
### Description
Upgrade EMSDK to 4.0.4
### Motivation and Context
Emscripten v4.0.4 brings two changes that are helpful for WebGPU:
- https://github.com/emscripten-core/emscripten/pull/23678
- https://github.com/emscripten-core/emscripten/pull/23631
[webgpu] Fix alignment issues in shader code (#23776)
### Description
This commit fixes alignment issues in shader code.
### Motivation and Context
See above.
[TensorRT EP] update oss parser to latest (#23710)
### Description
* Update oss parser version to latest commit of 10.8-GA branch
### Motivation and Context
* Updating to the latest onnx-tensorrt 10.8-GA branch is needed to fix the
scatterND attribute issue and the `plugin.h` not found issue
Make Nuget CUDA package pipeline 1ES compliant (#23804)
### Description
Make [Nuget CUDA 12 Publish
Pipeline](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1312&_a=summary)
1ES compliant
### Motivation and Context
Upgrade React Native to 0.73 (#23575)
### Description
Upgrading RN to 0.73.11, including Android and iOS changes. This PR
also includes the E2E test changes.
Used React-Native upgrade
[helper](https://react-native-community.github.io/upgrade-helper/?from=0.72.11&to=0.73.11&package=onnxruntime-android&name=onnxruntime)
as the reference.
### Motivation and Context
A newer RN version is needed to fix S360 work items.
[webgpu] support resize operator (#23780)
### Description
### Motivation and Context
Converting npm packaging pipeline to 1ES (#23767)
### Description
### Motivation and Context
Make Nuget package pipeline 1ES compliant (#23803)
### Description
Make [Nuget
Publishing](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1313&_a=summary)
1ES compliant
### Motivation and Context
[QNN EP] Re-enable several disabled QNN-EP UTs (#23799)
### Description
1. Re-enable UTs that pass with 2.30.
2. Update the resize UT because "round_prefer_floor" has not been supported by the QNN SDK since 2.21.
### Motivation and Context
1. Get as many QNN EP UTs passing as possible to improve test coverage.
---------
Co-authored-by: Kuan-Yu Lin <kuanyul@qti.qualcomm.com>
[VitisAI] add new interface (#23777)
### Description
A new interface for interaction between ONNX Runtime and Vitis AI has been added, which uses `std::filesystem::path` to pass paths.
### Motivation and Context
Vitis AI uses `std::string` to pass paths, which causes errors on Windows when the model name contains Chinese characters. Therefore, this PR adds an interface that uses `std::filesystem::path` to pass paths, ensuring tha...
[QNN EP] Use absolute path of libcdsprpc.dll on Windows so it doesn't need to be copied anywhere. (#23791)
### Description
Look up and use absolute path of libcdsprpc.dll on Windows.
### Motivation and Context
The QNN EP's HTP shared memory allocator requires use of the libcdsprpc shared library.
On Windows, this previously required copying libcdsprpc.dll from some driver-specific path to somewhere the running code could find it. After this change, libcdsprpc.dll no longer needs to be copied.
Bump version from 1.21 to 1.22 (#23787)
The [1.21 release
branch](https://github.com/microsoft/onnxruntime/tree/rel-1.21.0) has
been cut, so we need to update the version in main from `1.21.0` to
`1.22.0`.
[webgpu] Enable FlashAttention for GQA (#23761)
### Description
### Motivation and Context
[TensorRT EP] Add new provider option to exclude ops from running on TRT (#23705)
This PR removes the implicit filtering-out of DDS ops from running on TRT.
In other words, by default, DDS nodes will be run by TRT if TRT supports them.
Moreover, it adds a new provider option, `trt_op_types_to_exclude`:
- Users can provide a list of op types to be excluded from running on TRT
- e.g. `trt_op_types_to_exclude="NonMaxSuppression,NonZero,RoiAlign"`
(This PR basically adds back
[feature](https:/...
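As a rough illustration of how the option might be supplied from Python, the sketch below builds a provider list with the comma-separated value described above. The helper `trt_provider_options` is hypothetical, and the block deliberately does not import onnxruntime so that it runs anywhere.

```python
def trt_provider_options(excluded_op_types):
    """Build a TensorRT EP options dict (hypothetical helper, not part of ORT)."""
    # The option value is a single comma-separated string of op types.
    return {"trt_op_types_to_exclude": ",".join(excluded_op_types)}


providers = [
    (
        "TensorrtExecutionProvider",
        trt_provider_options(["NonMaxSuppression", "NonZero", "RoiAlign"]),
    ),
    "CPUExecutionProvider",  # fallback for the excluded nodes
]

# With a TensorRT-enabled onnxruntime build installed, this list could be passed as:
#   onnxruntime.InferenceSession("model.onnx", providers=providers)
```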
Update cmake_cuda_architecture to control package size (#23671)
### Description
Action item:
* ~~Add LTO support when cuda 12.8 & Relocatable Device Code
(RDC)/separate_compilation are enabled, to reduce potential perf
regression~~ (LTO needs further testing)
* Reduce nuget/whl package size by selecting devices & their cuda
binary/PTX assembly during ORT build
* Make sure ORT nuget package < 250 MB, python wheel < 300 MB
...
[webgpu] Implement SubGroupMatrix based MatMulNBits for Metal (#23729)
### Description
Recent progress with the SubGroupMatrix prototype in Dawn
(https://issues.chromium.org/issues/348702031) exposes SIMD-Group Matrix
Functions to WebGPU. This shader implements MatMulNBits using that
primitive.
Observed perf gains in LLM inference speed: a 1K token prefill for
Phi 3.5 sees about a 3x improvement, from 15s down to 5.4s.
With Changes
```
./model_benchmark -...
```
[Optimizer] Fix exception for Q -> DQ sequence with different scale types (#23771)
### Description
Fixes a bug in the IsQDQPairSupported utility function, which is used by
various QDQ optimizers (e.g., DoubleQDQPairsRemover, QDQFinalCleanup,
etc.). The bug causes an exception when IsQDQPairSupported() is called
with a `Q(scale_f32) -> DQ(scale_f16)` sequence that uses different
scale types.
### Motivation and Context
Fix bug that prevents creating QDQ models that use scale...
OVEP: Bug Fixes, Refactoring, and Contrib Ops Update (#23742)
### Description
This pull request combines multiple improvements and bug fixes for the
OpenVINO Execution Provider (OVEP). The changes are summarized as
follows:
1. Support for various contrib Ops in OVEP.
2. Dimension Check Fixes for Greater, Pad, and MAX Ops: Fixed dimension
check failures for the Greater, Pad, and MAX ops in OVEP, ensuring they
now pass validation for all supported models.
3...
Shape inference: GatherBlockQuantized dispatcher (#23748)
### Description
Add a shape inference dispatcher for the `GatherBlockQuantized` contrib op. It
reuses the dispatcher for the `Gather` op since the first two inputs have
the same specs. The output element type comes from input 2 (scales) for
`GatherBlockQuantized`.
### Motivation and Context
Support shape inference for models with `GatherBlockQuantized` op.
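The idea can be sketched in a few lines. This is not the ORT symbolic shape inference code; the function names and the `axis` parameter are hypothetical, shapes are plain tuples, and the block assumes the output shape follows the standard `Gather` rule while the element type is taken from the scales input.

```python
def infer_gather_shape(data_shape, indices_shape, axis=0):
    """ONNX Gather rule: data.shape[:axis] + indices.shape + data.shape[axis+1:]."""
    rank = len(data_shape)
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    axis %= rank  # normalize negative axes
    return tuple(data_shape[:axis]) + tuple(indices_shape) + tuple(data_shape[axis + 1:])


def infer_gather_block_quantized(data_shape, indices_shape, scales_dtype, axis=0):
    """Reuse the Gather shape rule; the output element type comes from scales."""
    return infer_gather_shape(data_shape, indices_shape, axis), scales_dtype


shape, dtype = infer_gather_block_quantized((1000, 64), (8,), "float16")
```

In this example `shape` is `(8, 64)` and `dtype` is `"float16"`, regardless of the quantized element type of the data input.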
[QNN EP] Passthrough EP Parameters in Node (#23468)
### Description
The existing implementation of session options for the QNN EP does not
honor the various bindings available. As such, even if set at runtime
they are ignored. The fix is to follow the pattern of the `webgpu` provider
and parse/populate the options accordingly.
Existing defaults are preserved, so that if options are not set the
prior behavior persists.
### Motivation and Cont...
[JSEP] fix scatter-nd jsep kernel (#23755)
Adjusts scatter-nd kernel implementation for the case when
reduction=none and there are duplicate values in the indices input
tensor. If duplicates are detected, a single thread processes all
indices to ensure correct results.
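The strategy can be sketched on the CPU as follows. This is a simplified 1-D illustration, not the JSEP shader; the function names are hypothetical, and it assumes that "correct" means updates are applied in index order so that the last duplicate wins.

```python
def has_duplicates(indices):
    """Return True if any index tuple appears more than once."""
    seen = set()
    for idx in indices:
        key = tuple(idx)
        if key in seen:
            return True
        seen.add(key)
    return False


def scatter_nd_1d(data, indices, updates):
    """ScatterND with reduction=none on 1-D data (hypothetical simplified helper)."""
    out = list(data)
    pairs = list(zip(indices, updates))
    if has_duplicates(indices):
        # Duplicates: apply updates strictly in order, as a single thread would.
        for idx, upd in pairs:
            out[idx[0]] = upd
    else:
        # No duplicates: writes are independent, so any order gives the same
        # result. Reversed order stands in for an arbitrary parallel schedule.
        for idx, upd in reversed(pairs):
            out[idx[0]] = upd
    return out


result = scatter_nd_1d([0, 0, 0, 0], [[1], [3], [1]], [10, 20, 30])
```

Index 1 appears twice here, so the sequential path is taken and the later update (30) overwrites the earlier one (10).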
[onnxruntime/build] Add CI testing for ORT build with generic interface (#23530)
Summary:
- Remove unused cmake variables
- Add target specific logic when generic interface is used.
- Add QNN EP test case that uses ORT generic interface build
Rope embedding kernel to use avx2 (#23694)
### Description
Credit to [chethanpk](https://github.com/chethanpk), who provided the
Rope Embedding kernel in a patch. The patch is in the first commit of this PR.
I have been verifying the perf improvement from this code change. My
analysis is based on phi-3-mini-4k-instruct-int4-int8-blklen32.
The benchmark from onnxruntime-genai does not show a clear improvement. This is
bec...
Add a new build flag to build.py for use with vcpkg (#23723)
1. **Add new flag to build.py**: Introduced a
`--use_vcpkg_ms_internal_asset_cache` flag to `build.py`. The flag is
intended for internal use only.
2. **Reduce excessive logs**: Removed some excessive logs from
`vcpkg_helper.py`.
Capacity aware partitioning (#22766)
### Description
Allow users to specify per-EP resource constraints.
Currently, models that do not fit into device memory error out.
This PR lays groundwork for EP specific resource constrained graph
partitioning, subject to incremental feature additions.
Partitioning in this context means to assign graph nodes to a specific
device (Execution Provider)
up to a certain limit that is ev...
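A very rough sketch of the idea, assigning nodes to a device until a budget is reached, is shown below. It is not the ORT partitioner; the function name, the per-node cost estimates, the device labels, and the "stop at the first node that does not fit" policy are all assumptions made for illustration.

```python
def partition_by_capacity(nodes, budget_bytes, device="gpu", fallback="cpu"):
    """Assign nodes (in topological order) to `device` until the budget is hit.

    `nodes` is a list of (name, estimated_bytes) pairs; both the estimates and
    the stopping policy are hypothetical.
    """
    assignment = {}
    used = 0
    budget_exhausted = False
    for name, cost in nodes:
        if not budget_exhausted and used + cost <= budget_bytes:
            assignment[name] = device
            used += cost
        else:
            # Once a node does not fit, it and all later nodes go to the fallback.
            budget_exhausted = True
            assignment[name] = fallback
    return assignment, used


plan, used_bytes = partition_by_capacity(
    [("conv1", 40), ("conv2", 40), ("matmul", 40), ("softmax", 10)],
    budget_bytes=100,
)
```

With a budget of 100, the first two nodes land on the device (80 used) and the remaining nodes fall back, instead of the whole model failing to load.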
[AIX] eigen update fix and test failures fix (#23751)
### Description
Changes in this PR:
- Clean up the patch for Eigen on AIX; it is no longer needed.
- Fix recent test failures
```
1: [----------] Global test environment tear-down
1: [==========] 4737 tests from 310 test suites ran. (94682 ms total)
1: [ PASSED ] 4733 tests.
1: [ SKIPPED ] 2 tests, listed below:
1: [ SKIPPED ] MatMulFpQ4.MatMul2DSym
1: [ SKIPPED ] MatMulFpQ4.MatMu...
```
Add condition to gpu wheel build flag (#23760)
### Description
### Motivation and Context
Fix security vulnerability with Whisper export (#23743)
### Description
This PR reverts changes from [this
PR](https://github.com/microsoft/onnxruntime/pull/15759/files).
### Motivation and Context
This fixes a security vulnerability that was raised internally.