Commits


Yi Zhang authored and GitHub committed 54871a27736
Replace T4 to A10 in Linux GPU workflow (#19205)

### Description
1. Update the Linux GPU machine from T4 to A10 (sm=8.6).
2. Update the tolerance.

### Motivation and Context
1. Free up more T4 machines and test with a higher compute capability.
2. ORT enables TF32 in GEMM for A10/A100. TF32 causes precision loss, which makes these tests fail:
```
2024-01-19T13:27:18.8302842Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12
2024-01-19T13:27:25.8438153Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:347: Failure
2024-01-19T13:27:25.8438641Z Expected equality of these values:
2024-01-19T13:27:25.8438841Z COMPARE_RESULT::SUCCESS
2024-01-19T13:27:25.8439276Z Which is: 4-byte object <00-00 00-00>
2024-01-19T13:27:25.8439464Z ret.first
2024-01-19T13:27:25.8445514Z Which is: 4-byte object <01-00 00-00>
2024-01-19T13:27:25.8445962Z expected 0.145984 (3e157cc1), got 0.975133 (3f79a24b), diff: 0.829149, tol=0.0114598 idx=375. 20 of 388 differ
2024-01-19T13:27:25.8446198Z
2024-01-19T13:27:25.8555736Z [ FAILED ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12, where GetParam() = "cuda_../models/zoo/opset12/SSD/ssd-12.onnx" (7025 ms)
2024-01-19T13:27:25.8556077Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_YOLOv312_yolov312
2024-01-19T13:27:29.3174318Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:347: Failure
2024-01-19T13:27:29.3175144Z Expected equality of these values:
2024-01-19T13:27:29.3175389Z COMPARE_RESULT::SUCCESS
2024-01-19T13:27:29.3175812Z Which is: 4-byte object <00-00 00-00>
2024-01-19T13:27:29.3176080Z ret.first
2024-01-19T13:27:29.3176322Z Which is: 4-byte object <01-00 00-00>
2024-01-19T13:27:29.3178431Z expected 4.34958 (408b2fb8), got 4.51324 (40906c80), diff: 0.16367, tol=0.0534958 idx=9929. 22 of 42588 differ
```
3. Some other tests, such as SSD, throw a different exception, so skip them:
```
2024-01-22T09:07:40.8446910Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12
2024-01-22T09:07:51.5587571Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:358: Failure
2024-01-22T09:07:51.5588512Z Expected equality of these values:
2024-01-22T09:07:51.5588870Z COMPARE_RESULT::SUCCESS
2024-01-22T09:07:51.5589467Z Which is: 4-byte object <00-00 00-00>
2024-01-22T09:07:51.5589953Z ret.first
2024-01-22T09:07:51.5590462Z Which is: 4-byte object <01-00 00-00>
2024-01-22T09:07:51.5590841Z expected 1, got 63
```
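
The `tol` values in the logs above are consistent with a combined absolute and relative tolerance, `tol = atol + rtol * |expected|`, with both set to 0.01. This is an inference from the logged numbers only, not something confirmed against the test source. The Python sketch below reproduces that arithmetic and gives a rough picture of why TF32 loses precision. All function names are hypothetical, and `truncate_to_tf32` is a simplification: it truncates a float32 to TF32's 10 explicit mantissa bits, whereas real hardware may round differently.

```python
import struct


def truncate_to_tf32(x: float) -> float:
    """Rough TF32 model: keep 10 of float32's 23 mantissa bits.

    Assumption: plain truncation; actual GPU conversion may round instead.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # clear the 13 least significant mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def tolerance(expected: float, atol: float = 0.01, rtol: float = 0.01) -> float:
    """Allowed absolute difference for one element.

    Assumption: atol = rtol = 0.01, inferred from the logged tol values.
    """
    return atol + rtol * abs(expected)


def within_tolerance(expected: float, actual: float,
                     atol: float = 0.01, rtol: float = 0.01) -> bool:
    """True if `actual` is close enough to `expected` under the model above."""
    return abs(expected - actual) <= tolerance(expected, atol, rtol)


if __name__ == "__main__":
    # (expected, got) pairs copied from the failing log lines.
    for expected, got in [(0.145984, 0.975133), (4.34958, 4.51324)]:
        print(expected, got, tolerance(expected), within_tolerance(expected, got))

    # A float32 value that differs from 1.0 only below TF32's resolution.
    x = 1.0 + 2.0 ** -11
    print(x, truncate_to_tf32(x))
```

Under this model, both logged mismatches exceed their tolerance by a wide margin. That matches the commit's choice to loosen the tolerance for the affected models rather than treat the differences as rounding noise within the old limits.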