Martin Kroeker 1d5ed5c46be M Merge pull request #5168 from taoye9/add_sbgemvn_on_neonversen2 Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2 04 Mar 2025
Martin Kroeker 7338a473a79 M Merge pull request #5150 from Harishmcw/WoA-Experiments Redefined threading logic for GESV and GEMV on WoA 04 Mar 2025
Martin Kroeker 5f200dca549 M Merge pull request #5166 from martin-frbg/issue5158 Expose the option to build without LAPACKE to ccmake 03 Mar 2025
Martin Kroeker 8b98db13e34 M Merge pull request #5167 from taoye9/fix_sbgemv_n_kernel_typo fix minior issues of redeclaration of float x0,x1 in sbgemv_n_neon.c 03 Mar 2025
Ye Tao 6b8b35cdf2d fix minior issues of redeclaration of float x0,x1 in sbgemv_n_neon.c 03 Mar 2025
Ye Tao 38ee7c93011 Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2 03 Mar 2025
Martin Kroeker 217324d8801 M Merge pull request #5162 from taoye9/add_sbgemv_tests add beta and alpha testcase for sbgemv 03 Mar 2025
Martin Kroeker e4630ed15a8 M Merge pull request #5160 from taoye9/sbgemv_n_neon Add SBGEMVN Kernel for ARM64 03 Mar 2025
Martin Kroeker 35914aa9a2a Expose the option to build without LAPACKE to ccmake 03 Mar 2025
Martin Kroeker 2b941c44b59 M Merge branch 'develop' into sbgemv_n_neon 03 Mar 2025
Martin Kroeker c797e27a1ca M Merge pull request #5159 from annop-w/sbgemv_t_bfdot Add sbgemv_t_bfdot kernel for ARM64 03 Mar 2025
Ye Tao 4346b915597 add beta and alpha testcase for sbgemv 28 Feb 2025
Ye Tao 35bdbca1535 Add sbgemv_n_neon kernel for arm64. 28 Feb 2025
Annop Wongwathanarat edaf51dd99b Add sbgemv_t_bfdot kernel for ARM64 26 Feb 2025
    This improves performance for sbgemv_t by up to 100x on NEOVERSEV1.
    The geometric mean speedup is ~61x for M=N=[2,512].
Martin Kroeker ef9e3f71595 M Merge pull request #5149 from martin-frbg/fixup5077-5088 Make the Neoverse GEMM/GEMV throttling code conditional on SMP 25 Feb 2025
Martin Kroeker 09ba0994615 make throttling code conditional on SMP 25 Feb 2025
Harishmcw 030ae1fd97f Redefined threading logic for WoA 25 Feb 2025
Martin Kroeker 1533fe49bef M Merge pull request #5144 from taoye9/dispatch_neoversve2_to_neoversven2 dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting 24 Feb 2025
Martin Kroeker c03a81b9274 M Merge pull request #5141 from michalowski-arm/fork-throttle Add throttling profile for SGEMM and SGEMV on `NEOVERSEV2` 23 Feb 2025
Martin Kroeker 643966d9c7e M Merge pull request #5146 from martin-frbg/issue5123 Fix "dummy2" flag reading in PPC970 S/DSCAL 23 Feb 2025
Martin Kroeker 77fba0f400b Fix "dummy2" flag handling 23 Feb 2025
Ye Tao f0bea79a6e1 dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting 21 Feb 2025
Martin Kroeker 20d11188652 M Merge pull request #5143 from martin-frbg/issue5111 Fix GEMMT transforming the input array B in some complex cases 21 Feb 2025
Martin Kroeker 75b958a0184 Transform the B array back if necessary before returning 21 Feb 2025
Marek Michalowski 650a062e19e Add thread throttling profile for SGEMV on `NEOVERSEV2` 20 Feb 2025
Marek Michalowski b723c1b7b79 Add thread throttling profile for SGEMM on `NEOVERSEV2` 20 Feb 2025
Martin Kroeker ceb8f1e34b8 M Merge pull request #5140 from martin-frbg/issue5139 Add ARM64 options for NVIDIA HPC 20 Feb 2025
Martin Kroeker f1fa370579a fix missing endif 19 Feb 2025
Martin Kroeker 6d1444be3ab Add ARM64 options for NVIDIA HPC 19 Feb 2025
Martin Kroeker eb84aac7ad9 M Merge pull request #5084 from quic/topic/sgemm_direct_sme1 Support for SGEMM_DIRECT Kernel based on SME1 19 Feb 2025
Martin Kroeker abbd78aa592 M Merge pull request #5138 from martin-frbg/issue5131 Ensure that gmake builds with flang-new link the flang runtime into the shared library 18 Feb 2025
Martin Kroeker ebcab909767 Handle flang-new runtime library linking on Linux like classic-flang 18 Feb 2025
Martin Kroeker ed1584666c2 M Merge pull request #5137 from martin-frbg/issue5136 Fix the CMake build to define USE_TRMM for RISCV64 targets as well 17 Feb 2025
Martin Kroeker b9ae246f205 define USE_TRMM for RISCV64 targets as well 17 Feb 2025
Martin Kroeker 86cf9d8a2ed M Merge pull request #5133 from OpenMathLib/revert-4920-issue4917 Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO" 17 Feb 2025
Martin Kroeker 0b3c56968d1 M Merge pull request #5135 from martin-frbg/ghwf-n2 CI: remove the express NeoverseN2 target from the Cobalt100 job in the gh workflow 17 Feb 2025
Martin Kroeker c1bb90a823e remove the express NeoverseN2 target from the Cobalt100 job 16 Feb 2025
Martin Kroeker 77c638db67d Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO" 16 Feb 2025
Vaisakh K V f66ca05b313 M Merge branch 'develop' into topic/sgemm_direct_sme1 13 Feb 2025
Vaisakh K V d23eb3b93ec Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 05 Dec 2024
    * Added ARMV9SME target
    * Added SGEMM_DIRECT kernel based on SME1
Martin Kroeker a64b75a2e00 M Merge pull request #5127 from Harishmcw/gesv-threshold Refined GESV Parallelization Logic for Windows on ARM64 13 Feb 2025
Martin Kroeker 453efbd103f M Merge pull request #5128 from martin-frbg/issue5120 Add -O2 to flang flags when building on WoA in Release mode 13 Feb 2025
Martin Kroeker 877d5a5be62 Add -O2 to flang flags when building on WoA in Release mode 13 Feb 2025
Martin Kroeker 8d487ef6ebf M Merge pull request #5124 from XiWeiGu/LoongArch64-LA264-lapack-fixed LoongArch64: Fixed lapack test for LA264 12 Feb 2025
Harish-Gits daf16b8229b Adjusted GESV threading logic for optimal performance on WoA 12 Feb 2025
Martin Kroeker e8b11a126bb M Merge pull request #5125 from martin-frbg/issue5122 Fix SGEMV on POWER8 by reverting to the non-vectorized earlier code 12 Feb 2025
Martin Kroeker 9a3948df82e M Merge pull request #5126 from martin-frbg/cirrusbsd4 CirrusCI: Update FreeBSD jobs to 14.2 12 Feb 2025
Martin Kroeker 7f1f776f583 Update FreeBSD jobs to 14.2 12 Feb 2025
Martin Kroeker 81eed868b68 Restore the non-vectorized code from before PR4880 for POWER8 12 Feb 2025
Martin Kroeker 98b5ef929cf Restore the non-vectorized code from before PR4880 for POWER8 12 Feb 2025