
OpenBLAS

Author | Commit | Message | Commit date | Issues
Martin Kroeker (committed via GitHub) | 1d5ed5c46be | Merge pull request #5168 from taoye9/add_sbgemvn_on_neonversen2
    Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
Martin Kroeker (committed via GitHub) | 7338a473a79 | Merge pull request #5150 from Harishmcw/WoA-Experiments
    Redefined threading logic for GESV and GEMV on WoA
Martin Kroeker (committed via GitHub) | 5f200dca549 | Merge pull request #5166 from martin-frbg/issue5158
    Expose the option to build without LAPACKE to ccmake
Martin Kroeker (committed via GitHub) | 8b98db13e34 | Merge pull request #5167 from taoye9/fix_sbgemv_n_kernel_typo
    fix minior issues of redeclaration of float x0,x1 in sbgemv_n_neon.c
Ye Tao | 6b8b35cdf2d | fix minior issues of redeclaration of float x0,x1 in sbgemv_n_neon.c
Ye Tao | 38ee7c93011 | Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
Martin Kroeker (committed via GitHub) | 217324d8801 | Merge pull request #5162 from taoye9/add_sbgemv_tests
    add beta and alpha testcase for sbgemv
Martin Kroeker (committed via GitHub) | e4630ed15a8 | Merge pull request #5160 from taoye9/sbgemv_n_neon
    Add SBGEMVN Kernel for ARM64
Martin Kroeker (committed via GitHub) | 35914aa9a2a | Expose the option to build without LAPACKE to ccmake
Martin Kroeker (committed via GitHub) | 2b941c44b59 | Merge branch 'develop' into sbgemv_n_neon
Martin Kroeker (committed via GitHub) | c797e27a1ca | Merge pull request #5159 from annop-w/sbgemv_t_bfdot
    Add sbgemv_t_bfdot kernel for ARM64
Ye Tao | 4346b915597 | add beta and alpha testcase for sbgemv
Ye Tao | 35bdbca1535 | Add sbgemv_n_neon kernel for arm64.
Annop Wongwathanarat | edaf51dd99b | Add sbgemv_t_bfdot kernel for ARM64
    This improves performance for sbgemv_t by up to 100x on NEOVERSEV1. The geometric mean speedup is ~61x for M=N=[2,512].
Martin Kroeker (committed via GitHub) | ef9e3f71595 | Merge pull request #5149 from martin-frbg/fixup5077-5088
    Make the Neoverse GEMM/GEMV throttling code conditional on SMP
Martin Kroeker (committed via GitHub) | 09ba0994615 | make throttling code conditional on SMP
Harishmcw | 030ae1fd97f | Redefined threading logic for WoA
Martin Kroeker (committed via GitHub) | 1533fe49bef | Merge pull request #5144 from taoye9/dispatch_neoversve2_to_neoversven2
    dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting
Martin Kroeker (committed via GitHub) | c03a81b9274 | Merge pull request #5141 from michalowski-arm/fork-throttle
    Add throttling profile for SGEMM and SGEMV on `NEOVERSEV2`
Martin Kroeker (committed via GitHub) | 643966d9c7e | Merge pull request #5146 from martin-frbg/issue5123
    Fix "dummy2" flag reading in PPC970 S/DSCAL
Martin Kroeker (committed via GitHub) | 77fba0f400b | Fix "dummy2" flag handling
Ye Tao | f0bea79a6e1 | dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting
Martin Kroeker (committed via GitHub) | 20d11188652 | Merge pull request #5143 from martin-frbg/issue5111
    Fix GEMMT transforming the input array B in some complex cases
Martin Kroeker (committed via GitHub) | 75b958a0184 | Transform the B array back if necessary before returning
Marek Michalowski | 650a062e19e | Add thread throttling profile for SGEMV on `NEOVERSEV2`
Marek Michalowski | b723c1b7b79 | Add thread throttling profile for SGEMM on `NEOVERSEV2`
Martin Kroeker (committed via GitHub) | ceb8f1e34b8 | Merge pull request #5140 from martin-frbg/issue5139
    Add ARM64 options for NVIDIA HPC
Martin Kroeker (committed via GitHub) | f1fa370579a | fix missing endif
Martin Kroeker (committed via GitHub) | 6d1444be3ab | Add ARM64 options for NVIDIA HPC
Martin Kroeker (committed via GitHub) | eb84aac7ad9 | Merge pull request #5084 from quic/topic/sgemm_direct_sme1
    Support for SGEMM_DIRECT Kernel based on SME1
Martin Kroeker (committed via GitHub) | abbd78aa592 | Merge pull request #5138 from martin-frbg/issue5131
    Ensure that gmake builds with flang-new link the flang runtime into the shared library
Martin Kroeker (committed via GitHub) | ebcab909767 | Handle flang-new runtime library linking on Linux like classic-flang
Martin Kroeker (committed via GitHub) | ed1584666c2 | Merge pull request #5137 from martin-frbg/issue5136
    Fix the CMake build to define USE_TRMM for RISCV64 targets as well
Martin Kroeker (committed via GitHub) | b9ae246f205 | define USE_TRMM for RISCV64 targets as well
Martin Kroeker (committed via GitHub) | 86cf9d8a2ed | Merge pull request #5133 from OpenMathLib/revert-4920-issue4917
    Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"
Martin Kroeker (committed via GitHub) | 0b3c56968d1 | Merge pull request #5135 from martin-frbg/ghwf-n2
    CI: remove the express NeoverseN2 target from the Cobalt100 job in the gh workflow
Martin Kroeker (committed via GitHub) | c1bb90a823e | remove the express NeoverseN2 target from the Cobalt100 job
Martin Kroeker (committed via GitHub) | 77c638db67d | Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"
Vaisakh K V (committed via GitHub) | f66ca05b313 | Merge branch 'develop' into topic/sgemm_direct_sme1
Vaisakh K V | d23eb3b93ec | Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
    * Added ARMV9SME target
    * Added SGEMM_DIRECT kernel based on SME1
Martin Kroeker (committed via GitHub) | a64b75a2e00 | Merge pull request #5127 from Harishmcw/gesv-threshold
    Refined GESV Parallelization Logic for Windows on ARM64
Martin Kroeker (committed via GitHub) | 453efbd103f | Merge pull request #5128 from martin-frbg/issue5120
    Add -O2 to flang flags when building on WoA in Release mode
Martin Kroeker (committed via GitHub) | 877d5a5be62 | Add -O2 to flang flags when building on WoA in Release mode
Martin Kroeker (committed via GitHub) | 8d487ef6ebf | Merge pull request #5124 from XiWeiGu/LoongArch64-LA264-lapack-fixed
    LoongArch64: Fixed lapack test for LA264
Harish-Gits | daf16b8229b | Adjusted GESV threading logic for optimal performance on WoA
Martin Kroeker (committed via GitHub) | e8b11a126bb | Merge pull request #5125 from martin-frbg/issue5122
    Fix SGEMV on POWER8 by reverting to the non-vectorized earlier code
Martin Kroeker (committed via GitHub) | 9a3948df82e | Merge pull request #5126 from martin-frbg/cirrusbsd4
    CirrusCI: Update FreeBSD jobs to 14.2
Martin Kroeker (committed via GitHub) | 7f1f776f583 | Update FreeBSD jobs to 14.2
Martin Kroeker (committed via GitHub) | 81eed868b68 | Restore the non-vectorized code from before PR4880 for POWER8
Martin Kroeker (committed via GitHub) | 98b5ef929cf | Restore the non-vectorized code from before PR4880 for POWER8