Releases · amd/libflame

11 Oct 05:25

pradeeptrgit

5.0

3279fc5

AOCL-LAPACK 5.0 Latest

Highlights of AOCL-LAPACK 5.0 release

Improved performance of the following APIs through AVX2 and AVX512 SIMD instructions:
- Double Precision SVD (DGESVD)
- LU Factorization/Solver routines for general matrices (DGETRF, ZGETRF, DGETRS, and DGESV)
- Matrix inverse routine DGETRI for small sizes
- Least squares solver DGELS for small sizes
- Double Precision auxiliary routine DLARFG
Improved performance of the following APIs using local AOCL-BLAS optimized kernels:
- LU Factorization/Solver routines for band storage matrices (DGBTRF and DGBTRS)
Option to set specific ISA code path at runtime through the AOCL_ENABLE_INSTRUCTIONS environment variable
Sphinx-based AOCL-LAPACK API documentation
pkgconfig support on Linux with CMake builds
LAPACK API modifications:
- Updated the return types of AOCL-LAPACK APIs to match the corresponding Netlib subroutine prototypes
- Removed the xerbla and lsame definitions from AOCL-LAPACK. Applications must invoke lsame from the BLAS library
Test suite framework enhancements:
- Improved accuracy tests including testing with different input generation mechanisms
- Addition of extreme values, negative, and corner test cases
- Addition of cases to test numerical stability
- Support for LAPACKE interface test
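
The runtime ISA selection mentioned above is driven by an environment variable, so it can be set per process without rebuilding. The following is a minimal sketch of launching a program with AOCL_ENABLE_INSTRUCTIONS set only for that child process. The value "AVX2" and the helper name run_with_isa are assumptions for illustration; the accepted values should be taken from the AOCL documentation. The child here is a stand-in that just echoes the variable, not an AOCL-LAPACK application.

```python
import os
import subprocess
import sys


def run_with_isa(isa, command):
    """Run `command` with AOCL_ENABLE_INSTRUCTIONS=isa in the child only.

    `isa` is passed through unchecked; valid values are an assumption here.
    """
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env["AOCL_ENABLE_INSTRUCTIONS"] = isa
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )


# Stand-in child process: prints the variable it received.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ['AOCL_ENABLE_INSTRUCTIONS'])",
]

if __name__ == "__main__":
    result = run_with_isa("AVX2", child)
    print(result.stdout.strip())
```

In practice `child` would be replaced by the command line of an application linked against AOCL-LAPACK.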

28 Feb 06:04

pradeeptrgit

4.2

b51ceb2

AOCL-LAPACK 4.2

Highlights of improvements on AMD “Zen” core based processors.

Improved performance of the following APIs:
- Double Precision SVD (DGESVD)
- Factorization routines DGETRF and ZGETRF
- Solver routines DGETRS and DGESV
Option to link with AOCL-BLAS during build to enable invoking AOCL-BLAS internal APIs
Applications using AOCL-LAPACK need to additionally link with AOCL-Utils library
CMake improvements:
- Ease of running tests using CTest
- ISA specific flags configurable during build
- Code Coverage build option
OpenMP parallelism enabled in the following APIs to match LAPACK 3.11 implementation:
- {c,z}hetrd_hb2st, {s,d}sytrd_sb2st, iparam2stage
Minor bug fixes and warnings addressed
Test suite framework improvements and more test cases

05 Aug 06:50

pradeeptrgit

4.1

741fb69

AOCL-LAPACK 4.1

Highlights of improvements on AMD “Zen” core based processors.

Upgraded to Netlib LAPACK 3.11.0 specification that includes 14 new APIs, bug fixes, and improvements to the existing APIs
Improved performance of the following APIs:
- Double Precision SVD (DGESVD) for small matrices
- Factorization routines (DGEQP3, SGETRF, ZGETRF, and DGETRF)
- Double Precision solver routine DGESV and SVD routine DGESVD
Framework changes to support the selection of optimized routines based on target CPU instruction set
CMake build system for Linux and Windows
Build system cleanup and minor bug fixes

11 Nov 04:37

pradeeptrgit

4.0

a3b84dc

AOCL-libFLAME 4.0

AOCL-libFLAME Version 4.0

Highlights of improvements on AMD EPYC™ processor family CPUs

Upgrade to LAPACK 3.10.1 specification that includes several bug fixes from Netlib LAPACK
Improved performance of the following APIs:
- Eigenvalue routine (ZGGEV)
- SVD routines (DGESDD, CGESDD, and ZGESDD)
Logging feature supports timing for real double precision libFLAME APIs
AOCL-Progress feature, which provides progress updates on long-running API computations, is extended to more APIs: {S/C/Z}GETRF, {S/D}POTRF, {S/D}GEQRF, {S/C/D/Z}GBTRF

08 Jul 13:58

pradeeptrgit

3.2

fdb2fac

AOCL-libFLAME 3.2

AOCL-libFLAME Version 3.2

Highlights of improvements on AMD EPYC™ processor family CPUs

Improved performance of the following for AMD “Zen” architecture:
- Eigenvalue routines (DSYEVD and DSTEQR)
- SVD routines (DGESVD)
Feature AOCL_FLA_PROGRESS, which provides progress updates on long-running API computations; this support is available for double precision LU factorization
Improvements in libFLAME build system with new config flag for enabling optimization specific to AMD CPUs
Improved test coverage

23 Jun 09:46

pradeeptrgit

3.1

8f281f1

AMD Optimized libFLAME Version 3.1

Highlights of improvements on AMD EPYC™ processor family CPUs

New APIs to compute partial LDL^T factorization of a symmetric matrix using packed storage: ?spffrt2 and ?spffrtx
New APIs to perform complete or incomplete LU factorization without pivoting of a general matrix: ?getrfnp and ?getrfnpi
Test suite now supports LAPACK API tests for LU, Cholesky and QR operations
Several bug fixes including handling denormal numbers in SVD functions
New API to get version number of the library, FLA_Get_AOCL_Version()
Library function tracing and input logging support added

06 Jul 03:39

pradeeptrgit

3.0.1

593e464

AMD Optimized libFLAME Version 3.0.1

Highlights of improvements on AMD EPYC™ processor family CPUs

Improved performance of LU, QR and Cholesky Factorization
Improved performance of the routines that compute partial LDL^T factorization of a symmetric matrix using packed storage: spffrt2 and spffrtx
Improved performance of the routines that compute complete/incomplete LU factorization without pivoting of a general matrix: getrfnp and getrfnpi
Library function tracing and input logging support added
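
To make "partial LDL^T factorization of a symmetric matrix using packed storage" concrete, here is a minimal sketch of the idea in plain Python. It is not the library's implementation and does not reproduce the spffrt2/spffrtx interfaces. It assumes the lower triangle is stored column by column (the standard LAPACK packed layout) and that "partial" means eliminating only the first ncolm columns, leaving a Schur complement in the trailing block. The function and parameter names are hypothetical.

```python
def packed_index(i, j, n):
    """Index of A[i][j] (with i >= j, 0-based) in column-major lower packed storage."""
    return i + j * (2 * n - j - 1) // 2


def partial_ldlt_packed(ap, n, ncolm):
    """Eliminate the first `ncolm` columns of a packed symmetric matrix.

    Returns a new packed array holding D on the diagonal and the columns of
    L below it for the eliminated columns, and the Schur complement in the
    remaining trailing triangle. No pivoting is performed.
    """
    ap = list(ap)  # work on a copy
    for k in range(ncolm):
        d = ap[packed_index(k, k, n)]
        if d == 0.0:
            raise ZeroDivisionError("zero pivot at column %d" % k)
        # Scale column k below the diagonal to obtain column k of L.
        for i in range(k + 1, n):
            ap[packed_index(i, k, n)] /= d
        # Rank-1 update of the trailing lower triangle: a_ij -= l_ik * d * l_jk.
        for j in range(k + 1, n):
            l_jk = ap[packed_index(j, k, n)]
            for i in range(j, n):
                ap[packed_index(i, j, n)] -= ap[packed_index(i, k, n)] * d * l_jk
    return ap


if __name__ == "__main__":
    # A = [[4, 2, 2], [2, 5, 3], [2, 3, 6]], lower triangle packed by columns.
    a_packed = [4.0, 2.0, 2.0, 5.0, 3.0, 6.0]
    print(partial_ldlt_packed(a_packed, 3, 1))
    print(partial_ldlt_packed(a_packed, 3, 3))
```

Packed storage keeps only n(n+1)/2 entries, which is why the index helper is needed in place of ordinary two-dimensional indexing.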

15 Mar 15:40

pradeeptrgit

3.0

ec457fb

AMD Optimized libFLAME Version 3.0

Highlights of improvements on AMD EPYC™ processor family CPUs

New APIs to compute partial LDL^T factorization of a symmetric matrix using packed storage: ?spffrt2 and ?spffrtx
New APIs to perform complete or incomplete LU factorization without pivoting of a general matrix: ?getrfnp and ?getrfnpi
Test suite now supports LAPACK API tests for LU, Cholesky and QR operations
Several bug fixes including handling denormal numbers in SVD functions
New API to get version number of the library, FLA_Get_AOCL_Version()
Library function tracing and input logging support added
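
As an illustration of what "complete or incomplete LU factorization without pivoting" means, the sketch below factors a small dense matrix in plain Python. It is not the ?getrfnp or ?getrfnpi code and does not mirror their argument lists. Treating "incomplete" as stopping after a given number of elimination steps is an assumption, and the names lu_no_pivot, steps, and matmul are hypothetical.

```python
def lu_no_pivot(a, steps=None):
    """Return (L, U) such that A = L * U, with no row exchanges.

    With steps=None the factorization is complete: L is unit lower
    triangular and U is upper triangular. With steps=k only the first k
    columns are eliminated and the trailing block of U holds the Schur
    complement.
    """
    n = len(a)
    steps = n if steps is None else steps
    u = [list(row) for row in a]  # copy of A, overwritten in place
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(steps):
        if u[k][k] == 0.0:
            # Without pivoting there is no way to recover from a zero pivot.
            raise ZeroDivisionError("zero pivot at step %d" % k)
        for i in range(k + 1, n):
            l[i][k] = u[i][k] / u[k][k]
            for j in range(k, n):
                u[i][j] -= l[i][k] * u[k][j]
    return l, u


def matmul(x, y):
    """Dense matrix product, used only to check the factorization."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


if __name__ == "__main__":
    a = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
    l, u = lu_no_pivot(a)
    print(l)
    print(u)
    print(matmul(l, u) == a)
```

Skipping the pivot search saves work and keeps the row order fixed, but it is only safe for matrices whose pivots are known to stay away from zero, such as diagonally dominant ones.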