Releases: amd/libflame
AOCL-LAPACK 5.0
Highlights of AOCL-LAPACK 5.0 release
- Improved performance of the following APIs through AVX2 and AVX512 SIMD instructions:
  - Double Precision SVD (DGESVD)
  - LU Factorization/Solver routines for general matrices (DGETRF, ZGETRF, DGETRS, and DGESV)
  - Matrix inverse routine DGETRI for small sizes
  - Least Square solver DGELS for small sizes
  - Double Precision Auxiliary routine and DLARFG
- Improved performance of the following APIs using local AOCL-BLAS optimized kernels:
  - LU Factorization/Solver routines for band storage matrices (DGBTRF and DGBTRS)
- Option to set specific ISA code path at runtime through the AOCL_ENABLE_INSTRUCTIONS environment variable
- Sphinx-based AOCL-LAPACK API documentation
- pkgconfig support on Linux with CMake builds
- LAPACK API modifications:
  - Updated the return types of AOCL-LAPACK APIs to match the corresponding Netlib subroutine prototypes
  - Removed the xerbla and lsame definitions from AOCL-LAPACK. Applications must invoke lsame from the BLAS library
- Test suite framework enhancements:
  - Improved accuracy tests including testing with different input generation mechanisms
  - Addition of extreme values, negative, and corner test cases
  - Addition of cases to test numerical stability
  - Support for LAPACKE interface test
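
The runtime ISA selection listed above is controlled by an environment variable, so it has to be set in the environment of the process that loads the library. The following is a minimal Python sketch of launching an application with the variable set. The helper names `env_with_isa` and `run_with_isa` are hypothetical, and the value `avx2` is an assumption; consult the AOCL documentation for the accepted values.

```python
import os
import subprocess

# Variable name taken from the release notes above.
ISA_VAR = "AOCL_ENABLE_INSTRUCTIONS"


def env_with_isa(isa, base_env=None):
    """Return a copy of the environment with the ISA override set.

    The accepted values of `isa` are defined by AOCL, not by this sketch.
    """
    env = dict(os.environ if base_env is None else base_env)
    env[ISA_VAR] = isa
    return env


def run_with_isa(cmd, isa):
    """Run `cmd` (a list of arguments) with the ISA override visible to the child."""
    return subprocess.run(cmd, env=env_with_isa(isa), check=True)
```

A call such as `run_with_isa(["./my_lapack_app"], "avx2")` would start a (hypothetical) application binary with the override in place, without changing the environment of the calling process.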
AOCL-LAPACK 4.2
Highlights of improvements on AMD “Zen” core based processors.
- Improved performance of the following APIs:
  - Double Precision SVD (DGESVD)
  - Factorization routines DGETRF and ZGETRF
  - Solver routines DGETRS and DGESV
- Option to link with AOCL-BLAS during build to enable invoking AOCL-BLAS internal APIs
- Applications using AOCL-LAPACK need to additionally link with AOCL-Utils library
- CMake improvements:
  - Ease of running tests using CTest
  - ISA specific flags configurable during build
  - Code Coverage build option
- OpenMP parallelism enabled in the following APIs to match LAPACK 3.11 implementation:
  - {c,z}hetrd_hb2st, {s,d}sytrd_sb2st, iparam2stage
- Minor bug fixes and warnings addressed
- Test suite framework improvements and more test cases
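
For readers unfamiliar with how the factorization and solver routines named above relate: in reference LAPACK, DGESV factors the matrix with DGETRF (LU with partial pivoting) and then solves with DGETRS. The following pure-Python sketch illustrates that two-step structure on small dense matrices. It is a textbook illustration, not the optimized AOCL-LAPACK code; the function names are hypothetical, and `perm` is a plain permutation vector rather than LAPACK's `ipiv` format.

```python
def lu_factor(a):
    """LU factorization with partial pivoting (the role DGETRF plays).

    Returns (lu, perm): L (unit diagonal, below the diagonal) and U share
    one matrix, and perm[i] is the original row now stored in row i.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest magnitude in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b from the factors (the role DGETRS plays)."""
    n = len(lu)
    x = [b[perm[i]] for i in range(n)]  # apply the row permutation to b
    for i in range(n):                  # forward substitution with L
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):        # back substitution with U
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


def solve(a, b):
    """Factor, then solve (the role DGESV plays)."""
    lu, perm = lu_factor(a)
    return lu_solve(lu, perm, b)
```

Keeping the factorization separate from the solve lets the same factors be reused for several right-hand sides, which is why the two steps are exposed as separate routines.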
AOCL-LAPACK 4.1
Highlights of improvements on AMD “Zen” core based processors.
- Upgraded to Netlib LAPACK 3.11.0 specification that includes 14 new APIs, bug fixes, and improvements to the existing APIs
- Improved performance of the following APIs:
  - Double Precision SVD (DGESVD) for small matrices
  - Factorization routines (DGEQP3, SGETRF, ZGETRF, and DGETRF)
  - Double Precision solver API DGESV and DGESVD routines
- Framework changes to support the selection of optimized routines based on target CPU instruction set
- CMake build system for Linux and Windows
- Build system cleanup and minor bug fixes
AOCL-libFLAME 4.0
Highlights of improvements on AMD EPYC™ processor family CPUs
- Upgrade to LAPACK 3.10.1 specification that includes several bug fixes from Netlib LAPACK
- Improved performance of the following APIs:
  - Eigen Value routine (ZGGEV)
  - SVD routines (DGESDD, CGESDD, and ZGESDD)
- Logging feature supports timing for real double precision libFLAME APIs
- AOCL-Progress feature, which provides progress updates on long-running API computations, is extended to more APIs: {S/C/Z}GETRF, {S/D}POTRF, {S/D}GEQRF, {S/C/D/Z}GBTRF
AOCL-libFLAME 3.2
Highlights of improvements on AMD EPYC™ processor family CPUs
- Improved performance of the following routines on the AMD “Zen” architecture:
  - Eigen Value routines (DSYEVD and DSTEQR)
  - SVD routines (DGESVD)
- AOCL_FLA_PROGRESS feature that provides progress updates on long-running API computations; this support is available for double precision LU factorization
- Improvements in libFLAME build system with new config flag for enabling optimization specific to AMD CPUs
- Improved test coverage
AMD Optimized libFLAME Version 3.1
Highlights of improvements on AMD EPYC™ processor family CPUs
- New APIs to compute partial LDLT factorization of a symmetric matrix using packed storage: ?spffrt2 and ?spffrtx
- New APIs to perform complete or incomplete LU factorization without pivoting of a general matrix: ?getrfnp and ?getrfnpi
- Test suite now supports LAPACK API tests for LU, Cholesky and QR operations
- Several bug fixes including handling denormal numbers in SVD functions
- New API to get version number of the library, FLA_Get_AOCL_Version()
- Library function tracing and input logging support added
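
To make the "complete or incomplete LU factorization without pivoting" item above concrete, here is a textbook pure-Python sketch of the underlying computation on a square matrix. It assumes that an incomplete factorization means eliminating only the first few columns and leaving the updated trailing block (the Schur complement) in place. The function name `lu_nopivot` and its `nfact` parameter are hypothetical and do not reflect the actual ?getrfnp or ?getrfnpi signatures.

```python
def lu_nopivot(a, nfact=None):
    """LU factorization without pivoting on a square matrix.

    Eliminates the first `nfact` columns (all columns when nfact is None).
    L (unit diagonal) is stored below the diagonal, U on and above it; any
    columns not eliminated hold the updated trailing block.
    """
    n = len(a)
    steps = n if nfact is None else nfact
    lu = [row[:] for row in a]
    for k in range(steps):
        if lu[k][k] == 0.0:
            # Without row interchanges a zero pivot cannot be worked around.
            raise ValueError("zero pivot at column %d" % k)
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu
```

Skipping row interchanges avoids data movement but is only numerically safe for matrices whose pivots are known to stay well away from zero, for example diagonally dominant ones.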
AMD Optimized libFLAME Version 3.0.1
Highlights of improvements on AMD EPYC™ processor family CPUs
- Improved performance of LU, QR and Cholesky Factorization
- Improved performance of the routines that compute partial LDLT factorization of a symmetric matrix using packed storage: ?spffrt2 and ?spffrtx
- Improved performance of the routines that compute complete/incomplete LU factorization without pivoting of a general matrix: ?getrfnp and ?getrfnpi
- Library function tracing and input logging support added
AMD Optimized libFLAME Version 3.0
Highlights of improvements on AMD EPYC™ processor family CPUs
- New APIs to compute partial LDLT factorization of a symmetric matrix using packed storage: ?spffrt2 and ?spffrtx
- New APIs to perform complete or incomplete LU factorization without pivoting of a general matrix: ?getrfnp and ?getrfnpi
- Test suite now supports LAPACK API tests for LU, Cholesky and QR operations
- Several bug fixes including handling denormal numbers in SVD functions
- New API to get version number of the library, FLA_Get_AOCL_Version()
- Library function tracing and input logging support added
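
The partial LDLT factorization on packed storage listed above can be illustrated with a small pure-Python sketch. It assumes the lower triangle is stored column by column in a flat array (the standard LAPACK packed layout) and that "partial" means eliminating only the first `ncols` columns. The names `packed_index` and `partial_ldlt_packed` are hypothetical, and the conventions of the actual ?spffrt2 and ?spffrtx routines may differ.

```python
def packed_index(i, j, n):
    """0-based position of A[i][j] (i >= j) in lower, column-major packed storage."""
    return i + j * (2 * n - j - 1) // 2


def partial_ldlt_packed(ap, n, ncols):
    """Eliminate the first `ncols` columns of a packed symmetric matrix.

    On return, each eliminated column j holds D[j] on the diagonal and the
    column of L (unit diagonal implied) below it; the remaining columns hold
    the updated trailing block.
    """
    ap = list(ap)
    for j in range(ncols):
        d = ap[packed_index(j, j, n)]
        if d == 0.0:
            raise ValueError("zero pivot at column %d" % j)
        # Scale the column below the diagonal to obtain the column of L.
        for i in range(j + 1, n):
            ap[packed_index(i, j, n)] /= d
        # Update the lower triangle of the trailing block.
        for c in range(j + 1, n):
            lc = ap[packed_index(c, j, n)]
            for r in range(c, n):
                ap[packed_index(r, c, n)] -= ap[packed_index(r, j, n)] * d * lc
    return ap
```

For the symmetric matrix with rows (4, 2, 2), (2, 5, 3), (2, 3, 6), the packed input is `[4, 2, 2, 5, 3, 6]`, and `partial_ldlt_packed(ap, 3, 3)` yields D = diag(4, 4, 4) with every subdiagonal entry of L equal to 0.5.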
AMD Optimized libFLAME Version 2.2
Highlights of improvements on AMD EPYC™ processor family CPUs
- libFLAME is now compatible with the LAPACK 3.9.0 specification
- Broader compatibility coverage of the Netlib test suite
AMD Optimized libFLAME Version 2.1
Highlights of improvements on AMD EPYC™ processor family CPUs
- Support for C++ Template APIs for all LAPACK functions
- Includes LAPACKE source in the libFLAME source directory