
Releases: amd/libflame

AOCL-LAPACK 5.0

11 Oct 05:25

Highlights of AOCL-LAPACK 5.0 release

  • Improved performance of the following APIs through AVX2 and AVX512 SIMD instructions:
    • Double Precision SVD (DGESVD)
    • LU Factorization/Solver routines for general matrices (DGETRF, ZGETRF, DGETRS, and DGESV)
    • Matrix inverse routine DGETRI for small sizes
    • Least squares solver DGELS for small sizes
    • Double Precision Auxiliary routine and DLARFG
  • Improved performance of the following APIs using local AOCL-BLAS optimized kernels:
    • LU Factorization/Solver routines for band storage matrices (DGBTRF and DGBTRS)
  • Option to select a specific ISA code path at runtime through the AOCL_ENABLE_INSTRUCTIONS environment variable
  • Sphinx-based AOCL-LAPACK API documentation
  • pkg-config support on Linux for CMake builds
  • LAPACK API modifications:
    • Updated the return types of AOCL-LAPACK APIs to match the corresponding Netlib subroutine prototypes
    • Removed the xerbla and lsame definitions from AOCL-LAPACK; applications must use lsame from the BLAS library
  • Test suite framework enhancements:
    • Improved accuracy tests, including tests with different input generation mechanisms
    • Added extreme-value, negative, and corner-case tests
    • Added test cases for numerical stability
    • Support for LAPACKE interface tests

AOCL-LAPACK 4.2

28 Feb 06:04

Highlights of improvements on AMD “Zen” core-based processors.

  • Improved performance of the following APIs:
    • Double Precision SVD (DGESVD)
    • Factorization routines DGETRF and ZGETRF
    • Solver routines DGETRS and DGESV
  • Option to link with AOCL-BLAS during build to enable invoking AOCL-BLAS internal APIs
  • Applications using AOCL-LAPACK must additionally link with the AOCL-Utils library
  • CMake improvements:
    • Easier test runs using CTest
    • ISA-specific flags configurable at build time
    • Code Coverage build option
  • OpenMP parallelism enabled in the following APIs to match LAPACK 3.11 implementation:
    • {c,z}hetrd_hb2st, {s,d}sytrd_sb2st, iparam2stage
  • Minor bug fixes and warnings addressed
  • Test suite framework improvements and more test cases
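DGETRF, DGETRS and DGESV, listed above, split one job: DGETRF factors A = P*L*U with partial pivoting, DGETRS uses those factors to solve A*x = b, and DGESV does both. The minimal C sketch below shows that computation as a plain unblocked loop; it is not AOCL-LAPACK's optimized code. The names `lu_factor` and `lu_solve` are hypothetical, pivot indices are 0-based (LAPACK's IPIV is 1-based), and only a single right-hand side is handled.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical sketch: in-place LU with partial pivoting, A = P*L*U.
 * Column-major storage as in LAPACK: element (i,j) is a[i + j*lda].
 * Returns 0 on success, or k+1 if U(k,k) is exactly zero (like INFO > 0). */
int lu_factor(int n, double *a, int lda, int *ipiv)
{
    assert(n >= 0 && lda >= n);
    int info = 0;
    for (int k = 0; k < n; ++k) {
        /* Pick the row with the largest |a(i,k)| at or below the diagonal. */
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (fabs(a[i + k * lda]) > fabs(a[p + k * lda]))
                p = i;
        ipiv[k] = p;
        if (p != k)
            for (int j = 0; j < n; ++j) {     /* swap entire rows k and p */
                double t = a[k + j * lda];
                a[k + j * lda] = a[p + j * lda];
                a[p + j * lda] = t;
            }
        double pivot = a[k + k * lda];
        if (pivot == 0.0) {                   /* column below is all zero too */
            if (info == 0)
                info = k + 1;
            continue;
        }
        for (int i = k + 1; i < n; ++i) {
            a[i + k * lda] /= pivot;          /* multiplier l(i,k) */
            for (int j = k + 1; j < n; ++j)   /* update trailing submatrix */
                a[i + j * lda] -= a[i + k * lda] * a[k + j * lda];
        }
    }
    return info;
}

/* Solve A*x = b with the factors from lu_factor; b is overwritten with x. */
void lu_solve(int n, const double *a, int lda, const int *ipiv, double *b)
{
    for (int k = 0; k < n; ++k) {             /* apply the row interchanges in order */
        double t = b[k];
        b[k] = b[ipiv[k]];
        b[ipiv[k]] = t;
    }
    for (int i = 1; i < n; ++i)               /* forward solve L*y = b (unit diagonal) */
        for (int j = 0; j < i; ++j)
            b[i] -= a[i + j * lda] * b[j];
    for (int i = n - 1; i >= 0; --i) {        /* back solve U*x = y */
        for (int j = i + 1; j < n; ++j)
            b[i] -= a[i + j * lda] * b[j];
        b[i] /= a[i + i * lda];
    }
}
```

With A = [[2, 1], [4, 3]] and b = [3, 7], the sketch chooses the second row as the first pivot and returns x = [1, 1].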

AOCL-LAPACK 4.1

05 Aug 06:50

Highlights of improvements on AMD “Zen” core-based processors.

  • Upgraded to Netlib LAPACK 3.11.0 specification that includes 14 new APIs, bug fixes, and improvements to the existing APIs
  • Improved performance of the following APIs:
    • Double Precision SVD (DGESVD) for small matrices
    • Factorization routines (DGEQP3, SGETRF, ZGETRF, and DGETRF)
    • Double Precision solver API DGESV and DGESVD routines
  • Framework changes to support the selection of optimized routines based on target CPU instruction set
  • CMake build system for Linux and Windows
  • Build system cleanup and minor bug fixes

AOCL-libFLAME 4.0

11 Nov 04:37

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Upgrade to LAPACK 3.10.1 specification that includes several bug fixes from Netlib LAPACK
  • Improved performance of the following APIs:
    • Eigenvalue routine (ZGGEV)
    • SVD routines (DGESDD, CGESDD, and ZGESDD)
  • The logging feature now supports timing for real double precision libFLAME APIs
  • The AOCL-Progress feature, which reports progress of long-running API computations, is extended to more APIs: {S/C/Z}GETRF, {S/D}POTRF, {S/D}GEQRF, and {S/C/D/Z}GBTRF

AOCL-libFLAME 3.2

08 Jul 13:58

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Improved performance of the following routines on AMD “Zen” architecture:
    • Eigenvalue routines (DSYEVD and DSTEQR)
    • SVD routines (DGESVD)
  • AOCL_FLA_PROGRESS feature, which reports progress of long-running API computations; this support is available for double precision LU factorization
  • Improvements to the libFLAME build system, including a new configuration flag that enables optimizations specific to AMD CPUs
  • Improved test coverage

AMD Optimized libFLAME Version 3.1

23 Jun 09:46

Highlights of improvements on AMD EPYC™ processor family CPUs

  • New APIs to compute partial LDLT factorization of a symmetric matrix using packed storage: ?spffrt2 and ?spffrtx
  • New APIs to perform complete or incomplete LU factorization without pivoting of a general matrix: ?getrfnp and ?getrfnpi
  • Test suite now supports LAPACK API tests for LU, Cholesky and QR operations
  • Several bug fixes, including handling of denormal numbers in SVD functions
  • New API to get version number of the library, FLA_Get_AOCL_Version()
  • Library function tracing and input logging support added
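The ?getrfnp routines listed above factor a general matrix without row interchanges. A minimal C sketch of the complete, unblocked case follows. The name `lu_nopiv` and its interface are hypothetical and do not reproduce the ?getrfnp or ?getrfnpi signatures, and the incomplete variant is not shown.

```c
#include <assert.h>

/* Hypothetical sketch: complete LU factorization WITHOUT pivoting, A = L*U,
 * computed in place on an n-by-n column-major matrix (element (i,j) is
 * a[i + j*lda]). L has a unit diagonal and is stored below it.
 * Returns 0, or k+1 if a zero pivot is met at step k: with no row
 * interchanges there is no way to work around it. */
int lu_nopiv(int n, double *a, int lda)
{
    assert(n >= 0 && lda >= n);
    for (int k = 0; k < n; ++k) {
        double pivot = a[k + k * lda];
        if (pivot == 0.0)
            return k + 1;
        for (int i = k + 1; i < n; ++i) {
            a[i + k * lda] /= pivot;              /* multiplier l(i,k) */
            for (int j = k + 1; j < n; ++j)       /* update trailing submatrix */
                a[i + j * lda] -= a[i + k * lda] * a[k + j * lda];
        }
    }
    return 0;
}
```

On the nonsingular matrix [[0, 1], [1, 0]], this sketch stops at the first step with a zero pivot, which a partial-pivoting routine such as DGETRF would avoid by swapping rows; skipping the pivot search is what saves work when the matrix is known not to need it.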

AMD Optimized libFLAME Version 3.0.1

06 Jul 03:39

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Improved performance of LU, QR and Cholesky factorization
  • Improved performance of routines that compute partial LDLT factorization of a symmetric matrix using packed storage: ?spffrt2 and ?spffrtx
  • Improved performance of routines that compute complete or incomplete LU factorization without pivoting of a general matrix: ?getrfnp and ?getrfnpi
  • Library function tracing and input logging support added

AMD Optimized libFLAME Version 3.0

15 Mar 15:40

Highlights of improvements on AMD EPYC™ processor family CPUs

  • New APIs to compute partial LDLT factorization of a symmetric matrix using packed storage: ?spffrt2 and ?spffrtx
  • New APIs to perform complete or incomplete LU factorization without pivoting of a general matrix: ?getrfnp and ?getrfnpi
  • Test suite now supports LAPACK API tests for LU, Cholesky and QR operations
  • Several bug fixes, including handling of denormal numbers in SVD functions
  • New API to get version number of the library, FLA_Get_AOCL_Version()
  • Library function tracing and input logging support added

AMD Optimized libFLAME Version 2.2

30 Jun 07:43

Highlights of improvements on AMD EPYC™ processor family CPUs

  • libFLAME is now compatible with the LAPACK 3.9.0 specification
  • Broader compatibility with the Netlib test suite

AMD Optimized libFLAME Version 2.1

14 Jan 05:03

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Support for C++ Template APIs for all LAPACK functions
  • Includes LAPACKE source in the libFLAME source directory