v1.13

Released by @ptrendx on 09 Dec 19:29

Release Notes – Release 1.13

Key Features and Enhancements

  • [C/PyTorch/Jax] Added support for the THD layout for multi-query and grouped-query attention (MQA/GQA); see the first sketch after this list.
  • [Jax] Expanded FFI (Foreign Function Interface) support to include quantization, transpose, layernorms, fused-attention, and CUDA graphs; fixed miscellaneous bugs in the existing FFI implementations.
  • [Jax] Added support for Ring attention for context parallelism.
  • [PyTorch] Expanded the Sequential/operations-based API to cover activations, communication overlap, normalizations, and other fusions; see the second sketch after this list.
  • [PyTorch] Made miscellaneous fixes to reduce CPU overhead during execution.
  • [PyTorch] Leveraged cuDNN 9.6+ to reduce memory usage when using the THD input format for attention.
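
As a rough illustration of the THD/GQA item above, here is a minimal sketch (not taken from the release) of packed, variable-length grouped-query attention with the PyTorch DotProductAttention module. The argument names qkv_format, num_gqa_groups, cu_seqlens_q/cu_seqlens_kv, and max_seqlen_q/max_seqlen_kv are assumptions about the attention API; check the documentation for your Transformer Engine version.

```python
# Hedged sketch: THD-format grouped-query attention (GQA) in PyTorch.
# Argument names below are assumptions about the TE attention API.
import torch
import transformer_engine.pytorch as te

num_heads, num_kv_heads, head_dim = 16, 4, 64   # GQA: 4 KV heads serve 16 query heads
seq_lens = torch.tensor([5, 9, 2], dtype=torch.int32, device="cuda")  # 3 packed sequences
total_tokens = int(seq_lens.sum())

# THD packs variable-length sequences along the token dimension:
# tensors are [total_tokens, heads, head_dim] plus cumulative sequence lengths.
cu_seqlens = torch.zeros(len(seq_lens) + 1, dtype=torch.int32, device="cuda")
cu_seqlens[1:] = torch.cumsum(seq_lens, dim=0)

q = torch.randn(total_tokens, num_heads, head_dim, dtype=torch.bfloat16, device="cuda")
k = torch.randn(total_tokens, num_kv_heads, head_dim, dtype=torch.bfloat16, device="cuda")
v = torch.randn(total_tokens, num_kv_heads, head_dim, dtype=torch.bfloat16, device="cuda")

attn = te.DotProductAttention(
    num_attention_heads=num_heads,
    kv_channels=head_dim,
    num_gqa_groups=num_kv_heads,    # fewer KV heads than query heads -> GQA
    qkv_format="thd",               # packed token-major layout
    attn_mask_type="padding_causal",
)

out = attn(
    q, k, v,
    cu_seqlens_q=cu_seqlens,
    cu_seqlens_kv=cu_seqlens,
    max_seqlen_q=int(seq_lens.max()),
    max_seqlen_kv=int(seq_lens.max()),
)
```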

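The second sketch gives a rough idea of the operations-based API item above; it is an assumption-laden example only. The names te.ops.Sequential, te.ops.LayerNorm, te.ops.Linear, and te.ops.GELU and their signatures are inferred rather than quoted from this release; consult the Transformer Engine documentation for the fusible operations actually available.

```python
# Hedged sketch: composing fusible ops with the operations-based API.
# Class names and signatures here are assumptions, not confirmed by these notes.
import torch
import transformer_engine.pytorch as te

hidden, ffn_hidden = 1024, 4096

# Unlike torch.nn.Sequential, a te.ops.Sequential container can fuse adjacent
# operations at runtime (e.g. norm + GEMM, GEMM + activation).
mlp = te.ops.Sequential(
    te.ops.LayerNorm(hidden),
    te.ops.Linear(hidden, ffn_hidden),
    te.ops.GELU(),
    te.ops.Linear(ffn_hidden, hidden),
)

x = torch.randn(128, hidden, dtype=torch.bfloat16, device="cuda")
y = mlp(x)
```
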
Fixed Issues

  • [PyTorch] Fixed a crash that could occur when using FlashAttention with context parallelism.
  • [C/Jax] Adopted 64-bit offsets to fix overflow for large tensors in the cuDNN attention back end.
  • [C/Jax] Fixed the build of JAX native extensions when using the clang compiler.
  • [PyTorch] Fixed a crash when importing transformer-engine on CPU-only systems.
  • [PyTorch] Fixed a crash when using context parallelism with RoPE.

Known Issues in This Release

There are no known issues in this release.

Breaking Changes in This Release

There are no breaking changes in this release.

Deprecated Features

  • Transformer Engine support for the PaddlePaddle framework is deprecated and will be fully removed in version 2.0.
  • Support for exporting Transformer Engine modules via ONNX is deprecated and will be removed in version 2.0. This feature will be supported again in a later minor release of version 2.