Release v0.4.1

@github-actions released this 14 Oct 05:13 · 133 commits to main since this release · a88349f

What's Changed

  • fix: fix the failed sampling unittest on 5090 by @yzh119 in #1886
  • Updated to latest docker tag by @nvmbreughe in #1889
  • Fix: Prevent race condition in cubin loader when file is being consumed by @yzh119 in #1852
  • Improve graph caching of cudnn graph by @Anerudhan in #1887
  • misc: Various Updates to Attention Microbenchmark Suite by @bkryu in #1891
  • docs: Fix installation instructions for CUDA-specific package URLs by @yzh119 in #1893
  • docker image improvements by @nvmbreughe in #1890
  • tests: Add batch size 1 cases to test_trtllm_gen_attention.py that fail, marked xfail by @bkryu in #1897
  • Ensure docker installs the torch version we need by @nvmbreughe in #1901
  • bugfix: exclude tests/utils/test_load_cubin_compile_race_condition.py from pytest by @yzh119 in #1907
  • ci: use self-hosted runner for building docker containers by @yzh119 in #1908
  • feat: Add FP4 TRTLLM-Gen throughput MOE batched gemms by @jiahanc in #1882
  • Update Docker CI tags to 20251010-8d072e6 by @github-actions[bot] in #1915
  • ci/cd: consolidate release workflow by @yzh119 in #1910
  • bugfix: fix cli error when cuda toolkit is not installed by @yzh119 in #1905
  • feat: trtllm-gen global scaled FP8 GEMMs by @hypdeb in #1829
  • feat: enable fp8 blockscale moe for fused cutlass for sm90 by @djmmoss in #1819
  • use ffi::TensorView instead of ffi::Tensor by @cyx-6 in #1844
  • Minor updates to cubin_loader.py download_file to avoid race condition on temporary file by @nvjullin in #1918
  • chore: make cache directory flashinfer-version specific by @yzh119 in #1920
  • misc: checksum check when downloading artifacts by @jimmyzho in #1761
  • release: bump version v0.4.1 by @yzh119 in #1921

Full Changelog: v0.4.0...v0.4.1