Releases: ROCm/rocMLIR
rocm-6.4.3
What's Changed
- No changes since rocm-6.4.2
rocm-6.4.2
What's Changed
- [6.4][BACKPORT] Update MI300 quick-tuning list by @mirza-halilcevic in #1812
- [6.4][Backport] Backport some attention bugfixes + causal attention by @umangyadav in #1811
- [HOTFIX][BACKPORT] Manually add missing perf config for MI200 to avoid perf regression by @umangyadav in #1818
- [BACKPORT] Bump LLVM to pick fixes for Gfx12 Hazards by @umangyadav in #1824
- [BACKPORT] Keep python3.6 for SLES, RHEL builds by @umangyadav in #1825
Full Changelog: rocm-6.4.0...rocm-6.4.2
rocm-6.4.1
What's Changed
- [6.4][BACKPORT] Update MI300 quick-tuning list by @mirza-halilcevic in #1812
- [6.4][Backport] Backport some attention bugfixes + causal attention by @umangyadav in #1811
- [HOTFIX][BACKPORT] Manually add missing perf config for MI200 to avoid perf regression by @umangyadav in #1818
- [BACKPORT] Bump LLVM to pick fixes for Gfx12 Hazards by @umangyadav in #1824
- [BACKPORT] Keep python3.6 for SLES, RHEL builds by @umangyadav in #1825
Full Changelog: rocm-6.4.0...rocm-6.4.1
rocm-6.4.0
What's Changed
- Fix crash with invalid !migraphx.shaped types by @krzysz00 in #1667
- [CI] External CI mainline build support by @amd-jmacaran in #1670
- Don't construct Embed{}s for 1x1 filters in convolutions by @krzysz00 in #1669
- [CI] Update Dockerfile to use Ubuntu 22.04 by @stefankoncarevic in #1662
- Lower minNumCUs for gfx11 as gfx1103 has 12 CUs only by @umangyadav in #1673
- Fix alignment constraints not being correctly imposed in certain vect… by @krzysz00 in #1674
- [CI] Update Dockerfile to set ONNX version to 1.14.1 by @stefankoncarevic in #1676
- Fix not enabling fp8 WMMA on Navi4 by default by @krzysz00 in #1677
- Use blockwise_broadcast_reduce in reduction fusions. by @manupak in #1668
- Fix gated threadwise_write_all by @dhernandez0 in #1683
- Fix int4 loads to be vector typed always by @manupak in #1682
- Remove unnecessary pass from a test by @manupak in #1688
- navi4x tests fail with mixed types bf8_fp8 by @dhernandez0 in #1684
- [DO NOT SQUASH] Move to new-style atomic safety annotations by @krzysz00 in #1678
- Fix hardcoded arguments and results ids for prefill by @dhernandez0 in #1687
- find BlockArgument from gemm output going through all view-like operations by @dhernandez0 in #1690
- Collected small Jenkinsfile improvements by @pcf000 in #1686
- [CI] Add support for Navi4x architecture in nightly CI pipeline. by @stefankoncarevic in #1599
- Fix conv1d bug and improve MIGraphXToTosa test coverage by @dhernandez0 in #1693
- Support signed and unsigned integer types in migraphx dialect by @dhernandez0 in #1692
- Fix conversion of quantizelinear for unsigned types by @dhernandez0 in #1694
- Replace myself with Chris in CODEOWNERS by @jerryyin in #1698
- Fix rocmlir-gen attention i8 verification bug by @dhernandez0 in #1697
- Workaround for issue 1661 by @dhernandez0 in #1699
- [CI] Refactor MIGraphX model testing with Jenkins credential access. by @stefankoncarevic in #1671
- Remove Simon from CODEOWNERS by @dhernandez0 in #1702
- Add GQA and KV Cache by @dhernandez0 in #1696
- Prepare Dockerfiles for rocm-6.3 by @umangyadav in #1704
- [CI] Updated Jenkins files to use rocm-6.3 by @stefankoncarevic in #1705
- Move license file to top-level by @darren-amd in #1715
- GridwiseGemmParams: fix compile error with LLVM libc++ due to missing const by @LunNova in #1708
- Upstream merge Nov 24 by @djramic in #1703
- Add a script for generating the quick-tuning perfconfigs list by @djramic in #1689
- add fp8_fp8 in perfRunner by @umangyadav in #1707
- Removed dummy target from LinalgNamedStructuredOps yamlgen by @stefankoncarevic in #1717
- Set rock.prefill type to the blockargument type instead of gemm output type by @dhernandez0 in #1721
- [CI] Revert multi-step execution in Navi3x nightly E2E tests by @stefankoncarevic in #1719
- Split-k fusions by @dhernandez0 in #1718
- Fuse reduce sum with split-k by @dhernandez0 in #1720
- Fix usage of llvm::reverse() and remove warnings by @dhernandez0 in #1724
- Allow OCP FP8 emulation by @umangyadav in #1716
- Sort selected quick-tuning perfconfigs by problem coverage by @djramic in #1723
- Enable f16 sum reduction by @dhernandez0 in #1722
- [DRAFT]Add support for bf16 attention by @djramic in #1710
- Remove kpack from the decision of how many elements to copy per thread by @dhernandez0 in #1714
- Fix tuning for split-k fusions by @dhernandez0 in #1725
- Support for gfx950 arch by @dhernandez0 in #1731
- Upstream Merge [Jan] by @stefankoncarevic in #1728
- Explicitly convert to char by @Xeonacid in #1709
- Fix build failures by @umangyadav in #1735
- Enable e2e fusion bf16 tests on gfx11. by @stefankoncarevic in #1736
- KV-cache MIGraphX integration by @dhernandez0 in #1729
- Enable dense_output_bf16 test, adjust build functions for Navi3x by @stefankoncarevic in #1737
- [6.4][Backport] Backport some bugfixes by @dhernandez0 in #1754
- [6.4][BACKPORT] [TOSA] Set accType to Float16 for the Fp8 types by @umangyadav in #1751
- [6.4][BACKPORT] Use AddDim for unit input dimensions to help getMaxVectorization() by @umangyadav in #1756
- [6.4]fix compilation with HIP SDK 6.3 for Windows (#1742) by @causten in #1759
New Contributors
- @darren-amd made their first contribution in #1715
- @LunNova made their first contribution in #1708
- @Xeonacid made their first contribution in #1709
Full Changelog: rocm-6.3.3...rocm-6.4.0
rocm-6.3.3
What's Changed
- Workaround for issue 1661 by @dhernandez0 in #1701
Full Changelog: rocm-6.3.0...rocm-6.3.3
rocm-6.3.2
What's Changed
- Workaround for issue 1661 by @dhernandez0 in #1701
Full Changelog: rocm-6.3.0...rocm-6.3.2
rocm-6.3.1
What's Changed
- Workaround for issue 1661 by @dhernandez0 in #1701
Full Changelog: rocm-6.3.0...rocm-6.3.1
rocm-6.3.0
What's Changed
- Fix weekly Jenkins stalls by removing "any" machine selection by @jerryyin in #1541
- Use early-increment range for loop through uses that may erase an operation. by @pcf000 in #1546
- Fix cmake dependency for Transforms by @aquad in #1542
- Update Dockerfile to match llvm-premerge-checks. by @pcf000 in #1551
- Also link ctest when installing cmake, since MIGraphX testing uses it. by @pcf000 in #1553
- Remove -DROCMLIR_GEN_FLAGS from Jenkinsfile.downstream. by @pcf000 in #1554
- Support for parsing options from ROCMLIR_DEBUG_FLAGS in library calls like MIGraphX by @pcf000 in #1549
- Manually set overflow flags when expanding indexing maps by @krzysz00 in #1529
- Revert "Manually set overflow flags when expanding indexing maps (#1529)" by @krzysz00 in #1556
- Upstream Merge [Q2/24] by @stefankoncarevic in #1547
- Make tuning-driver get arch from func or mod by @manupak in #1518
- Temporarily call registerMLIRCLOptions() from mlirRegisterRocMLIRPasses(), until we can call it from MIGraphX. by @pcf000 in #1557
- [Attention] Enable blockwise transposes/rotations to avoid bank conflicts. by @manupak in #1526
- [DO NOT SQUASH] Remove mhal.prefill attr's tight coupling to rock dialect by @manupak in #1555
- Fix storeMethod in non-accel gemm for split-k by @manupak in #1561
- Rearrange alloc_tensor creation and CSE to fix input fusion crash by @krzysz00 in #1560
- Add mhal dependency rock-to-gpu by @manupak in #1559
- [CI] Add a option define the target branch by @manupak in #1565
- Remove gfx906 from Jenkinsfile. Most importantly, don't force gfx906 for vanilla codepath. by @pcf000 in #1564
- [rock] Switch to target attributes from serialization by @fabianmcg in #1548
- CHange definition of dequantizelinear to match MIGraphX, ONNX by @krzysz00 in #1567
- Prune attention tuning space for Navi3x by @manupak in #1570
- [CI] Make our PR CI use a fixed layout for conv by @manupak in #1571
- Make chiplet-aware grid layout by @manupak in #1501
- Gfx12 support in rocMLIR by @giuseros in #1562
- Bumping Navi3 CI execution to use 4 threads by @jerryyin in #1568
- Fix mhal::PrefillAttr data retrieval after switching to target attributes by @fabianmcg in #1582
- Add reduce as a fusor in regularization by @krzysz00 in #1581
- [DO NOT SQUASH] Subtree merge from llvm-project upstream, 2024-07-17 by @pcf000 in #1578
- Add reduce sum by @umangyadav in #1574
- Remove other traces of gfx906, missed last time. by @pcf000 in #1585
- Fix link by @umangyadav in #1588
- Change BufferDependencyAnalysis to use OpOperand* in its internals by @krzysz00 in #1586
- Revert removal of '== false' and the like, because null is neither true nor false in params. by @pcf000 in #1592
- [EXTERNAL] Add missing llvm builders for ROCDL intrinsics by @manupak in #1591
- Fix multi-output fusion bugs by reworking regularization by @krzysz00 in #1590
- [MLIR][AMDGPU] Add amdgpu.sched_barrier (#98911) by @manupak in #1595
- swizzle the MFMA outputs via LDS by @dhernandez0 in #1580
- Updated docker environment to ROCm 6.2 by @stefankoncarevic in #1597
- [CI] Updated Jenkinsfiles to use rocm-6.2 by @stefankoncarevic in #1598
- [DO NOT SQUASH!] Non-MIGraphX-related changes for int4 support by @krzysz00 in #1584
- Implement emulated FP8 for the OCP formats. by @pcf000 in #1566
- Update CAPI tests to include C++ tests by @fabianmcg in #1602
- Do not do TransposeRewritePattern if the operation has more than one use by @dhernandez0 in #1601
- Fix code-coverage version mismatch -- PATH in Jenkinsfile doesn't apply to sh by @pcf000 in #1606
- [DO NOT SQUASH][EXTERNAL] fix plumbing of rocdl attrs: waves_per_eu & unsafe_atomics by @manupak in #1609
- Move ops inside k-Loop into pipeline stages by @manupak in #1600
- use removeUpperDims if possible by @dhernandez0 in #1607
- Buildbot improvements and fixes. by @pcf000 in #1608
- Fix tests where no GPU is present by @krzysz00 in #1610
- Enable external CI pipeline triggers by @amd-jmacaran in #1552
- Move threadwise_copy ops in gridwise_gemm_accel, pipeline non-accel by @krzysz00 in #1603
- [DO NOT SQUASH] Upstream merge August'24 by @manupak in #1614
- [Attention] Fixed preSoftmaxElementwiseRegion input ordering by @manupak in #1615
- [DO NOT SQUASH] Fp8 support for gfx12 by @giuseros in #1612
- [DO NOT SQUASH][EXTERNAL] Add a scheduling barrier guard around inlineAsm lds.barrier by @manupak in #1619
- Fix crash arising from insufficient guards in WMMA instruction selector by @krzysz00 in #1621
- MIGraphX changes for int4 by @krzysz00 in #1596
- Collected Jenkinsfile tweaks for reliability. by @pcf000 in #1622
- Make LDS one big pool so we can allocate/deallocate/reuse it by @dhernandez0 in #1611
- Use separate call to check for gfx11 by @umangyadav in #1625
- Add 3-D layouts to conv regression tests and fix the problems exposed by @pcf000 in #1623
- Handle process exception from calling rocminfo, to see why it sometimes fails. by @pcf000 in #1628
- Fix hard-coded '5' that needs to be inputDimension.size() to handle 3-D convolutions. by @pcf000 in #1627
- Fix too-strict test in fp8 emulation chenks by @krzysz00 in #1626
- Fix test for input vectorization traversal, use types correctly, add … by @krzysz00 in #1630
- Fix removeUpperDims by @manupak in #1631
- Report error when a library fails to load by @dhernandez0 in #1632
- Fix verifyGemmTypes for fp8 by @dhernandez0 in #1633
- Reduced split-k range by @djramic in #1616
- disable occupancy warnings by @umangyadav in #1635
- Fix overly-strict guards in LLVM conversions for fp8 intrinsic by @krzysz00 in #1637
- Fix issue 1620 by @icobg in #1640
- Handle the new F8 types in RockTuningImpl.cpp too. Oops. by @pcf000 in #1634
- Move enableApplicability to allow ReuseLDSPass to fail if there is not enough LDS memory by @dhernandez0 in #1643
- [CI] Created new CI job pipeline for Navi4x architecture by @stefankoncarevic in #1604
- Use cmake -E touch for cross-platform compatibility by @stefankoncarevic in #1644
- MLIR#1470: fix crash when blockPerCU == 0 by @krzysz00 in #1648
- [DO NOT SQUASH] Stop treating bf16 as i16 by @krzysz00 in #1646
- [DO NOT SQUASH] Use half-precision math library calls by @dhernandez0 in #1650
- Add .amdhsa_code_object_version metadata serializing rocDL modules by @umangyadav in #1645
- [DO NOT SQUASH] Fix vector conversion (ExtendToSupportedTypes) by @dhernandez0 in #1653
- Add support for pad + removeSubDims in removeUpperDims by @manupak in #1639
- Exit gracefully if no support for wmma/mfma for attention tuning by @dhernandez0 in #1657
- [CI] Add hipBLAS and hipBLASLt Dependencies by @stefankoncarevic in #1658
- Fix MIGraphX CI docker image build issue by @djramic in #1656
- Handle non-zero-preserving input fusions, make read_into track validity by @krzysz00 in https://github.com/ROCm/rocMLI...
rocm-6.2.4
Release for rocm-6.2.4