Merge Navi changes for int8 unit tests and verification mlir issues (#2393)

* Ensure unique module name for MLIR standalone ops (#2360)
* Cherry Pick ASAN build excluding additional bin files -1839 #2370
* Use an older version of numpy for compatibility with Python3.6 #2369
* Add space after rate message
* Fix wrong size check when axes not present for slice (#2270)
* Updated Changelog to document latest release work (#2363)
* Run simplify_qdq first before optimize_module (#2387)
* fix unit tests and verification issues on navi
* Set numpy to 1.21.6 and disable py3.6 test
* Release notes & changelog updates (#2395)

7 changed files with 270 additions and 234 deletions.
# Changelog for MIGraphX

Full documentation for MIGraphX is available at
[https://rocmdocs.amd.com/projects/AMDMIGraphX/en/latest/](https://rocmdocs.amd.com/projects/AMDMIGraphX/en/latest/).

## MIGraphX 2.8 for ROCm 6.0.0

### Additions

* Support for MI300 GPUs
* Support for TorchMIGraphX via PyTorch
* Boosted overall performance by integrating rocMLIR
* INT8 support for ONNX Runtime
* Support for ONNX version 1.14.1
* Added new operators: `Qlinearadd`, `QlinearGlobalAveragePool`, `Qlinearconv`, `Shrink`, `CastLike`,
  and `RandomUniform`
* Added an error message for when `gpu_targets` is not set during MIGraphX compilation
* Added a parameter to set tolerances with `migraphx-driver verify`
* Added support for MXR files > 4 GB
* Added the `MIGRAPHX_TRACE_MLIR` flag
* (BETA) Added the capability to use ROCm Composable Kernels via the `MIGRAPHX_ENABLE_CK=1`
  environment variable

### Optimizations

* Improved INT8 performance
* Improved time precision while benchmarking candidate kernels from CK or MLIR
* Removed `contiguous` from reshape parsing
* Updated the `ConstantOfShape` operator to support Dynamic Batch
* Simplified dynamic shape-related operators to their static versions where possible
* Improved debugging tools for accuracy issues
* Added a printed warning about `miopen_fusion` when generating an MXR file
* Reduced overall system memory usage during model compilation
* Created additional fusion opportunities during model compilation
* Improved debugging for matchers
* Improved general debug messages

### Fixes

* Fixed the scatter operator for nonstandard shapes with some models from the ONNX Model Zoo
* Provided a compile option to improve the accuracy of some models by disabling Fast-Math
* Improved layernorm + pointwise fusion matching to ignore argument order
* Fixed an accuracy issue with the `ROIAlign` operator
* Fixed computation logic for the `Trilu` operator
* Fixed support for the DETR model

### Changes

* Changed MIGraphX version to 2.8
* Extracted the test packages into a separate deb file when building MIGraphX from source

### Removals

* Removed building of Python 2.7 bindings

## MIGraphX 2.7 for ROCm 5.7.0

### Additions

* hipRTC no longer requires dev packages for the MIGraphX runtime, and the ROCm install can be in a
  different directory than it was at build time
* Added support for multi-target execution
* Added Dynamic Batch support with C++/Python APIs
* Added `migraphx.create_argument` to the Python API
* Added a Dockerfile example for Ubuntu 22.04
* Added TensorFlow supported ops to the driver, similar to the existing ONNX operator list
* Added a `MIGRAPHX_TRACE_MATCHES_FOR` environment variable to filter the matcher trace
* Improved debugging by printing max, min, mean, and stddev values for `TRACE_EVAL = 2`
* You can now use the `fast_math` flag instead of an `ENV` flag for GELU
* The driver now prints a message if offload copy is set for the compiled program

### Optimizations

* Optimized for ONNX Runtime 1.14.0
* Improved compile times by only building for the GPU on the system
* Improved performance of pointwise/reduction kernels when using NHWC layouts
* Loaded a specific version of the `migraphx_py` library
* Annotated functions with the block size so the compiler can do a better job of optimizing
* Enabled reshape on nonstandard shapes
* Used half HIP APIs to compute max and min
* Added support for broadcasted scalars to the unsqueeze operator
* Improved multiplies with the dot operator
* Handled broadcasts across dot and concat
* Added a verify namespace for better symbol resolution

### Fixes

* Resolved accuracy issues with FP16 ResNet50
* Updated the cpp generator to handle `inf` from float
* Fixed an assertion error during verify and made DCE work with tuples
* Fixed the convert operation for NaNs
* Fixed a shape typo in the API test
* Fixed compile warnings for shadowing variable names
* Added missing specialization for the `nullptr` hash function
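The cpp generator fix concerns non-finite values. Formatting a float as text produces `inf` or `nan`, and neither is a valid C++ literal. The following minimal Python sketch shows one way a code generator can handle this, assuming the generated source includes `<limits>`. `float_to_cpp_literal` is a hypothetical helper for illustration, not the function MIGraphX uses:

```python
import math


def float_to_cpp_literal(value: float) -> str:
    """Return a C++ expression for a float32 value (hypothetical helper).

    Non-finite values have no literal form in C++, so they are emitted as
    std::numeric_limits<float> calls instead of the text "inf" or "nan".
    Assumes finite inputs are representable as float32.
    """
    if math.isnan(value):
        return "std::numeric_limits<float>::quiet_NaN()"
    if math.isinf(value):
        inf = "std::numeric_limits<float>::infinity()"
        return inf if value > 0 else "-" + inf
    # repr() gives a round-trippable decimal; the "f" suffix makes it a float literal.
    return repr(value) + "f"


if __name__ == "__main__":
    for v in (1.5, float("inf"), float("-inf"), float("nan")):
        print(float_to_cpp_literal(v))
```

Emitting `std::numeric_limits` calls keeps the generated code portable. Compiler-specific builtins such as `__builtin_inf()` would also work, but they are tied to particular compilers.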
### Changes

* Bumped the half library version to 5.6.0
* Bumped CI to support ROCm 5.6
* Made building tests optional
* Replaced `np.bool` with `bool`, as requested by NumPy
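Some context for the `np.bool` change: NumPy deprecated the `np.bool` alias in 1.20 and removed it in 1.24. Code that passes `np.bool` as a dtype therefore raises `AttributeError` on NumPy 1.24 and later 1.x releases. The minimal sketch below shows the portable spelling, which passes the built-in `bool` as the dtype. The array values are illustrative only:

```python
import numpy as np

# Portable across NumPy releases: the built-in bool maps to NumPy's boolean dtype.
mask = np.array([1, 0, 2], dtype=bool)

print(mask.tolist())
print(mask.dtype)  # bool
```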
### Removals

* Removed int8x4 rocBLAS calls due to deprecation
* Removed `std::reduce` usage because not all operating systems support it

## MIGraphX 2.5 for ROCm 5.5.0

### Additions

* Added the Y-Model feature, which stores tuning information with the optimized model
* Added Python 3.10 bindings
* Added an accuracy checker tool based on ONNX Runtime
* Added the ONNX operators `parse_split` and `Trilu`
* Added build support for ROCm MLIR
* Added a `migraphx-driver` flag (`--python`) to print optimizations in Python
* Added a JIT implementation of the Gather and Pad operators, which results in better handling for
  larger tensor sizes

### Optimizations

* Improved performance of Transformer-based models
* Improved performance of the `Pad`, `Concat`, `Gather`, and `Pointwise` operators
* Improved ONNX/pb file loading speed
* Added a general optimize pass that runs several passes, such as `simplify_reshapes`,
  `simplify_algebra`, and DCE, in a loop

### Fixes

* Improved parsing for TensorFlow Protobuf files
* Resolved various accuracy issues with some ONNX models
* Resolved a gcc-12 issue with MIVisionX
* Improved support for larger sized models and batches
* Used `--offload-arch` instead of `--cuda-gpu-arch` for the HIP compiler
* Changed the JIT to use a float accumulator for large reduce ops of half type to avoid overflow
* Changed the JIT to temporarily use cosine to compute the sine function
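The two JIT changes above can be illustrated in NumPy. This is only a sketch under stated assumptions, not MIGraphX's generated kernel code, and both helper names are hypothetical.

* `mean_with_accumulator` reduces half-precision (`float16`) inputs using a chosen accumulator type. Because `float16` tops out at 65504, a running sum kept in `float16` can overflow to `inf` even when the final mean fits easily in half precision. A `float32` accumulator avoids that.
* `sin_via_cos` shows the identity sin(x) = cos(x - pi/2), which is what makes computing sine from cosine possible. The changelog does not say exactly how the JIT applies it.

```python
import math

import numpy as np


def mean_with_accumulator(values: np.ndarray, acc_dtype) -> np.float16:
    """Mean of float16 values, summed in acc_dtype (hypothetical helper)."""
    acc = acc_dtype(0)
    for v in values:
        # Cast after every add so the running sum really lives in acc_dtype.
        acc = acc_dtype(acc + v)
    # The result is stored in half precision, like the input.
    return np.float16(acc / acc_dtype(len(values)))


def sin_via_cos(x: float) -> float:
    """Sine computed through cosine, using sin(x) == cos(x - pi/2)."""
    return math.cos(x - math.pi / 2)


if __name__ == "__main__":
    x = np.full(100, 1000.0, dtype=np.float16)  # true sum is 100000 > 65504
    with np.errstate(over="ignore"):
        print(mean_with_accumulator(x, np.float16))  # running sum overflows to inf
    print(mean_with_accumulator(x, np.float32))
    print(sin_via_cos(0.5), math.sin(0.5))
```

The final division brings the `float32` result back into half-precision range, so only the intermediate sum needs the wider type.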
### Changes

* Changed the version and location of third-party build dependencies to pick up fixes