Release v3.2.0 · iree-org/iree

Notable changes

The TOSA dialect, commonly used when importing from TensorFlow Lite (LiteRT), is in the process of incrementing to v1.0: https://discourse.llvm.org/t/rfc-tosa-dialect-increment-to-v1-0/83708. During this transition, we anticipate that files imported from .tflite to .mlirbc may not be compatible with the IREE compiler. See #19683 and #19777 for more details.
Support for importing and compiling TensorFlow models is known to be unstable. We expect this to improve after migrating to new APIs. Follow #19917 for updates.

#19714: The legacy_sync compilation mode has been removed now that all in-tree compiler targets have migrated off it. The remaining asynchronous mode produces non-blocking operations that enable more multi-device parallelism.
#19720: The tuner now supports specializing for GPUs that share the same ISA but have different hardware capabilities.
Compilation time improvements, particularly for large, sharded models: #19791, #19794
#19881: The llvm-cpu target now supports parsing AArch64 CPU features.

#19842: IREE now uses a newer (currently unreleased) version of Tracy. To capture and view traces of the IREE runtime, use a matching version of the Tracy tools. See also https://iree.dev/developers/performance/profiling-with-tracy/.
#19640: The special IREE_WHOLE_BUFFER value was renamed to IREE_HAL_WHOLE_BUFFER.
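For downstream code that still uses the old spelling from #19640, the rename can be applied mechanically. The following is a minimal sketch, not an official migration tool: the helper name `rename_whole_buffer` and the sample call `my_fill` are hypothetical, and only the two identifier spellings come from the release note. It replaces whole identifiers only, so longer names that merely contain the old spelling are left untouched.

```python
import re

# Old and new spellings, as given in the release note for #19640.
OLD_NAME = "IREE_WHOLE_BUFFER"
NEW_NAME = "IREE_HAL_WHOLE_BUFFER"

# \b restricts the match to whole identifiers, so a longer name that merely
# contains the old spelling (e.g. MY_IREE_WHOLE_BUFFER) is left alone.
_PATTERN = re.compile(r"\b" + re.escape(OLD_NAME) + r"\b")


def rename_whole_buffer(source: str) -> str:
    """Return `source` with each whole-identifier use of the old name replaced."""
    return _PATTERN.sub(NEW_NAME, source)


if __name__ == "__main__":
    # Hypothetical downstream snippet; `my_fill` is not an IREE function.
    before = "my_fill(buffer, /*offset=*/0, IREE_WHOLE_BUFFER);"
    print(rename_whole_buffer(before))
```

Because the new name does not contain the old one as a whole identifier, running the helper twice on the same text gives the same result as running it once.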

#19663: More source dependencies are now managed by Dependabot.
#19716: IREE's Python bindings can now be installed as editable wheels. See the documentation at https://iree.dev/building-from-source/getting-started/#python-bindings.
#19790: IREE now uses nanobind for its runtime and compiler Python bindings, instead of mixing nanobind and pybind11.

Full list of changes: v3.1.0...v3.2.0