Notable changes
Framework support
- The TOSA dialect, commonly used when importing from TensorFlow Lite (LiteRT), is in the process of incrementing to v1.0: https://discourse.llvm.org/t/rfc-tosa-dialect-increment-to-v1-0/83708. During this transition, we anticipate that files imported from .tflite to .mlirbc may not be compatible with the IREE compiler. See #19683 and #19777 for more details.
- Support for importing and compiling TensorFlow models is known to be unstable. We expect this to improve after migrating to new APIs. Follow #19917 for updates.
Compiler
- #19714: The legacy_sync compilation mode has been removed because all in-tree compiler targets have been migrated off of it. The remaining asynchronous mode produces non-blocking operations that enable more multi-device parallelism.
- #19720: The tuner now supports specializing for GPUs that share the same ISA but have different hardware capabilities.
- Compilation time improvements, particularly for large, sharded models: #19791, #19794
- #19881: The llvm-cpu target now supports parsing AArch64 CPU features.
Runtime
- #19842: IREE now uses a newer (currently unreleased) version of Tracy. To capture and view traces of the IREE runtime, use a matching version of the Tracy tools. See also https://iree.dev/developers/performance/profiling-with-tracy/.
- #19640: The special IREE_WHOLE_BUFFER value was renamed to IREE_HAL_WHOLE_BUFFER.
Development tools
- #19663: More source dependencies are now managed by Dependabot.
- #19716: IREE's Python bindings can now be installed as editable wheels. See the documentation at https://iree.dev/building-from-source/getting-started/#python-bindings.
- #19790: IREE now uses nanobind for its runtime and compiler Python bindings, instead of mixing nanobind and pybind11.
Changelog
Full list of changes: v3.1.0...v3.2.0