Releases: iree-org/iree
iree candidate iree-3.11.0rc20260320
Automatic candidate release of iree.
Release v3.11.0
IREE v3.11.0 Release Notes
Release Candidate: iree-3.11.0rc20260316
Commits: ~539 commits since v3.10.0
VMFB Bytecode Version: 17.0 (unchanged from v3.10.0)
Highlights
- New async I/O infrastructure: Proactor-based async I/O with causal frontier scheduling, enabling cross-process shared memory support
- Streaming tokenizer: Full HuggingFace-compatible tokenizer with tiktoken format support for OpenAI BPE vocabularies
- Python 3.10+ requirement: Minimum Python version bumped to 3.10; Python 3.12+ supported via Stable ABI (abi3).
- ROCm flag rename: `iree-hip-*` compiler flags renamed to `iree-rocm-*` (old names deprecated with warnings)
- Enhanced vector distribution: Refactored 2-phase forward/backward layout analysis with improved transfer_gather support
Breaking Changes
VMFB Compatibility
- VMFB bytecode version unchanged (17.0)
- VMFBs compiled with `v3.10.0` remain compatible with `v3.11.0` runtime
- No recompilation needed when upgrading from v3.10.0
Python Version Requirement
- Minimum Python version is now 3.10 (#23591)
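A project that depends on these packages may want to fail early on an unsupported interpreter. The following is a minimal sketch of such a guard, assuming only the 3.10 minimum stated above; the names `MIN_PYTHON` and `meets_minimum` are hypothetical and not part of IREE.

```python
import sys

# Minimum interpreter version stated in these release notes.
MIN_PYTHON = (3, 10)


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if (major, minor) of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)
```

A setup script could call `meets_minimum()` before importing anything else and exit with a clear message when it returns `False`.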
Compiler Flag Renames
- `iree-hip-*` flags renamed to `iree-rocm-*` (#23420)
- Old flag names emit deprecation warnings but still work
- CMake: `IREE_HIP_TEST_TARGET_CHIP` → `IREE_ROCM_TEST_TARGET_CHIP`
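Build scripts that pass the old flag names can be migrated mechanically. The sketch below rewrites the prefix on a list of command-line arguments, assuming that every `--iree-hip-*` flag maps to `--iree-rocm-*` with the same suffix and that flags are spelled with a double dash. The helper `migrate_flags` is hypothetical, and the flag in the usage comment is only illustrative; confirm the actual names against your compiler's help output.

```python
OLD_PREFIX = "--iree-hip-"
NEW_PREFIX = "--iree-rocm-"


def migrate_flags(args):
    """Rewrite deprecated --iree-hip-* flags to the --iree-rocm-* spelling.

    Returns (migrated_args, renamed), where `renamed` lists (old, new) pairs
    so the caller can report what changed. Assumes a pure prefix rename.
    """
    migrated = []
    renamed = []
    for arg in args:
        if arg.startswith(OLD_PREFIX):
            new_arg = NEW_PREFIX + arg[len(OLD_PREFIX):]
            renamed.append((arg, new_arg))
            migrated.append(new_arg)
        else:
            # Leave every other argument untouched.
            migrated.append(arg)
    return migrated, renamed


# Illustrative usage (flag name is an example, not verified here):
# migrate_flags(["--iree-hip-target=gfx942", "model.mlir"])
```

Because the old names still work with a warning, this rewrite can be applied incrementally rather than all at once.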
Build System Changes
- Minimum CMake version bumped to 3.26 (#23607)
- Required for Python Stable ABI support
API Changes
- `map_gather`/`map_scatter` ops renamed to `map_load`/`map_store` in LinalgExt (#23481)
What's New
1. Compiler
1.1 Async Infrastructure & Tokenizers
Major new infrastructure for async I/O and text processing:
- Added proactor-based async I/O with causal frontier scheduling (`iree/async/`) (#23527)
- Added streaming tokenizer with full HuggingFace compatibility (`iree/tokenizer/`) (#23528)
- Graceful degradation for io_uring slab registration on RLIMIT_MEMLOCK (#23654)
- Added tiktoken format loader for OpenAI BPE vocabularies (#23663)
- Added async infrastructure for cross-process shared memory (#23688)
1.2 Codegen & Vector Distribution
Significant improvements to vector distribution and code generation:
- Added support for `shape_cast` in vector distribution (#23307)
- Support for padding integer attention masks (#23430)
- Added `arg_compare` operation to VectorExt (#23386)
- Refactored `transfer_gather` to use unified `indexing_maps` (#23510)
- Added distribution pattern for `iree_codegen.inner_tiled` (#23483)
- Added vectorization support for `iree_linalg_ext.arg_compare` (#23440)
- Added `transfer_gather` unrolling (#23517)
- Support multi-batch gather vectorization to `transfer_gather` (#23552)
- Added `transfer_gather` canonicalizations for masking (#23565)
- Refactored `VectorLayoutAnalysis` into 2-phase forward/backward design (#23611)
- Added `TransferScatterOp` definition and verifier (#23666)
- Introduced `VectorizableOpInterface` and migrated all ops (#23653, #23656, #23658, #23662, #23712, #23713, #23767)
- Added `iree_map` dialect with `PackMapAttr` and `VectorLayoutInterface` (#23671, #23672)
- Added `TransferScatterOp` bufferization support (#23719)
- Materialize vector masking on `VectorDistribute` pipeline (#23679)
- Added vectorization of non-projected `linalg.generic` (#23664)
- Implemented `ValueBoundsOpInterface` for `ToLayoutOp` (#23766)
- Apply bounds to `subgroup_id` (#23768)
1.3 GPU Codegen Improvements
- Added multi-buffering support for `gather_to_lds` async copy mode (#23354)
- Enabled swizzling for scaled matmuls (#23175)
- Added `CombineSourceLayoutTransformation` pass for `MapGatherOp` (#23165)
- Reworked `GPUVerifyDistribution` to use PreOrder walk with skip (#23502)
- Combine `CombineBarrierRegionsPass` and `CombineValueBarrierOps` into a single pass `GPUCombineValueSemanticsBarriersPass` (#23518)
- Added async copy mode pipelining for `gather_to_lds` (#23400)
- Move hoisting to interface and add it for barrier ops (#23519)
- GPU shared memory allocation based on layout analysis (#23631)
- Added `iree_gpu.global_subgroup_barrier` op (#23451)
- Added coalescing to reduction tiling (#23673)
- Make `VectorReductionToGPU` `scf.forall`-aware (#23686)
- Fixed shared memory estimation for multi-buffering (#23736)
- Added explicit async markers for multi-buffered async load pipelining (#23648)
1.4 GPU Heuristics
- Prefer larger MMA intrinsics for very large compute-bound GEMMs (#23641)
- Added min-based tile distribution for imbalanced M/N problems (#23619)
- Updated number of VGPRs on gfx1250 (RDNA4) (#23709)
- Refactored MMA heuristic seeds to be architecture-specific (#23717)
1.5 CPU Backend
- Added CPU optimization level option (#23259)
- Configure `GatherOp` tiling sizes based on semantics (#23419)
- Tuning spec support for LLVMCPU (#23424)
- New heuristic for AArch64 matmul vector tile sizes (#22932)
- Enable masking by default for targets with AVX-512 (#23470)
- Dynamic attention support by tiling K1 when needed (#23544)
- Initial plumbing for `inner_tiled` with data-tiled MMA attribute (#23494)
- Propagate reduction tile sizes to producers for fusion (#23660)
- Use `TileSwizzle` for `inner_tiled` layout on CPU (#23705)
1.6 LDS & Memory Access
- Only enable coalesced DMA when elements are aligned to minimum transfer size (#23416)
- Pre-check to ensure all copies are DMA-convertible before converting any (#23472)
- Added `in_bounds` attribute to `CoalescedGatherDMAOp` for `tensor.pad` fusion (#23365)
- Added fallback for `CoalescedGatherDMA` lowering (#23560)
1.7 PCF operations enhancements
- Fixed bufferization bugs for generic and loop ops (#23446)
- Added producer fusion into pcf.generic/loop ops (#23447)
- Added `FuseSubgroupConsumers` pass to fuse consumers and `extract_slice` ops into subgroup-scoped `pcf.generic/loop` ops (#23484)
- Added `MemoryEffectsOpInterface` to `WriteSliceOp` (#23490)
- Added `tensor.collapse_shape` fusion into pcf.generic/loop (#23491)
1.8 Dispatch Creation
iree candidate iree-3.11.0rc20260319
Automatic candidate release of iree.
iree candidate iree-3.11.0rc20260318
Automatic candidate release of iree.
iree candidate iree-3.11.0rc20260317
Automatic candidate release of iree.
iree candidate iree-3.11.0rc20260316
Automatic candidate release of iree.
iree candidate iree-3.11.0rc20260315
Automatic candidate release of iree.
iree candidate iree-3.11.0rc20260314
Automatic candidate release of iree.
iree candidate iree-3.11.0rc20260313
Automatic candidate release of iree.
iree candidate iree-3.11.0rc20260312
Automatic candidate release of iree.