IREE v3.11.0 Release Notes

Release Candidate: iree-3.11.0rc20260316
Commits: ~539 commits since v3.10.0
VMFB Bytecode Version: 17.0 (unchanged from v3.10.0)

Highlights

New async I/O infrastructure: Proactor-based async I/O with causal frontier scheduling, enabling cross-process shared memory support
Streaming tokenizer: Full HuggingFace-compatible tokenizer with tiktoken format support for OpenAI BPE vocabularies
Python 3.10+ requirement: Minimum Python version bumped to 3.10; Python 3.12+ supported via Stable ABI (abi3)
ROCm flag rename: iree-hip-* compiler flags renamed to iree-rocm-* (old names deprecated with warnings)
Enhanced vector distribution: Refactored 2-phase forward/backward layout analysis with improved transfer_gather support

Breaking Changes

VMFB Compatibility

VMFB bytecode version unchanged (17.0) - VMFBs compiled with v3.10.0 remain compatible with v3.11.0 runtime
- No recompilation needed when upgrading from v3.10.0

Python Version Requirement

Minimum Python version is now 3.10 (#23591)
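
A minimal sketch of a guard that a downstream script could run before importing the IREE Python packages. It assumes only the 3.10 minimum stated above; the helper name `meets_minimum` is hypothetical and is not part of any IREE API.

```python
import sys

# Minimum Python version stated in these release notes.
MIN_PYTHON = (3, 10)


def meets_minimum(version_info=None, minimum=MIN_PYTHON):
    """Return True if the interpreter version is at least `minimum`.

    Hypothetical helper for illustration; not an IREE API.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version_info[:2]) >= tuple(minimum)
```

Calling `meets_minimum()` with no arguments checks the running interpreter; passing a tuple such as `(3, 9, 18)` checks an arbitrary version.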

Compiler Flag Renames

iree-hip-* flags renamed to iree-rocm-* (#23420)
- Old flag names emit deprecation warnings but still work
- CMake: IREE_HIP_TEST_TARGET_CHIP → IREE_ROCM_TEST_TARGET_CHIP
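
The rename can be applied mechanically to existing build scripts. The sketch below rewrites the deprecated prefix in a list of compiler arguments. It assumes the rename is a pure prefix swap, as the `iree-hip-*` to `iree-rocm-*` pattern suggests, and that flags are written with a double dash; `migrate_flags` is a hypothetical helper, and `--iree-hip-target=gfx942` is used only as an illustrative flag.

```python
# Assumption: each renamed flag keeps its suffix and only the prefix changes.
OLD_PREFIX = "--iree-hip-"
NEW_PREFIX = "--iree-rocm-"


def migrate_flags(args):
    """Return a copy of `args` with the deprecated flag prefix replaced.

    Hypothetical helper for illustration; not part of the IREE tooling.
    """
    migrated = []
    for arg in args:
        if arg.startswith(OLD_PREFIX):
            # Swap the prefix, keep the rest of the flag (name and value).
            arg = NEW_PREFIX + arg[len(OLD_PREFIX):]
        migrated.append(arg)
    return migrated
```

Only arguments that start with the old prefix are touched, so values that merely contain the string `hip` are left unchanged.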

Build System Changes

Minimum CMake version bumped to 3.26 (#23607)
- Required for Python Stable ABI support
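
As a sketch of how a build wrapper might verify the new minimum before configuring, the helper below parses the first line of `cmake --version` output and compares it with 3.26. The function names are hypothetical, and the parsing assumes the usual `cmake version X.Y.Z` banner.

```python
import re

# Minimum CMake version stated in these release notes.
MIN_CMAKE = (3, 26)


def parse_cmake_version(version_output):
    """Extract (major, minor, patch) from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError("unrecognized cmake --version output")
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0.
    return (int(major), int(minor), int(patch or 0))


def cmake_is_new_enough(version_output, minimum=MIN_CMAKE):
    """Return True if the reported (major, minor) is at least `minimum`."""
    return parse_cmake_version(version_output)[:2] >= tuple(minimum)
```

The input string can be obtained with `subprocess.run(["cmake", "--version"], capture_output=True, text=True).stdout`.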

API Changes

map_gather/map_scatter ops renamed to map_load/map_store in LinalgExt (#23481)
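
For checked-in MLIR test files that mention the old op names, a textual rewrite may be enough as a first pass. The sketch below assumes the ops are spelled with the `iree_linalg_ext.` dialect prefix in textual IR and that only the op name changed; neither assumption is confirmed by these notes, so review the result against #23481. `rename_ops` is a hypothetical helper.

```python
import re

# Old op name -> new op name, per the rename described above.
RENAMES = {
    "map_gather": "map_load",
    "map_scatter": "map_store",
}

# Assumption: ops appear in textual IR as `iree_linalg_ext.<name>`.
_PATTERN = re.compile(r"\biree_linalg_ext\.(map_gather|map_scatter)\b")


def rename_ops(mlir_text):
    """Replace the old LinalgExt op names with the new ones in `mlir_text`."""
    return _PATTERN.sub(
        lambda m: "iree_linalg_ext." + RENAMES[m.group(1)], mlir_text
    )
```

The word-boundary anchors keep the rewrite from touching longer identifiers that merely start with one of the old names.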

What's New

1. Compiler

1.1 Async Infrastructure & Tokenizers

Major new infrastructure for async I/O and text processing:

Added proactor-based async I/O with causal frontier scheduling (iree/async/) (#23527)
Added streaming tokenizer with full HuggingFace compatibility (iree/tokenizer/) (#23528)
Graceful degradation for io_uring slab registration on RLIMIT_MEMLOCK (#23654)
Added tiktoken format loader for OpenAI BPE vocabularies (#23663)
Added async infrastructure for cross-process shared memory (#23688)
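
To illustrate what the tiktoken format contains, the sketch below parses the vocabulary layout used by OpenAI's tiktoken files, where each line holds a base64-encoded token followed by its integer rank. It shows the file layout only and is not the loader in iree/tokenizer/; `parse_tiktoken` is a hypothetical name.

```python
import base64


def parse_tiktoken(text):
    """Parse tiktoken-style vocabulary text into {token_bytes: rank}.

    Each non-empty line is expected to be `<base64 token> <rank>`.
    Hypothetical helper for illustration; not an IREE API.
    """
    ranks = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        token_b64, rank = line.split()
        ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks
```

Tokens are kept as `bytes` because BPE vocabularies operate on byte sequences that are not always valid UTF-8 on their own.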

1.2 Codegen & Vector Distribution

Significant improvements to vector distribution and code generation:

Added support for shape_cast in vector distribution (#23307)
Support for padding integer attention masks (#23430)
Added arg_compare operation to VectorExt (#23386)
Refactored transfer_gather to use unified indexing_maps (#23510)
Added distribution pattern for iree_codegen.inner_tiled (#23483)
Added vectorization support for iree_linalg_ext.arg_compare (#23440)
Added transfer_gather unrolling (#23517)
Support multi-batch gather vectorization to transfer_gather (#23552)
Added transfer_gather canonicalizations for masking (#23565)
Refactored VectorLayoutAnalysis into 2-phase forward/backward design (#23611)
Added TransferScatterOp definition and verifier (#23666)
Introduced VectorizableOpInterface and migrated all ops (#23653, #23656, #23658, #23662, #23712, #23713, #23767)
Added iree_map dialect with PackMapAttr and VectorLayoutInterface (#23671, #23672)
Added TransferScatterOp bufferization support (#23719)
Materialize vector masking on VectorDistribute pipeline (#23679)
Added vectorization of non-projected linalg.generic (#23664)
Implemented ValueBoundsOpInterface for ToLayoutOp (#23766)
Apply bounds to subgroup_id (#23768)

1.3 GPU Codegen Improvements

Added multi-buffering support for gather_to_lds async copy mode (#23354)
Enabled swizzling for scaled matmuls (#23175)
Added CombineSourceLayoutTransformation pass for MapGatherOp (#23165)
Reworked GPUVerifyDistribution to use PreOrder walk with skip (#23502)
Merged CombineBarrierRegionsPass and CombineValueBarrierOps into a single pass, GPUCombineValueSemanticsBarriersPass (#23518)
Added async copy mode pipelining for gather_to_lds (#23400)
Moved hoisting to an interface and added it for barrier ops (#23519)
GPU shared memory allocation based on layout analysis (#23631)
Added iree_gpu.global_subgroup_barrier op (#23451)
Added coalescing to reduction tiling (#23673)
Made VectorReductionToGPU scf.forall-aware (#23686)
Fixed shared memory estimation for multi-buffering (#23736)
Added explicit async markers for multi-buffered async load pipelining (#23648)

1.4 GPU Heuristics

Prefer larger MMA intrinsics for very large compute-bound GEMMs (#23641)
Added min-based tile distribution for imbalanced M/N problems (#23619)
Updated number of VGPRs on gfx1250 (#23709)
Refactored MMA heuristic seeds to be architecture-specific (#23717)

1.5 CPU Backend

Added CPU optimization level option (#23259)
Configure GatherOp tiling sizes based on semantics (#23419)
Tuning spec support for LLVMCPU (#23424)
New heuristic for AArch64 matmul vector tile sizes (#22932)
Enable masking by default for targets with AVX-512 (#23470)
Dynamic attention support by tiling K1 when needed (#23544)
Initial plumbing for inner_tiled with data-tiled MMA attribute (#23494)
Propagate reduction tile sizes to producers for fusion (#23660)
Use TileSwizzle for inner_tiled layout on CPU (#23705)

1.6 LDS & Memory Access

Only enable coalesced DMA when elements are aligned to minimum transfer size (#23416)
Pre-check to ensure all copies are DMA-convertible before converting any (#23472)
Added in_bounds attribute to CoalescedGatherDMAOp for tensor.pad fusion (#23365)
Added fallback for CoalescedGatherDMA lowering (#23560)

1.7 PCF Operation Enhancements

Fixed bufferization bugs for generic and loop ops (#23446)
Added producer fusion into pcf.generic/loop ops (#23447)
Added FuseSubgroupConsumers pass to fuse consumers and extract_slice ops into subgroup-scoped pcf.generic/loop ops (#23484)
Added MemoryEffectsOpInterface to WriteSliceOp (#23490)
Added tensor.collapse_shape fusion into pcf.generic/loop (#23491)

1.8 Dispatch Creation

Moved iteration space tracking to LinalgExt (#23221)
Ignore unit dims when comparing iteration spaces (#23362)
Updated split reduction heuristics for GEMM (#23423)
Fixed producer ...
