Releases: iree-org/iree

iree candidate iree-3.11.0rc20260320

20 Mar 10:41
0b5ea9f

Pre-release

Automatic candidate release of iree.

Release v3.11.0

19 Mar 23:25
e4a3b04

IREE v3.11.0 Release Notes

Release Candidate: iree-3.11.0rc20260316
Commits: ~539 commits since v3.10.0
VMFB Bytecode Version: 17.0 (unchanged from v3.10.0)


Highlights

  • New async I/O infrastructure: Proactor-based async I/O with causal frontier scheduling, enabling cross-process shared memory support
  • Streaming tokenizer: Full HuggingFace-compatible tokenizer with tiktoken format support for OpenAI BPE vocabularies
  • Python 3.10+ requirement: Minimum Python version bumped to 3.10; Python 3.12+ supported via Stable ABI (abi3)
  • ROCm flag rename: iree-hip-* compiler flags renamed to iree-rocm-* (old names deprecated with warnings)
  • Enhanced vector distribution: Refactored 2-phase forward/backward layout analysis with improved transfer_gather support

Breaking Changes

VMFB Compatibility

  • VMFB bytecode version unchanged (17.0) - VMFBs compiled with v3.10.0 remain compatible with v3.11.0 runtime
    • No recompilation needed when upgrading from v3.10.0

Python Version Requirement

  • Minimum Python version is now 3.10 (#23591)
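
Downstream scripts that load the IREE Python packages may want to fail early on an older interpreter. The following is a minimal sketch of such a guard; the function name python_is_supported and the constant MIN_PYTHON are hypothetical helpers introduced here for illustration and are not part of IREE.

```python
import sys

# Minimum interpreter version stated in the v3.11.0 release notes.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given version meets the 3.10 minimum.

    Only the (major, minor) pair is compared, so any 3.10.x patch
    release is accepted.
    """
    return tuple(version_info[:2]) >= MIN_PYTHON


def describe(version_info=sys.version_info) -> str:
    """Build a short human-readable status line for logs or CI output."""
    major, minor = version_info[:2]
    status = "ok" if python_is_supported(version_info) else "too old"
    return f"Python {major}.{minor}: {status}"
```

A CI job could call describe() at startup and abort when python_is_supported() returns False.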

Compiler Flag Renames

  • iree-hip-* flags renamed to iree-rocm-* (#23420)
    • Old flag names emit deprecation warnings but still work
    • CMake: IREE_HIP_TEST_TARGET_CHIP → IREE_ROCM_TEST_TARGET_CHIP
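
Because the old names still work but warn, build scripts can be migrated at their own pace. As a rough sketch, assuming the rename is a pure prefix substitution as described above, a helper like the following could rewrite an argument list. The function names are hypothetical, and the flag in the usage note is only an illustrative input; check the compiler's help output for the actual set of renamed flags.

```python
import re

# Deprecated prefix at the start of an argument, with one or two leading dashes.
_OLD_PREFIX = re.compile(r"^(--?)iree-hip-")


def migrate_flag(arg: str) -> str:
    """Rewrite a single iree-hip-* flag to its iree-rocm-* spelling.

    Arguments that do not start with the deprecated prefix are
    returned unchanged, including any "=value" suffix handling.
    """
    return _OLD_PREFIX.sub(r"\1iree-rocm-", arg)


def migrate_args(args):
    """Apply migrate_flag to every argument in a command line."""
    return [migrate_flag(a) for a in args]
```

For example, migrate_args(["--iree-hip-target=gfx942", "model.mlir"]) leaves the input file alone and rewrites only the prefixed flag.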

Build System Changes

  • Minimum CMake version bumped to 3.26 (#23607)
    • Required for Python Stable ABI support

API Changes

  • map_gather/map_scatter ops renamed to map_load/map_store in LinalgExt (#23481)
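
Checked-in MLIR test files or tuning specs that mention the old op names will need updating. The sketch below renames the ops in textual IR, assuming they are spelled with the iree_linalg_ext. dialect prefix (the same prefix used for iree_linalg_ext.arg_compare elsewhere in these notes). It only touches op names; if operand or attribute syntax also changed in #23481, that has to be handled separately. The helper name rename_ops is hypothetical.

```python
import re

# Old op name -> new op name, as listed in the release notes.
RENAMES = {"map_gather": "map_load", "map_scatter": "map_store"}

# Match only fully qualified op names; the trailing \b avoids touching
# longer identifiers that merely start with an old name.
_OP = re.compile(r"\biree_linalg_ext\.(map_gather|map_scatter)\b")


def rename_ops(mlir_text: str) -> str:
    """Return mlir_text with the old LinalgExt op names replaced."""
    return _OP.sub(lambda m: "iree_linalg_ext." + RENAMES[m.group(1)], mlir_text)
```

Running rename_ops over each file's contents and writing the result back is enough for a one-off migration; reviewing the diff afterwards is still advisable.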

What's New

1. Compiler

1.1 Async Infrastructure & Tokenizers

Major new infrastructure for async I/O and text processing:

  • Added proactor-based async I/O with causal frontier scheduling (iree/async/) (#23527)
  • Added streaming tokenizer with full HuggingFace compatibility (iree/tokenizer/) (#23528)
  • Graceful degradation for io_uring slab registration on RLIMIT_MEMLOCK (#23654)
  • Added tiktoken format loader for OpenAI BPE vocabularies (#23663)
  • Added async infrastructure for cross-process shared memory (#23688)

1.2 Codegen & Vector Distribution

Significant improvements to vector distribution and code generation:

  • Added support for shape_cast in vector distribution (#23307)
  • Support for padding integer attention masks (#23430)
  • Added arg_compare operation to VectorExt (#23386)
  • Refactored transfer_gather to use unified indexing_maps (#23510)
  • Added distribution pattern for iree_codegen.inner_tiled (#23483)
  • Added vectorization support for iree_linalg_ext.arg_compare (#23440)
  • Added transfer_gather unrolling (#23517)
  • Support multi-batch gather vectorization to transfer_gather (#23552)
  • Added transfer_gather canonicalizations for masking (#23565)
  • Refactored VectorLayoutAnalysis into 2-phase forward/backward design (#23611)
  • Added TransferScatterOp definition and verifier (#23666)
  • Introduced VectorizableOpInterface and migrated all ops (#23653, #23656, #23658, #23662, #23712, #23713, #23767)
  • Added iree_map dialect with PackMapAttr and VectorLayoutInterface (#23671, #23672)
  • Added TransferScatterOp bufferization support (#23719)
  • Materialize vector masking on VectorDistribute pipeline (#23679)
  • Added vectorization of non-projected linalg.generic (#23664)
  • Implemented ValueBoundsOpInterface for ToLayoutOp (#23766)
  • Apply bounds to subgroup_id (#23768)

1.3 GPU Codegen Improvements

  • Added multi-buffering support for gather_to_lds async copy mode (#23354)
  • Enabled swizzling for scaled matmuls (#23175)
  • Added CombineSourceLayoutTransformation pass for MapGatherOp (#23165)
  • Reworked GPUVerifyDistribution to use PreOrder walk with skip (#23502)
  • Combined CombineBarrierRegionsPass and CombineValueBarrierOps into a single pass, GPUCombineValueSemanticsBarriersPass (#23518)
  • Added async copy mode pipelining for gather_to_lds (#23400)
  • Move hoisting to interface and add it for barrier ops (#23519)
  • GPU shared memory allocation based on layout analysis (#23631)
  • Added iree_gpu.global_subgroup_barrier op (#23451)
  • Added coalescing to reduction tiling (#23673)
  • Make VectorReductionToGPU scf.forall-aware (#23686)
  • Fixed shared memory estimation for multi-buffering (#23736)
  • Added explicit async markers for multi-buffered async load pipelining (#23648)

1.4 GPU Heuristics

  • Prefer larger MMA intrinsics for very large compute-bound GEMMs (#23641)
  • Added min-based tile distribution for imbalanced M/N problems (#23619)
  • Updated number of VGPRs on gfx1250 (RDNA4) (#23709)
  • Refactored MMA heuristic seeds to be architecture-specific (#23717)

1.5 CPU Backend

  • Added CPU optimization level option (#23259)
  • Configure GatherOp tiling sizes based on semantics (#23419)
  • Tuning spec support for LLVMCPU (#23424)
  • New heuristic for AArch64 matmul vector tile sizes (#22932)
  • Enable masking by default for targets with AVX-512 (#23470)
  • Dynamic attention support by tiling K1 when needed (#23544)
  • Initial plumbing for inner_tiled with data-tiled MMA attribute (#23494)
  • Propagate reduction tile sizes to producers for fusion (#23660)
  • Use TileSwizzle for inner_tiled layout on CPU (#23705)

1.6 LDS & Memory Access

  • Only enable coalesced DMA when elements are aligned to minimum transfer size (#23416)
  • Pre-check to ensure all copies are DMA-convertible before converting any (#23472)
  • Added in_bounds attribute to CoalescedGatherDMAOp for tensor.pad fusion (#23365)
  • Added fallback for CoalescedGatherDMA lowering (#23560)

1.7 PCF Operation Enhancements

  • Fixed bufferization bugs for generic and loop ops (#23446)
  • Added producer fusion into pcf.generic/loop ops (#23447)
  • Added FuseSubgroupConsumers pass to fuse consumers and extract_slice ops into subgroup-scoped pcf.generic/loop ops (#23484)
  • Added MemoryEffectsOpInterface to WriteSliceOp (#23490)
  • Added tensor.collapse_shape fusion into pcf.generic/loop (#23491)

1.8 Dispatch Creation

  • Moved iteration space tracking to LinalgExt (#23221)
  • Ignore unit dims when comparing iteration spaces (#23362)
  • Updated split reduction heuristics for GEMM (#23423)
  • Fixed producer ...

iree candidate iree-3.11.0rc20260319

19 Mar 10:44
98078db

Pre-release

Automatic candidate release of iree.

iree candidate iree-3.11.0rc20260318

18 Mar 10:20
49d88b4

Pre-release

Automatic candidate release of iree.

iree candidate iree-3.11.0rc20260317

17 Mar 10:41
d7f5aba

Pre-release

Automatic candidate release of iree.

iree candidate iree-3.11.0rc20260316

16 Mar 10:24
e4a3b04

Pre-release

Automatic candidate release of iree.

iree candidate iree-3.11.0rc20260315

15 Mar 10:22
c578130

Pre-release

Automatic candidate release of iree.

iree candidate iree-3.11.0rc20260314

14 Mar 09:53
94a0427

Pre-release

Automatic candidate release of iree.

iree candidate iree-3.11.0rc20260313

13 Mar 10:38
b3c43f2

Pre-release

Automatic candidate release of iree.

iree candidate iree-3.11.0rc20260312

12 Mar 19:21
afcb63b

Pre-release

Automatic candidate release of iree.