[PCF] Add producer fusion into pcf.generic/loop ops#23447

Merged
qedawkins merged 2 commits into iree-org:main from qedawkins:fuse-producers on Feb 18, 2026

@qedawkins
Contributor

Add FuseProducersPass (iree-pcf-fuse-producers) that fuses DPS producers into pcf.generic and pcf.loop ops through their tied init arguments.

For each tied init that is the single result of a TilingInterface + DestinationStyleOpInterface producer (e.g. linalg.fill, linalg.transpose), the pass:

  1. Replaces the scoped op's init with the producer's DPS init
  2. At each pcf.read_slice on the corresponding sref, generates a tiled version of the producer via generateResultTileValue
  3. For vector-typed read_slices, converts the tiled tensor result to a vector via vector.transfer_read
  4. Erases the original producer if it has no remaining uses
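
The steps above can be pictured with a toy model in plain Python. This is only a sketch and not IREE code: tensors are nested lists, the producer is a transpose, and `transpose_tile`, `read_slice`, `unfused_read`, and `fused_read` are hypothetical names invented for illustration. The point it tries to show is that reading a slice of a fully materialized producer result gives the same values as computing just that tile of the producer at the read site.

```python
# Toy model, not IREE code: all names here are hypothetical.

def transpose_full(src):
    """Untiled producer: materializes the whole transposed tensor."""
    rows, cols = len(src), len(src[0])
    return [[src[r][c] for r in range(rows)] for c in range(cols)]

def transpose_tile(src, offsets, sizes):
    """Tiled producer: computes only result[o0:o0+s0, o1:o1+s1]."""
    (o0, o1), (s0, s1) = offsets, sizes
    # Element (o0 + i, o1 + j) of the transposed result is src[o1 + j][o0 + i].
    return [[src[o1 + j][o0 + i] for j in range(s1)] for i in range(s0)]

def read_slice(tensor, offsets, sizes):
    """Stand-in for a slice read of an already materialized tensor."""
    (o0, o1), (s0, s1) = offsets, sizes
    return [row[o1:o1 + s1] for row in tensor[o0:o0 + s0]]

def unfused_read(src, offsets, sizes):
    """Before fusion: the producer runs first, then the slice is read."""
    init = transpose_full(src)
    return read_slice(init, offsets, sizes)

def fused_read(src, offsets, sizes):
    """After fusion: the read site computes only the tile it needs."""
    return transpose_tile(src, offsets, sizes)

src = [[1, 2, 3],
       [4, 5, 6]]
print(unfused_read(src, (1, 0), (2, 2)))
print(fused_read(src, (1, 0), (2, 2)))
```

In this model the full transposed tensor is never needed once every read goes through `fused_read`, which loosely corresponds to erasing the original producer in step 4.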

The pass requires SyncOnReturn semantics on the sref. Under those semantics, overlapping reads and writes are undefined behavior, so every read can be treated as seeing the init value.
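
The reasoning in the last paragraph can be checked on a toy model. The sketch below is not IREE code and does not model the actual SyncOnReturn implementation; it only assumes what the description states, namely that a program in which a read overlaps a write has no defined behavior. The op encoding and every function name are hypothetical. For programs without such an overlap, reading the live buffer and reading the untouched init give the same values, which is what lets a fused read be computed from the init alone.

```python
# Toy model, not IREE code. A scoped region is a list of ops on a 1-D buffer:
#   ("write", start, stop, values)  or  ("read", start, stop)

def overlaps(a, b):
    """True if half-open ranges a and b share at least one index."""
    return a[0] < b[1] and b[0] < a[1]

def has_read_write_overlap(ops):
    """True if any read range intersects any write range."""
    reads = [(s, e) for kind, s, e, *_ in ops if kind == "read"]
    writes = [(s, e) for kind, s, e, *_ in ops if kind == "write"]
    return any(overlaps(r, w) for r in reads for w in writes)

def run_live(init, ops):
    """Reads observe the buffer as it is mutated in program order."""
    buf = list(init)
    seen = []
    for kind, s, e, *payload in ops:
        if kind == "write":
            buf[s:e] = payload[0]
        else:
            seen.append(buf[s:e])
    return seen

def run_reads_from_init(init, ops):
    """Reads ignore all writes and look only at the init value."""
    return [list(init[s:e]) for kind, s, e, *_ in ops if kind == "read"]

init = [0, 0, 0, 0]
ok_ops = [("write", 0, 2, [7, 8]), ("read", 2, 4), ("read", 2, 3)]
bad_ops = [("write", 0, 2, [7, 8]), ("read", 1, 3)]

print(has_read_write_overlap(ok_ops), run_live(init, ok_ops) == run_reads_from_init(init, ok_ops))
print(has_read_write_overlap(bad_ops), run_live(init, bad_ops) == run_reads_from_init(init, bad_ops))
```

The second program is the kind the semantics rule out: its read overlaps a write, and the two interpretations disagree.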

@qedawkins marked this pull request as ready for review February 11, 2026 00:52
Contributor

@Max191 left a comment

Overall LGTM, just some nits.

One question: Is fusion through inits expected to happen separately from fusion via implicit capture? Why does this pass only perform fusion through the init args?

@qedawkins
Contributor Author

> Overall LGTM, just some nits.
>
> One question: Is fusion through inits expected to happen separately from fusion via implicit capture? Why does this pass only perform fusion through the init args?

Fusion through implicit capture doesn't need anything special. Since it's just extract_slice(tilable op), you can use the same pattern everywhere.
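
As a rough illustration of the `extract_slice(tilable op)` pattern mentioned here, the sketch below uses a made-up mini-IR of Python tuples. It is not MLIR or IREE code; the tuple encoding and the names `fold_slice_of_fill` and `evaluate` are assumptions for this example only. It rewrites a slice of a fill into a smaller fill and checks that both forms evaluate to the same values.

```python
# Hypothetical mini-IR, not MLIR:
#   ("fill", value, (rows, cols))
#   ("extract_slice", source_expr, (off0, off1), (size0, size1))

def fold_slice_of_fill(expr):
    """Rewrite extract_slice(fill(v, shape)) into fill(v, sizes).

    Any other expression is returned unchanged.
    """
    if expr[0] == "extract_slice" and expr[1][0] == "fill":
        _, (_, value, _shape), _offsets, sizes = expr
        return ("fill", value, sizes)
    return expr

def evaluate(expr):
    """Reference interpreter used to compare the two forms."""
    if expr[0] == "fill":
        _, value, (rows, cols) = expr
        return [[value] * cols for _ in range(rows)]
    if expr[0] == "extract_slice":
        _, source, (o0, o1), (s0, s1) = expr
        full = evaluate(source)
        return [row[o1:o1 + s1] for row in full[o0:o0 + s0]]
    raise ValueError(f"unknown op: {expr[0]}")

expr = ("extract_slice", ("fill", 0.0, (4, 4)), (1, 2), (2, 2))
folded = fold_slice_of_fill(expr)
print(folded)
print(evaluate(expr) == evaluate(folded))
```

The rewrite only looks at the slice and its source, not at who captures the value, which is one way to read the remark that the same pattern applies everywhere.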

@qedawkins enabled auto-merge (squash) February 18, 2026 19:19
@qedawkins merged commit f9ebc10 into iree-org:main Feb 18, 2026
55 of 56 checks passed
@qedawkins deleted the fuse-producers branch February 19, 2026 14:29
@Max191
Contributor

Max191 commented Feb 23, 2026

> Fusion through implicit capture doesn't need anything special. Since it's just extract_slice(tilable op), you can use the same pattern everywhere.

So then we won't have pcf.read_slice ops on anything but the sref block args?

@qedawkins
Contributor Author

> > Fusion through implicit capture doesn't need anything special. Since it's just extract_slice(tilable op), you can use the same pattern everywhere.
>
> So then we won't have pcf.read_slice ops on anything but the sref block args?

You'd have to convert to sref somewhere, and that wouldn't be via the pcf.generic/loop, since fusion through its tied inits is what this handles and that's the only source of sref the loop provides. So if we did have to handle pcf.read_slice somewhere else, it would be orthogonal to this. In general, I don't think making management of implicit capture the responsibility of the capturing op is a good idea outside of conversion to/from isolated from above. It breaks the core MLIR assumption that all uses are the same.
