[GPU] Add iree_gpu.global_subgroup_barrier op #23451
Conversation
Add a synchronization-only barrier op that has no memory fence semantics. Unlike gpu.barrier, this op preserves consecutive instances (no canonicalizer) which is critical for the pingpong double-buffer schedule. Fences are handled separately. Includes lowerings for both ROCDL and NVVM. Co-Authored-By: Claude Opus 4.6 <[email protected]>
@krzysz00 I figure you're the best to take a look at this. Basically we need to distinguish between "The" barrier and "a" barrier. gpu.barrier (mainly per the canonicalization that drops adjacent ones) semantically requires all threads to reach that specific barrier to proceed, while we need the semantics "all threads must reach any instance of the barrier to proceed" for wave specialization on targets that don't have named barriers. If you have ideas other than a new op I'd be interested to hear your thoughts.
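To illustrate the distinction, here is a rough sketch of a wave-specialized schedule (hypothetical IR; the condition name and op bodies are illustrative, not taken from the PR). Each wave sits at a textually different instance of the barrier, so a wave must be released when the other waves reach *any* instance, not that specific one:

```mlir
// Hypothetical wave-specialized pingpong skeleton.
scf.if %is_producer_wave {
  // ... fill buffer A ...
  iree_gpu.global_subgroup_barrier  // producer's instance
  // ... fill buffer B ...
} else {
  // ... consume buffer B ...
  iree_gpu.global_subgroup_barrier  // consumer's instance
  // ... consume buffer A ...
}
```

With gpu.barrier's per-instance semantics, barriers in divergent branches like this would not be well-defined; the global-instance semantics is what makes the specialization legal.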
compiler/src/iree/compiler/Codegen/LLVMGPU/test/convert_to_rocdl_gfx1200.mlir
const char *asmStr = ";;;WARNING: BREAKS DEBUG WATCHES\ns_barrier";
rewriter.replaceOpWithNewOp<LLVM::InlineAsmOp>(
Would it be worth moving to rocdl?
WDYM moving to rocdl? Moving the op to rocdl?
I'm generally leaning towards not having inline asm in iree and using rocdl for these low-level intrinsics
I'll note that the standing position of the compiler folks is "if you have felt the need to use inline asm, please don't, but if you must, tell us about it so we can see if your usecase can be eliminated"
This is just a direct copy-paste of this. If you guys want to tell me what to put here I'd be happy to put whatever.
On re-reading, it's probably fine to keep the MI-100 workaround in here, and that comment's even correct.
@qedawkins This is `gpu.barrier memfence []`
// Pre-gfx90a: use inline asm.
auto asmDialectAttr = LLVM::AsmDialectAttr::get(rewriter.getContext(),
                                                LLVM::AsmDialect::AD_ATT);
const char *asmStr = ";;;WARNING: BREAKS DEBUG WATCHES\ns_barrier";
- No, it doesn't break debug watches
- Use the intrinsic, on pain of getting a stern talking-to from the compiler team, unless you know a good reason not to
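For concreteness, a sketch of the suggested change at the IR level, assuming the upstream ROCDL dialect's rocdl.s.barrier op (the exact op spelling here is an assumption on my part):

```mlir
// Before: lowering through inline asm.
llvm.inline_asm asm_dialect = att
    ";;;WARNING: BREAKS DEBUG WATCHES\ns_barrier", "" : () -> ()

// After: the dedicated intrinsic op, which lowers to
// llvm.amdgcn.s.barrier without any inline asm.
rocdl.s.barrier
```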
Not quite, this folds because the barriers are unique, meaning we can assume the second barrier is redundant. The same folder is invalid for global barriers since other workers can have a critical region between the two.
In which case, can you go upstream and patch the folder to special-case the no-memfence case?
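The fold under discussion can be sketched as follows (illustrative IR, not the verbatim upstream pattern):

```mlir
// Valid for gpu.barrier: every thread that passed the first barrier is
// already synchronized, so the second is redundant and can be dropped.
gpu.barrier
gpu.barrier  // foldable

// Invalid for the global barrier: under wave specialization another
// subgroup may execute a critical region between the two instances, so
// consecutive instances must be preserved.
iree_gpu.global_subgroup_barrier
// <- other waves' critical region conceptually lands here
iree_gpu.global_subgroup_barrier  // must NOT fold
```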
Re-reading, I can see why you'd want this - it's to make sure we have correctness around pingpong. I'd still want to re-use upstream implementations here, and see whether we'd rather fiddle with the folders up there than add a new op that'll awkwardly diverge.
The memory fence is a separate discussion. Whether or not
@qedawkins Ok, I can see what you're getting at now, though I'm at least going to argue for wandering upstream and either explicitly adding a carve-out to the uniqueness thing when you specify
I'm not interested in adding a flag to gpu.barrier for these semantics just to kill a folder. And justifying the semantics I'm adding here in a vacuum upstream does not sound appealing to me. Re: named barriers, I'm expecting we'll need something shaped fairly differently, since typically you need a signal and an await for those, no?
SPIR-V has, IIRC, named barriers but not a split signal/wait - but that's off-topic |
krzysz00
left a comment
Approved now that we've got more context on this one
Add a synchronization-only barrier op that has no memory fence semantics. The key distinction between this and gpu.barrier is that it's semantically global: a subgroup is let through the barrier once all subgroups have reached any instance of the barrier, not just a specific one. Note that this is a dramatically more restrictive condition for optimizing the barriers themselves and should only be preferred in situations where it is expressly required. This op also does not fence memory and expects that to be handled separately.