[CPU] Introduce optimization level to CPU backend. #23259

Merged
hanhanW merged 16 commits into iree-org:main from hanhanW:users/hanhanW/global-options
Feb 6, 2026

Contributor

@hanhanW hanhanW commented Jan 23, 2026

The revision adds CPUCodegenOptions and moves a few CPU flags to the struct:

  • iree-llvmcpu-disable-distribution: always default off.
  • iree-llvmcpu-fail-on-out-of-bounds-stack-allocation: always default on.
  • iree-llvmcpu-reassociate-fp-reductions: default off for O0, on for O2+
  • Merged a pass level test to pipeline test because disabling distribution is a global option.

Note that iree-opt-level is the master flag that also controls iree-llvmcpu-mlir-opt-level unless you explicitly specify a value.

The opt-level fix causes split_reduction_using_tiling.mlir to fail because FP reassociation is now correctly disabled at O0. The test validates that split reduction produces similar results to normal reduction - a relative tolerance (rtol 0.01%) is more appropriate than absolute tolerance since FP rounding error scales with value magnitude.

It is a step towards #19072

Assisted-by: Claude

The revision adds CodegenOptions (shared) and CPUCodegenOptions and
moves a few CPU flags to the struct:
- iree-llvmcpu-disable-distribution
- iree-llvmcpu-fail-on-out-of-bounds-stack-allocation
- iree-llvmcpu-reassociate-fp-reductions

It also introduces `iree-codegen-emit-performance-warnings` and uses it
for a performance warning.

The performance warning flag can also be used by software emulation
warning if needed.

It is a step towards iree-org#19072

Signed-off-by: hanhanW <[email protected]>
Contributor Author

hanhanW commented Jan 23, 2026

I did not enable the bindings, so I did not see the test failure. The behavior is intended: I was wrong to think that iree-opt-level overrides the other *-opt-level flags. Let me remove the test.

session.set_flags("--iree-opt-level=O2")
flags = session.get_flags()
self.assertIn("--iree-opt-level=O2", flags)
self.assertIn("--iree-global-optimization-opt-level=O0", flags)
self.assertIn("--iree-opt-strip-assertions=false", flags)

Member

@IanWood1 IanWood1 left a comment

Nice! I'm excited to see codegen options using the optimization level defaults.

My main feedback is about session compatibility. Currently, the pipeline uses CPUCodegenOptions::FromFlags::get(), which only accesses global CLI flags. This means when the compiler is invoked through a session (e.g., via C API), any flags set via session.setFlags() will be ignored. Only the initial global CLI flags will be used. Getting this working will also make it easier for other backends/plugins to implement similar options.

Comment on lines +31 to +32
// Fail if the upper bound of dynamic stack allocation cannot be solved.
bool failOnOutOfBoundsStackAllocation = true;
Member

This defaults to true but the PR description says:

iree-llvmcpu-fail-on-out-of-bounds-stack-allocation: always default off.

Looks like this was always default true, so maybe the description needs to be fixed?

Contributor Author

Good catch, fixed the description. Thanks!

{init_at_opt(llvm::OptimizationLevel::O0, false),
init_at_opt(llvm::OptimizationLevel::O1, true)},
llvm::cl::desc("Enables reassociation for FP reductions."),
llvm::cl::cat(category));
Member

Should the iree-llvmcpu-* options be in the LLVMCPU dir?

Contributor Author

Good question. Initially, I was planning to put each option into its own directory. However, I found that there is an option used by both LLVMCPU/ and ExternalInterfaces/, which determines whether scalable lowering should be enabled or not: clEnableScalableVectorization. I'm considering exposing it as a public option.

The other point is that putting all the options together is less fragmented, IMO. So you can browse all the options for all the backends in a single file.

Thus, I ended up keeping them all together in this file; they are also easier to skim because they all live in a single file. I think I should mention this in the PR description. What do you think?

Comment on lines +85 to +100
def testCodegenOptLevelCascade(self):
# Set global opt level to O2, check cascade to codegen opt levels.
session = Session()
session.set_flags("--iree-opt-level=O2")
flags = session.get_flags()
self.assertIn("--iree-codegen-opt-level=O2", flags)
self.assertIn("--iree-llvmcpu-opt-level=O2", flags)

def testCodegenOptLevelOverride(self):
# Override just the CPU opt level, verify it doesn't affect others.
session = Session()
session.set_flags("--iree-opt-level=O2")
session.set_flags("--iree-llvmcpu-opt-level=O0")
flags = session.get_flags()
self.assertIn("--iree-codegen-opt-level=O2", flags)
self.assertIn("--iree-llvmcpu-opt-level=O0", flags)
Member

The failure here is expected. If you look at testOptFlags() test above, --iree-global-optimization-opt-level and --iree-opt-strip-assertions are never modified. This test doesn't check that the options "cascade" but rather that they are not modified.

Member

Also, it might be worth it to keep these to ensure that the code here is working correctly.

if (!session.globalInit.usesCommandLine) {
session.binder.applyOptimizationDefaults();
}
auto resetDefaults = llvm::scope_exit([&]() {
if (!session.globalInit.usesCommandLine) {
session.binder.restoreOptimizationDefaults();
}
});

Contributor Author

I updated the test so that it verifies that one of the options is captured by the session.

// Codegen options (inherit from global opt level).
// Note: Registration order matters for inheritance.
(void)CodegenOptions::FromFlags::get();
(void)CPUCodegenOptions::FromFlags::get();
Member

I think we want to handle this in LLVMCPU's plugin registration. It should be possible to use the createUninitializedSession(OptionsBinder &localOptionsBinder) hook to bind the options. Then, we should be able to plumb through the options to the passes here:

void populateHALTargetBackends(IREE::HAL::TargetBackendList &targets) {
// #hal.executable.target<"llvm-cpu", ...
targets.add("llvm-cpu", [=]() {
return std::make_shared<LLVMCPUTargetBackend>(options.getTargetOptions());
});
}

This should be better because this will allow us to set the options per session.

Contributor Author

done

Comment on lines +363 to +365
if (CodegenOptions::FromFlags::get().emitPerformanceWarnings) {
loadOp.emitWarning("decomposing mismatched encoding load op");
}
Member

This should be plumbed through as a member on the option & a pass option for the passes that use this pattern. The main reason is that FromFlags::get() doesn't work with sessions. But it also makes it easier to reproduce the warning if it's a pass option.
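
The difference between reading a process-wide flag and receiving the option at pass construction can be sketched in a few lines of Python. This is only a toy model: `GLOBAL_FLAGS`, `CodegenOptions`, `GlobalReadingPass`, and `OptionPlumbedPass` are hypothetical names for illustration, not IREE classes or APIs.

```python
from dataclasses import dataclass

# Hypothetical stand-in for process-wide CLI flags (the FromFlags::get() style).
GLOBAL_FLAGS = {"emit_performance_warnings": False}

WARNING = "decomposing mismatched encoding load op"


@dataclass
class CodegenOptions:
    """Hypothetical per-session options struct."""
    emit_performance_warnings: bool = False


class GlobalReadingPass:
    """Reads the global flag at run time, so per-session settings are ignored."""

    def run(self, op_name):
        if GLOBAL_FLAGS["emit_performance_warnings"]:
            return [f"{op_name}: {WARNING}"]
        return []


class OptionPlumbedPass:
    """Receives its options at construction, so each session controls it."""

    def __init__(self, options):
        self.options = options

    def run(self, op_name):
        if self.options.emit_performance_warnings:
            return [f"{op_name}: {WARNING}"]
        return []


# Two "sessions" with different settings behave independently.
session_a = OptionPlumbedPass(CodegenOptions(emit_performance_warnings=True))
session_b = OptionPlumbedPass(CodegenOptions())
print(session_a.run("load0"))
print(session_b.run("load0"))
# The global-reading pass only ever sees the process-wide value.
print(GlobalReadingPass().run("load0"))
```

In this model, reproducing the warning only requires constructing the pass with the option set, which mirrors the reproducibility argument made in the comment.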

// Set all the distribution tile sizes to zero if thread distribution is
// disabled.
- if (clDisableDistribution) {
+ if (CPUCodegenOptions::FromFlags::get().disableDistribution) {
Member

Same here

LoweringConfigAttrInterface loweringConfig = getRootLoweringConfig(funcOp);
auto pipeline = translationInfo.getDispatchLoweringPassPipeline();
LLVMCPUPipelineOptions pipelineOpts;
pipelineOpts.cpuOpts = CPUCodegenOptions::FromFlags::get();
Member

This should be a pass option, too. But it might be a bit trickier.

@hanhanW hanhanW requested a review from ScottTodd as a code owner January 26, 2026 22:37
@hanhanW hanhanW removed the request for review from ScottTodd January 26, 2026 22:37
Signed-off-by: hanhanW <[email protected]>
@hanhanW hanhanW requested review from IanWood1 and removed request for Max191 and qedawkins January 26, 2026 22:41
Contributor Author

hanhanW commented Jan 26, 2026

I tested with a Python program that ensures the distribution option is working as expected:

Compiling with distribution enabled...
Compiling with distribution disabled...

Running benchmarks...

With distribution (parallel):    9.48 ms
Without distribution (sequential): 202.56 ms
Speedup from distribution: 21.37x

@hanhanW hanhanW requested a review from bjacob January 26, 2026 22:46

// Override Registration to also bind CPUCodegenOptions to the session.
struct Registration : PluginSession::Registration {
using PluginSession::Registration::Registration;
Contributor Author

This using statement is required for

auto registration =
std::make_unique<typename SessionTy::Registration>(std::move(pluginId));

@hanhanW hanhanW changed the title [Codegen] Introduce optimization level to Codegen. [CPU] Introduce optimization level to CPU backend. Jan 27, 2026
Contributor Author

hanhanW commented Jan 27, 2026

@IanWood1 I have a question. Why isn't iree-opt-level a master flag? E.g., I'd expect it to set the default optimization level for sub-categories like global-opt, codegen, etc. Is it just not implemented yet, or is it intended?

(I'm mainly looking at this comment about having a master flag, i.e., iree-llvmcpu-opt-level, for CPU backend, but I think the iree-opt-level can serve the role?)

Member

@IanWood1 IanWood1 left a comment

Looks good to me in terms of setting up the Binder/Options. However, I'm not familiar enough with this part of the codebase, so I'll hold off on approving. One thing that I just noticed is that there already exists an options struct for LLVMCPU named LLVMCPUTargetCLOptions:

struct LLVMCPUTargetCLOptions {

Binder method:

void LLVMCPUTargetCLOptions::bindOptions(OptionsBinder &binder) {

Given that this other struct exists, why do we need the new CPUCodegenOptions struct? This isn't blocking. I'm just trying to get an idea of what's going on.

@IanWood1
Member

@IanWood1 I have a question. Why isn't iree-opt-level a master flag? E.g., I'd expect it to set the default optimization level for sub-categories like global-opt, codegen, etc. Is it just not implemented yet, or is it intended?

(I'm mainly looking at this comment about having a master flag, i.e., iree-llvmcpu-opt-level, for CPU backend, but I think the iree-opt-level can serve the role?)

Maybe I'm misunderstanding what you mean by master flag but --iree-opt-level should work as a master flag. For example, if you set --iree-opt-level=O3 then --iree-global-optimization-opt-level will also get set to O3. Additionally, when you have a flag that is linked to the optimization level, it will get set based on the pipeline opt level. For example, --iree-opt-strip-assertions gets set when --iree-global-optimization-opt-level is >O0 (either through setting the global opt flag or the master --iree-opt-level flag).

https://github.com/iree-org/iree/blob/7aa7be5fd3f237f3ef885437f3d1be3f8798dac6/tools/test/compile_flags.mlir demonstrates this behavior.
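
The cascade described above can be modeled with a toy binder. This is a sketch under stated assumptions, not IREE's OptionsBinder: `ToyBinder`, `set_flag`, and `resolve` are hypothetical names, and the `O0` default for the master flag is an assumption made for the example.

```python
# Toy model of opt-level cascading. All class and method names are
# hypothetical; this is not IREE's OptionsBinder implementation.

LEVELS = ["O0", "O1", "O2", "O3"]


class ToyBinder:
    def __init__(self):
        # Only flags the user set explicitly are stored here.
        self.explicit = {}

    def set_flag(self, name, value):
        self.explicit[name] = value

    def resolve(self):
        """Return the effective value of every modeled flag."""
        # Assumed default for the master flag when it is not set.
        master = self.explicit.get("iree-opt-level", "O0")
        # A sub-level inherits the master level unless set explicitly.
        global_opt = self.explicit.get(
            "iree-global-optimization-opt-level", master)
        # A boolean tied to a level defaults to on when that level is > O0.
        strip = self.explicit.get(
            "iree-opt-strip-assertions", LEVELS.index(global_opt) > 0)
        return {
            "iree-opt-level": master,
            "iree-global-optimization-opt-level": global_opt,
            "iree-opt-strip-assertions": strip,
        }


binder = ToyBinder()
binder.set_flag("iree-opt-level", "O3")
print(binder.resolve())

# An explicit sub-level wins over the master flag.
binder.set_flag("iree-global-optimization-opt-level", "O0")
print(binder.resolve())
```

The key design point in this model is that only explicitly set values are stored, so inherited defaults can be recomputed whenever the master level changes.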

Contributor Author

hanhanW commented Jan 27, 2026

Looks good to me in terms of setting up the Binder/Options. However, I'm not familiar enough with this part of the codebase, so I'll hold off on approving. One thing that I just noticed is that there already exists an options struct for LLVMCPU named LLVMCPUTargetCLOptions:

struct LLVMCPUTargetCLOptions {

Binder method:

void LLVMCPUTargetCLOptions::bindOptions(OptionsBinder &binder) {

Given that this other struct exists, why do we need the new CPUCodegenOptions struct? This isn't blocking. I'm just trying to get an idea of what's going on.

Codegen in IREE works as follows: a series of lowerings, transformations, and optimizations takes linalg ops as input and lowers them to low-level dialects. On the CPU side, they are lowered to the LLVM dialect; we then convert them to an LLVM module and rely on LLVM for further lowering. Having a new option allows us to set different optimization levels for the MLIR lowering and the LLVM lowering. This is why I'm adding a new option for the MLIR part.

@IanWood1 I have a question. Why isn't iree-opt-level a master flag? E.g., I'd expect it to set the default optimization level for sub-categories like global-opt, codegen, etc. Is it just not implemented yet, or is it intended?
(I'm mainly looking at this comment about having a master flag, i.e., iree-llvmcpu-opt-level, for CPU backend, but I think the iree-opt-level can serve the role?)

Maybe I'm misunderstanding what you mean by master flag but --iree-opt-level should work as a master flag. For example, if you set --iree-opt-level=O3 then --iree-global-optimization-opt-level will also get set to O3. Additionally, when you have a flag that is linked to the optimization level, it will get set based on the pipeline opt level. For example, --iree-opt-strip-assertions gets set when --iree-global-optimization-opt-level is >O0 (either through setting the global opt flag or the master --iree-opt-level flag).

https://github.com/iree-org/iree/blob/7aa7be5fd3f237f3ef885437f3d1be3f8798dac6/tools/test/compile_flags.mlir demonstrates this behavior.

I see, thanks for sharing the lit tests, which demonstrate that it is a master flag. I was confused by api_test.py: I thought that the iree-global-optimization-opt-level flag was not set to O2 in this case. I may be misunderstanding what this test is doing.

session.set_flags("--iree-opt-level=O2")
flags = session.get_flags()
self.assertIn("--iree-opt-level=O2", flags)
self.assertIn("--iree-global-optimization-opt-level=O0", flags)
self.assertIn("--iree-opt-strip-assertions=false", flags)

@hanhanW hanhanW requested a review from IanWood1 January 27, 2026 22:19
Contributor Author

@hanhanW hanhanW left a comment

I found a "bug": iree-opt-level does not work as a master flag for iree-llvmcpu-mlir-opt-level. Claude helped me with the fix: f1e7a09

Thus, we no longer need the change in docs/. The original suggestion includes iree-opt-level in the compile command.

Comment on lines +23 to +24
// Enables reassociation for FP reductions.
bool reassociateFpReductions = true;
Contributor Author

@hanhanW hanhanW Jan 27, 2026

I found the issue because this was set incorrectly. After thinking about it for a while: should we just drop all the default values from this struct? The real default values are defined in the .cpp file, IIUC.

Signed-off-by: hanhanW <[email protected]>
Contributor Author

hanhanW commented Jan 30, 2026

Looks good to me in terms of setting up the Binder/Options. However, I'm not familiar enough with this part of the codebase, so I'll hold off on approving. One thing that I just noticed is that there already exists an options struct for LLVMCPU named LLVMCPUTargetCLOptions:

struct LLVMCPUTargetCLOptions {

Binder method:

void LLVMCPUTargetCLOptions::bindOptions(OptionsBinder &binder) {

Given that this other struct exists, why do we need the new CPUCodegenOptions struct? This isn't blocking. I'm just trying to get an idea of what's going on.

Codegen in IREE works as follows: a series of lowerings, transformations, and optimizations takes linalg ops as input and lowers them to low-level dialects. On the CPU side, they are lowered to the LLVM dialect; we then convert them to an LLVM module and rely on LLVM for further lowering. Having a new option allows us to set different optimization levels for the MLIR lowering and the LLVM lowering. This is why I'm adding a new option for the MLIR part.

Actually, there is a chance to have a single binder. We can declare a field for CPUCodegenOptions in the old struct and reuse the binder. Let me give it a shot.

Contributor Author

hanhanW commented Feb 2, 2026

I gave it a shot, and I now understand the setup better. If we move the CPUOptions into LLVMCPUTargetCLOptions and reuse the binder, we hit a dependency violation: core codegen would depend on the plugin, because the default value of CPUOptions in the LLVMCPULowerExecutableTargetPass pass would be LLVMTargetOptions::FromFlags::get().cpuOpts. I think core (Codegen/LLVMCPU/) can't depend on plugin code (plugins/target/LLVMCPU/).

Thus, if we want to keep CPUOptions controlled by the global CLI without the weird dependency, we need to do the registration separately, which leads to the current situation.

Summary:

  • Option A (current state): Keep CPUCodegenOptions::FromFlags in core. A custom Registration is needed to avoid double registration. The global CLI works everywhere.
  • Option B: Remove FromFlags from CPUCodegenOptions. We would lose global CLI control at the pass level, which disconnects the logic.

@IanWood1 does it answer your question?

Member

@IanWood1 IanWood1 left a comment

LGTM on setting up the binder with the plugin.

From the description:

The opt-level fix causes split_reduction_using_tiling.mlir to fail because FP reassociation is now correctly disabled at O0. The test validates that split reduction produces similar results to normal reduction - a relative tolerance (rtol 0.01%) is more appropriate than absolute tolerance since FP rounding error scales with value magnitude.

Do we have any way to test the binder is working correctly when >O0? It would be nice to have a test, but this might be a difficult thing to check.

Contributor Author

hanhanW commented Feb 3, 2026

LGTM on setting up the binder with the plugin.

From the description:

The opt-level fix causes split_reduction_using_tiling.mlir to fail because FP reassociation is now correctly disabled at O0. The test validates that split reduction produces similar results to normal reduction - a relative tolerance (rtol 0.01%) is more appropriate than absolute tolerance since FP rounding error scales with value magnitude.

Do we have any way to test the binder is working correctly when >O0? It would be nice to have a test, but this might be a difficult thing to check.

Good point! Tests are added in 3fb271b. Thankfully, we already have pipeline tests for the split reduction flag.

Collaborator

@MaheshRavishankar MaheshRavishankar left a comment

I am approving this change. I highlighted one issue I found here with the use of std::optional, but I can't suggest an alternative.

@benvanik if you have thoughts here please let us know.

// opt-level-dependent defaults are applied before options are copied. The
// `std::optional` is required because PluginManagerSession does not have
// default constructor. Using `std::optional` allows deferring construction.
std::optional<PluginManagerSession> pluginSession;
Collaborator

The only issue I have with this PR is this std::optional thing. I can't suggest an alternative, though, because I couldn't come up with a "cleaner" way of doing this.

Contributor Author

@hanhanW hanhanW Feb 4, 2026

I gave it another shot, and it turns out that we don't need the optional: the constructor of pluginSession stores a reference to PluginManagerOptions and doesn't read the values, so the ordering does not matter here. It is safe as long as the values are not read until initializePlugins.

EDIT: I was wrong again, so reverted it back.

linalg.yield %0 : f32
} -> tensor<128xf32>
- check.expect_almost_eq (%normal_reduce, %split_reduction, atol 0.1) : tensor<128xf32>
+ check.expect_almost_eq (%normal_reduce, %split_reduction, atol 0.0, rtol 0.0001) : tensor<128xf32>
Collaborator

An atol of 0? Really, that is pretty stringent. I guess you turned off reassociation? It's good, though, if we can keep it this way.

Contributor Author

@hanhanW hanhanW Feb 4, 2026

rtol matters here. The default is zero, which is too small. The values are not small, so atol does not play a big role here.
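
A small sketch of why rtol carries the check for large values. It assumes the common combined-tolerance formula `|lhs - rhs| <= atol + rtol * |rhs|` (the shape NumPy's `isclose` uses); whether `check.expect_almost_eq` uses exactly this formula is an assumption, and the helper name and sample values are hypothetical.

```python
def almost_eq(lhs, rhs, atol=0.0, rtol=0.0):
    """Combined-tolerance comparison (assumed formula, see the note above)."""
    return abs(lhs - rhs) <= atol + rtol * abs(rhs)


# Large reduction results: rounding error grows with magnitude.
normal, split = 100000.0, 100005.0
print(almost_eq(normal, split, atol=0.1))              # False: 5 > 0.1
print(almost_eq(normal, split, atol=0.0, rtol=1e-4))   # True: 5 <= ~10

# Small values: a fixed atol can hide a large relative difference.
small_a, small_b = 0.001, 0.05
print(almost_eq(small_a, small_b, atol=0.1))             # True, yet 50x apart
print(almost_eq(small_a, small_b, atol=0.0, rtol=1e-4))  # False
```

With both tolerances at zero the comparison degenerates to exact equality, which is why a zero default for rtol is too strict for floating-point reductions.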

@hanhanW hanhanW enabled auto-merge (squash) February 4, 2026 22:14
@hanhanW hanhanW force-pushed the users/hanhanW/global-options branch from 880b83a to bb0d837 Compare February 5, 2026 01:57
Contributor Author

hanhanW commented Feb 5, 2026

I fixed a default value in 384b2e7, but it triggered bugs. Then I realized that iree-opt-level never worked with iree-opt without bb0d837.

@IanWood1 can you take a look?

@hanhanW hanhanW disabled auto-merge February 5, 2026 03:17
@hanhanW hanhanW force-pushed the users/hanhanW/global-options branch from 391dcc1 to ab014d9 Compare February 5, 2026 23:15
@hanhanW hanhanW merged commit 1b75890 into iree-org:main Feb 6, 2026
54 of 57 checks passed
@hanhanW hanhanW deleted the users/hanhanW/global-options branch February 6, 2026 00:28
MaheshRavishankar pushed a commit to MaheshRavishankar/iree that referenced this pull request Feb 24, 2026
4 participants