[CPU] Introduce optimization level to CPU backend.#23259
[CPU] Introduce optimization level to CPU backend.#23259hanhanW merged 16 commits intoiree-org:mainfrom
Conversation
The revision adds CodegenOptions (shared) and CPUCodegenOptions and moves few CPU flags to the struct: - iree-llvmcpu-disable-distribution - iree-llvmcpu-fail-on-out-of-bounds-stack-allocation - iree-llvmcpu-reassociate-fp-reductions It also introduces `iree-codegen-emit-performance-warnings` and uses it for a performance warning. The performance warning flag can also be used by software emulation warning if needed. It is a step towards iree-org#19072 Signed-off-by: hanhanW <[email protected]>
|
I did not enable binding, so I did not get the test failure. It is intended. I was wrong that I thought iree/compiler/bindings/python/test/api/api_test.py Lines 62 to 66 in f816205 |
There was a problem hiding this comment.
Nice! I'm excited to see codegen options using the optimization level defaults.
My main feedback is about session compatibility. Currently, the pipeline uses CPUCodegenOptions::FromFlags::get(), which only accesses global CLI flags. This means when the compiler is invoked through a session (e.g., via C API), any flags set via session.setFlags() will be ignored. Only the initial global CLI flags will be used. Getting this working will also make it easier for other backends/plugins to implement similar options.
compiler/src/iree/compiler/Codegen/LLVMCPU/LLVMCPULowerExecutableTarget.cpp
Show resolved
Hide resolved
| // Fail if the upper bound of dynamic stack allocation cannot be solved. | ||
| bool failOnOutOfBoundsStackAllocation = true; |
There was a problem hiding this comment.
This defaults to true but the PR description says:
iree-llvmcpu-fail-on-out-of-bounds-stack-allocation: always default
off.
Looks like this was always default true, so maybe the description needs to be fixed?
There was a problem hiding this comment.
Good catch, fixed the description. Thanks!
| {init_at_opt(llvm::OptimizationLevel::O0, false), | ||
| init_at_opt(llvm::OptimizationLevel::O1, true)}, | ||
| llvm::cl::desc("Enables reassociation for FP reductions."), | ||
| llvm::cl::cat(category)); |
There was a problem hiding this comment.
Should the iree-llvmcpu-* options be in the LLVMCPU dir?
There was a problem hiding this comment.
Good question. Initially, I was planning to put each option to their own directories. However, I found that there is an option used by both LLVMCPU/ and ExternalInterfaces/, which determines whether scalable lowering should be enabled or not: clEnableScalableVectorization. I'm considering to expose this to public option.
The other point is that putting all the options together is less fragmented, IMO. So you can browse all the options for all the backends in a single file.
Thus, I ended up with having them altogether in this file; it is also easier to skim through all the options because they all locate in a single file. I think I should mention this in PR description. What do you think?
| def testCodegenOptLevelCascade(self): | ||
| # Set global opt level to O2, check cascade to codegen opt levels. | ||
| session = Session() | ||
| session.set_flags("--iree-opt-level=O2") | ||
| flags = session.get_flags() | ||
| self.assertIn("--iree-codegen-opt-level=O2", flags) | ||
| self.assertIn("--iree-llvmcpu-opt-level=O2", flags) | ||
|
|
||
| def testCodegenOptLevelOverride(self): | ||
| # Override just the CPU opt level, verify it doesn't affect others. | ||
| session = Session() | ||
| session.set_flags("--iree-opt-level=O2") | ||
| session.set_flags("--iree-llvmcpu-opt-level=O0") | ||
| flags = session.get_flags() | ||
| self.assertIn("--iree-codegen-opt-level=O2", flags) | ||
| self.assertIn("--iree-llvmcpu-opt-level=O0", flags) |
There was a problem hiding this comment.
The failure here is expected. If you look at testOptFlags() test above, --iree-global-optimization-opt-level and --iree-opt-strip-assertions are never modified. This test doesn't check that the options "cascade" but rather that they are not modified.
There was a problem hiding this comment.
Also, it might be worth it to keep these to ensure that the code here is working correctly.
iree/compiler/src/iree/compiler/API/Internal/CompilerDriver.cpp
Lines 994 to 1001 in bc80992
There was a problem hiding this comment.
I updated the test which makes sure that one of the option is captured by sessions.
| // Codegen options (inherit from global opt level). | ||
| // Note: Registration order matters for inheritance. | ||
| (void)CodegenOptions::FromFlags::get(); | ||
| (void)CPUCodegenOptions::FromFlags::get(); |
There was a problem hiding this comment.
I think we want to handle this in LLVMCPU's plugin registration. It should be possible to use the createUninitializedSession(OptionsBinder &localOptionsBinder) hook to bind the options. Then, we should be able to plumb through the options to the passes here:
iree/compiler/plugins/target/LLVMCPU/LLVMCPUTarget.cpp
Lines 865 to 870 in bc80992
This should be better because this will allow us to set the options per session.
| if (CodegenOptions::FromFlags::get().emitPerformanceWarnings) { | ||
| loadOp.emitWarning("decomposing mismatched encoding load op"); | ||
| } |
There was a problem hiding this comment.
This should be plumbed through as a member on the option & a pass option for the passes that use this pattern. The main reason is that FromFlags::get() doesn't work with sessions. But it also makes it easier to reproduce the warning if it's a pass option.
| // Set all the distribution tile sizes to zero if thread distribution is | ||
| // disabled. | ||
| if (clDisableDistribution) { | ||
| if (CPUCodegenOptions::FromFlags::get().disableDistribution) { |
| LoweringConfigAttrInterface loweringConfig = getRootLoweringConfig(funcOp); | ||
| auto pipeline = translationInfo.getDispatchLoweringPassPipeline(); | ||
| LLVMCPUPipelineOptions pipelineOpts; | ||
| pipelineOpts.cpuOpts = CPUCodegenOptions::FromFlags::get(); |
There was a problem hiding this comment.
This should be a pass option, too. But it might be a bit trickier.
Signed-off-by: hanhanW <[email protected]>
Signed-off-by: hanhanW <[email protected]>
Signed-off-by: hanhanW <[email protected]>
Signed-off-by: hanhanW <[email protected]>
|
I tested with a python program that ensures the distribution option is working as expected: |
|
|
||
| // Override Registration to also bind CPUCodegenOptions to the session. | ||
| struct Registration : PluginSession::Registration { | ||
| using PluginSession::Registration::Registration; |
There was a problem hiding this comment.
This using statement is required for
iree/compiler/src/iree/compiler/PluginAPI/Client.h
Lines 267 to 268 in 7edade7
… and MLIR Signed-off-by: hanhanW <[email protected]>
|
@IanWood1 I have a question. Why isn't (I'm mainly looking at this comment about having a master flag, i.e., |
IanWood1
left a comment
There was a problem hiding this comment.
Looks good to me in terms of setting up the Binder/Options. However, I'm not familiar enough with this part of the codebase so I'll hold of on approving. One thing that I just noticed is that there already exists an options struct for LLVMCPU named LLVMCPUTargetCLOptions:
Binder method:
Given this other struct exists, why do we need the new CPUCodegenOptions struct? This isn't blocking. I'm just trying to get an idea of whats going on.
Maybe I'm misunderstanding what you mean by master flag but https://github.com/iree-org/iree/blob/7aa7be5fd3f237f3ef885437f3d1be3f8798dac6/tools/test/compile_flags.mlir demonstrates this behavior. |
How the codegen works in IREE is that we have some lowering/transformation/optimization that takes linalg ops as input and lower them to low-level dialect. On CPU side, they are lowered to LLVM dialect; we conver them to LLVM module and rely on LLVM to do further lowering. Having a new option allows us to set different optimization level on MLIR lowering and LLVM lowering. This is why I'm adding a new option for MLIR part.
I see, thanks for the sharing on lit tests which demonstrates that it is a master flag. I was confused by the iree/compiler/bindings/python/test/api/api_test.py Lines 62 to 66 in 7aa7be5 |
Signed-off-by: hanhanW <[email protected]>
Signed-off-by: hanhanW <[email protected]>
hanhanW
left a comment
There was a problem hiding this comment.
I found a "bug" that iree-opt-level does not work as master flag for iree-llvmcpu-mlir-opt-level. Claude helps me the fix: f1e7a09
Thus, we no longer need the change in docs/. The original suggestion includes iree-opt-level in the compile command.
Signed-off-by: hanhanW <[email protected]>
| // Enables reassociation for FP reductions. | ||
| bool reassociateFpReductions = true; |
There was a problem hiding this comment.
I found the issue because this is set wrongly. After thinking a while, should we just drop all the default values from this struct? The real default values are defined in .cpp, IIUC.
Signed-off-by: hanhanW <[email protected]>
Actually, there is a chance to have a single binder. We can declare a field for CPUCodegenOption in the old struct and reuse the binder. Let me give it a shot. |
|
I gave it a shot, and I learned the setup better. If we move the CPUOptions into Thus, if we want to keep CPUOptions controled by global without the weird dependency, we'll need to the registeration separately, which leads to current situation. Summary:
@IanWood1 does it answer your question? |
IanWood1
left a comment
There was a problem hiding this comment.
LGTM on setting up the binder with the plugin.
From the description:
The opt-level fix causes split_reduction_using_tiling.mlir to fail because FP reassociation is now correctly disabled at O0. The test validates that split reduction produces similar results to normal reduction - a relative tolerance (rtol 0.01%) is more appropriate than absolute tolerance since FP rounding error scales with value magnitude.
Do we have any way to test the binder is working correctly when >O0? It would be nice to have a test, but this might be a difficult thing to check.
Signed-off-by: hanhanW <[email protected]>
Good point! Tests are added in 3fb271b. Thanks that we have pipeline tests for split reduction flag. |
MaheshRavishankar
left a comment
There was a problem hiding this comment.
I am approving this change. I highlighted one issue I found here with the use of std::optional but I cant suggest an alternative.
@benvanik if you have thoughts here please let us know.
| // opt-level-dependent defaults are applied before options are copied. The | ||
| // `std::optional` is required because PluginManagerSession does not have | ||
| // default constructor. Using `std::optional` allows deferring construction. | ||
| std::optional<PluginManagerSession> pluginSession; |
There was a problem hiding this comment.
The only issue I have with this PR is this std::optional thing. I cant suggest an alternative though cause I couldnt come up with a "cleaner" way of doing this.
There was a problem hiding this comment.
I gave it another shot and it turns out that we don't need the optional. Because the constructor of pluginSession stores reference for PluginManagerOptions and it doesn't read the values, so the ordering does not matter here. It is safe as long as the values are not run until initializePlugins.
EDIT: I was wrong again, so reverted it back.
| linalg.yield %0 : f32 | ||
| } -> tensor<128xf32> | ||
| check.expect_almost_eq (%normal_reduce, %split_reduction, atol 0.1) : tensor<128xf32> | ||
| check.expect_almost_eq (%normal_reduce, %split_reduction, atol 0.0, rtol 0.0001) : tensor<128xf32> |
There was a problem hiding this comment.
atol of 0? really that is pretty stringent. I guess you turned off reassociation? Its good though if we can keep it this way.
There was a problem hiding this comment.
rtol matters here. The default is zero, which is too small. The values are not small, so atol does not play a big role here.
880b83a to
bb0d837
Compare
Signed-off-by: hanhanW <[email protected]>
Signed-off-by: hanhanW <[email protected]>
Signed-off-by: hanhanW <[email protected]>
Signed-off-by: hanhanW <[email protected]>
Signed-off-by: hanhanW <[email protected]>
391dcc1 to
ab014d9
Compare
The revision adds CPUCodegenOptions and moves few CPU flags to the struct: - iree-llvmcpu-disable-distribution: always default `off`. - iree-llvmcpu-fail-on-out-of-bounds-stack-allocation: always default `on`. - iree-llvmcpu-reassociate-fp-reductions: default `off` for `O0`, `on` for `O2+` - Merged a pass level test to pipeline test because disabling distribution is a global option. Note that `iree-opt-level` is the master flag that also controls `iree-llvmcpu-mlir-opt-level` unless you explicitly specify a value. The opt-level fix causes split_reduction_using_tiling.mlir to fail because FP reassociation is now correctly disabled at O0. The test validates that split reduction produces similar results to normal reduction - a relative tolerance (rtol 0.01%) is more appropriate than absolute tolerance since FP rounding error scales with value magnitude. It is a step towards iree-org#19072 Assisted-by: Claude --------- Signed-off-by: hanhanW <[email protected]>
The revision adds CPUCodegenOptions and moves few CPU flags to the struct:
off.on.offforO0,onforO2+Note that
iree-opt-levelis the master flag that also controlsiree-llvmcpu-mlir-opt-levelunless you explicitly specify a value.The opt-level fix causes split_reduction_using_tiling.mlir to fail because FP reassociation is now correctly disabled at O0. The test validates that split reduction produces similar results to normal reduction - a relative tolerance (rtol 0.01%) is more appropriate than absolute tolerance since FP rounding error scales with value magnitude.
It is a step towards #19072
Assisted-by: Claude