switch native flag from -march=native to -mcpu=native for LLVM compilers on Arm 64-bit (aarch64)#5139

Merged
boegel merged 3 commits into easybuilders:develop from Thyre:llvm-toolchain-aarch64-native
Mar 26, 2026

Conversation

@Thyre
Collaborator

Thyre commented Feb 25, 2026

It was observed that -march=native does not actually provide all the CPU features on e.g. GH200, missing bf16 when building OpenBLAS 0.3.31 with LLVM 21.1.8. This causes build failures like:

../kernel/arm64/sbgemv_n_neon.c:146:14: error: '__builtin_neon_vld1q_bf16' needs target feature bf16,neon
  146 |         a3 = vld1q_bf16(a_ptr3);
      |              ^

Switching to -mtune=native -mcpu=native solves this issue. Therefore, apply this option generally.
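The flag selection described above can be sketched in a few lines of Python. This is only an illustration, not the code in easybuild/toolchains/compiler/llvm_compilers.py: the helper name `native_flag` and the set `AARCH64_NAMES` are hypothetical, and keeping `-march=native` on other architectures is an assumption based on the PR only mentioning aarch64.

```python
import platform

# Hypothetical names for illustration; not taken from EasyBuild.
AARCH64_NAMES = {"aarch64", "arm64"}


def native_flag(machine=None):
    """Return the 'optimize for the build host' flag for an LLVM-based compiler."""
    if machine is None:
        # Fall back to the machine type reported for the current host.
        machine = platform.machine()
    if machine.lower() in AARCH64_NAMES:
        # On Arm 64-bit, ask the compiler to target the host CPU directly.
        return "-mcpu=native"
    # Assumption: all other architectures keep the previous flag.
    return "-march=native"


print(native_flag("aarch64"))
print(native_flag("x86_64"))
```

Passing the machine name explicitly keeps the helper easy to test on a host of a different architecture.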

Signed-off-by: Jan André Reuter <[email protected]>
Thyre added the "bug fix" and "aarch64" (Related to Arm 64-bit (aarch64)) labels Feb 25, 2026
Thyre added this to the "next release (5.2.2?)" milestone Feb 25, 2026
@Thyre
Collaborator Author

Thyre commented Feb 25, 2026

Failure was encountered in easybuilders/easybuild-easyconfigs#25420 (comment).

Trying a rebuild with this PR.

@Thyre
Collaborator Author

Thyre commented Feb 25, 2026

Test report for PR which caused the issue: easybuilders/easybuild-easyconfigs#25420 (comment)

Will run a few more tests to make sure we don't hit any regression because of the change.

@Thyre
Collaborator Author

Thyre commented Feb 25, 2026

Comment thread easybuild/toolchains/compiler/llvm_compilers.py Outdated
@boegel
Member

boegel commented Feb 25, 2026

Can we ask OpenBLAS maintainers for input on this?

@Thyre
Collaborator Author

Thyre commented Feb 25, 2026

> Can we ask OpenBLAS maintainers for input on this?

Sure. I'd guess that the changes described in OpenMathLib/OpenBLAS#5643 (comment) may have something to do with the new error, but generally, -march=native is simply missing bf16 in the passed feature flags, which shouldn't be the case.

@green-br

Just came across this issue - https://developer.arm.com/community/arm-community-blogs/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu is usually a good reference when it comes to march/mtune/mcpu.

I think (and happy to be corrected) that vld1q_bf16 is optional in armv9, so march would have to disable it to keep the code executable on all armv9 microarchitectures. I usually refer to https://developer.arm.com/documentation/109697/2025_12/Feature-descriptions/The-Armv9-2-architecture-extension?lang=en to see what is optional vs mandatory, but it is never easy to understand. I previously hit similar issues and asked on the Arm HPC User Group (AHUG) support channels.

@Thyre
Collaborator Author

Thyre commented Feb 25, 2026

> Just came across this issue - https://developer.arm.com/community/arm-community-blogs/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu is usually a good reference when it comes to march/mtune/mcpu.
>
> I think (and happy to be corrected) that vld1q_bf16 is optional in armv9, so march would have to disable it to keep the code executable on all armv9 microarchitectures. I usually refer to https://developer.arm.com/documentation/109697/2025_12/Feature-descriptions/The-Armv9-2-architecture-extension?lang=en to see what is optional vs mandatory, but it is never easy to understand. I previously hit similar issues and asked on the Arm HPC User Group (AHUG) support channels.

This is a very good reference, thanks a lot!
It looks like -mcpu=native yields the largest feature set. Interestingly, -mcpu=native does not include some features present with -march=native:

[reuter1@jrc0900 ~]$ diff -Naur mcpu.txt march.txt
--- mcpu.txt    2026-02-25 21:03:44.998971916 +0100
+++ march.txt   2026-02-25 21:03:50.711072338 +0100
@@ -1,39 +1,26 @@
 --
-"+aes"
-"+bf16"
+"+bti"
 "+ccidx"
 "+complxnum"
 "+crc"
+"+dit"
 "+dotprod"
-"+ete"
+"+flagm"
 "-fmv"
-"+fp16fml"
-"+fpac"
 "+fp-armv8"
 "+fullfp16"
-"+i8mm"
 "+jsconv"
 "+lse"
-"+mte"
 "+neon"
 "+outline-atomics"
 "+pauth"
-"+perfmon"
-"-rand"
+"+predres"
 "+ras"
 "+rcpc"
 "+rdm"
-"+sha2"
-"+sha3"
-"+sm4"
-"+spe"
+"+sb"
 "+ssbs"
 "+sve"
 "+sve2"
-"+sve-aes"
-"+sve-bitperm"
-"+sve-sha3"
-"+sve-sm4"
 "-target-feature"
-"+trbe"
 "+v9a"
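A comparison like the diff above can also be done with set operations, which makes it easier to see which features each flag adds. The sketch below is illustrative only: `parse_features` is a hypothetical helper, it assumes each file holds one quoted token per line as in the listing above, and the two sample strings are a small subset of that listing rather than the full files.

```python
def parse_features(text):
    """Collect quoted feature tokens such as "+bf16" or "-rand" from a listing."""
    features = set()
    for line in text.splitlines():
        token = line.strip().strip('"')
        # Skip separators and the option name that precedes each feature.
        if token in ("", "--", "-target-feature"):
            continue
        if token[0] in "+-":
            features.add(token)
    return features


# Small subset of the listing above, for illustration only.
mcpu_sample = '--\n"+aes"\n"+bf16"\n"+crc"\n"+neon"\n"-target-feature"\n"+sve2"\n'
march_sample = '--\n"+bti"\n"+crc"\n"+dit"\n"+neon"\n"-target-feature"\n"+sve2"\n'

mcpu = parse_features(mcpu_sample)
march = parse_features(march_sample)

only_mcpu = sorted(mcpu - march)
only_march = sorted(march - mcpu)
print("only with -mcpu=native: ", only_mcpu)
print("only with -march=native:", only_march)
```

On real data, the two sample strings would be replaced by the contents of mcpu.txt and march.txt.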

Some of the features are reported as present when checking /proc/cpuinfo, so no idea why LLVM 22.1.0 doesn't use them with -mcpu=native. I'd guess they may already be included in "+v9a"?
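For the /proc/cpuinfo check, a minimal sketch of extracting the kernel-reported feature names is shown below. The function name `cpuinfo_features` is hypothetical, and the sample text is made up for illustration; it is not the GH200 output. Note that kernel names do not always match LLVM's target-feature names (the kernel reports `asimd` where LLVM says `neon`, for example), so any comparison needs a mapping that is not shown here.

```python
def cpuinfo_features(cpuinfo_text):
    """Return the flags from the first 'Features' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    # No 'Features' line found (e.g. not an Arm Linux system).
    return set()


# Illustrative sample only; real input would come from reading /proc/cpuinfo.
sample = "processor\t: 0\nBogoMIPS\t: 2000.00\nFeatures\t: fp asimd aes sha2 bf16 i8mm\n"

flags = cpuinfo_features(sample)
print("bf16" in flags)
```

On an aarch64 Linux host, the sample could be replaced with `open("/proc/cpuinfo").read()`.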

For now, I'd just go with the TL;DR:

> TL;DR: Whenever possible use only -mcpu=native. Avoid -march and -mtune.

Signed-off-by: Jan André Reuter <[email protected]>
Thyre changed the title from "LLVM: Switch native flag from -march=native to -mtune=native -mcpu=native for aarch64" to "LLVM: Switch native flag from -march=native to -mcpu=native for aarch64" Feb 25, 2026
Comment thread easybuild/toolchains/compiler/llvm_compilers.py Outdated
boegel changed the title from "LLVM: Switch native flag from -march=native to -mcpu=native for aarch64" to "switch native flag from -march=native to -mcpu=native for LLVM compilers on Arm 64-bit (aarch64)" Mar 25, 2026
Member

boegel left a comment

This problem doesn't occur with the existing OpenBLAS-0.3.30-llvm-compilers-20.1.8.eb, but does occur with a copy that's updated to install OpenBLAS v0.3.31 instead.

I've confirmed that using -mcpu=native as proposed here fixes the problem, and indeed https://developer.arm.com/community/arm-community-blogs/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu makes it pretty clear that this is what we should be doing (and we're already doing so with GCC, see also https://github.com/easybuilders/easybuild-framework/pull/1974/changes).

So, lgtm!

boegel merged commit c295c14 into easybuilders:develop Mar 26, 2026
40 checks passed