ARM: GCC optimal/generic architecture compiler flags#1974
ARM: GCC optimal/generic architecture compiler flags#1974boegel merged 21 commits intoeasybuilders:developfrom
Conversation
| systemtools.POWER: 'mcpu=generic-arch', # no support for -march on POWER | ||
| systemtools.AARCH32: 'mcpu=generic-armv7', # implies -march=armv7 and -mtune=generic-armv7 | ||
| systemtools.AARCH64: 'mcpu=generic', # implies -march=armv8-a and -mtune=generic | ||
| systemtools.POWER: 'mcpu=generic-arch', # no support for -march on POWER |
There was a problem hiding this comment.
@JackPerdue Did you ever test this? I could not find any indication in the GCC manuals that generic-arch is a valid cpu_type value for the -mcpu option on POWER...
|
A quick test rebuilding HPL-2.1-foss-2016b.eb with --optarch=GENERIC shows a definite fail. I'm not used to using the GENERIC flag but, yes, GCC won't work on Power as is. |
|
-mcpu=native -mtune=native might be the best way to go for now if the GENERIC needs to be differentiated in some way... but generic-arch is right out (as you suspected) |
|
gcc: note: valid arguments to ‘-mtune=’ are: 401 403 405 405fp 440 440fp 464 464fp 476 476fp 505 601 602 603 603e 604 604e 620 630 740 7400 7450 750 801 821 823 8540 8548 860 970 G3 G4 G5 a2 cell e300c2 e300c3 e500mc e500mc64 e5500 e6500 ec603e native power3 power4 power5 power5+ power6 power6x power7 power8 powerpc powerpc64 rs64 titan I don't have any suggestions beyond those not understanding what "generic" means on Power. |
| # used when 'optarch' toolchain option is enabled (and --optarch is not specified) | ||
| COMPILER_OPTIMAL_ARCHITECTURE_OPTION = { | ||
| systemtools.INTEL : 'march=native', | ||
| systemtools.AMD : 'march=native', |
There was a problem hiding this comment.
oh, wait, this relates to your change in compiler.py where you define self.arch differently...
| # Construct -mcpu option | ||
| option = 'mcpu=%s' % '.'.join([i[1] for i in sorted_core_types]) | ||
|
|
||
| self.COMPILER_OPTIMAL_ARCHITECTURE_OPTION[systemtools.AARCH64] = option |
There was a problem hiding this comment.
@geimer don't we want some kind of warning in case option is an empty string here?
| # big.LITTLE setups: sort core types to have big core (higher model number) first | ||
| core_types = [core.strip().lower() for core in cpu_model.split('+')] | ||
| sorted_core_types = sorted([(int(re.search('\d+', i).group(0)), i) for i in core_types], | ||
| reverse=True) |
There was a problem hiding this comment.
there's lots that could go wrong here, and there's no error handling whatsoever; aren't we making too much assumption here on what get_cpu_model returns?
There was a problem hiding this comment.
@boegel To be honest, I copy/paste-engineered this based on some code snippet I found on the web (StackOverflow, IIRC). I have no clue how to add error handling here...
Regarding your concerns about assumptions: This is certainly an issue, yes. However, I currently see no other way to optimize for the host architecture if -mcpu=native is not supported. If you have any better suggestions, please let me know.
There was a problem hiding this comment.
Well, it's mostly a case of error handling: what if our assumption is wrong? Then we should fall back to not picking a value for option (which we should rename to optarch_opt here?).
The only real issue/assumption you're making here is that i) get_cpu_model() returns something that starts with ARM, and ii) that we can find find a number in every item in core_types.
So, I think this should become something like this:
optarch_opt = ''
cpu_vendor, cpu_model = systemtools.get_cpu_vendor(), systemtools.get_cpu_model()
if cpu_vendor == systemtools.ARM and cpu_model.startswith('ARM '):
self.log.debug("Determining architecture-specific compiler optimization flag for ARM (model: %s)", cpu_model)
core_types = []
for core_type in [ct.strip().lower() for ct in cpu_model[4:].split('+')]:
# determine numeric id for each core type, since we need to sort them later numerically
res = re.search('\d+$', core_type) # note: numeric ID is expected at the end
if res:
core_id = int(res.group(0))
core_types.append((core_id, core_type))
self.log.debug("Extracted numeric ID for ARM core type '%s': %s", core_type, core_id)
else:
# if we can't determine numeric id, bail out
core_types = None
self.log.debug("Failed to extract numeric ID for ARM core type '%s', bailing out", core_type)
break
if core_types:
# example: 'mcpu=cortex-a72.cortex-a53' for "ARM Cortex-A53 + Cortex-A72"
optarch_opt = 'mcpu=%s' '.'.join([ct[1] for ct in sorted(core_types, reverse=True)])
self.log.debug("Using architecture-specific compiler optimization flag '%s'", optarch_opt)
self.COMPILER_OPTIMAL_ARCHITECTURE_OPTION[systemtools.AARCH64] = option| sorted_core_types = sorted([(int(re.search('\d+', i).group(0)), i) for i in core_types], | ||
| reverse=True) | ||
| # Construct -mcpu option | ||
| option = 'mcpu=%s' % '.'.join([i[1] for i in sorted_core_types]) |
There was a problem hiding this comment.
can you provide a couple of examples in a comment of how what value would be specified to mcpu with this approach (just in case we get something entirely different at some point)?
| def set_variables(self): | ||
| """Set the variables""" | ||
| if self.arch is None: | ||
| self.arch = systemtools.get_cpu_architecture() |
There was a problem hiding this comment.
hmm, aren't there any other locations where we rely on the .arch attribute of a Toolchain instance (potentially) that we should worry about?
playing the devil's advocate here, since this approach makes more sense anyway (defining self.arch with the result of get_cpu_family was already an indication that something wasn't 100% right...)
There was a problem hiding this comment.
@boegel As far as I could figure out using grep -r '\.arch' on framework, easyblocks, and easyconfigs, Compiler.arch (it's not defined in the Toolchain base class) is only used in tools/toolchain/compiler.py, in the toolchain unit tests, and now in toolchains/compiler/gcc.py.
BTW: Is there any reason why it isn't set in the constructor already? In this case, lazy initialization doesn't make much sense to me. But I may be overlooking something.
There was a problem hiding this comment.
Not seeing any other use of .arch elsewhere is good, but we don't control all easyblocks out there, and people have scripts around that hook into the EasyBuild framework, who knows...
That shouldn't hold us back from making this change though, since it makes a whole lot of sense, and if anyone relies on .arch somewhere, they probably shouldn't be anyway (they should call the appropriate function from systemtools instead).
I see no reason either for lazy init of self.arch, so feel free to change that.
| else: | ||
| self.assertFalse(generic_flags in tc.get_variable(var)) | ||
|
|
||
| def test_optarch_aarch64_heuristic(self): |
There was a problem hiding this comment.
@geimer don't we want to test the default behaviour for AARCH64 as well here?
you can put a dummy module in place for GCC 6.x to check that
There was a problem hiding this comment.
@boegel This is causing some trouble I don't understand... I've added a module file for GCCcore 6.2.0 and added
tc = self.get_toolchain("GCCcore", version="6.2.0")
tc.prepare()
self.assertEqual(...)
at the end of this test. But it is still giving me the modified mcpu flag based on the heuristic, even though it is definitely not taking the "modify option with heuristic" branch in _set_optimal_architecture. Why don't I get a fresh instance with COMPILER_OPTIMAL_ARCHITECTURE_OPTION initialized to the defaults???
There was a problem hiding this comment.
@geimer Hmm, you should get a fresh instance... Do make sure you also call tc.set_options() before tc.prepare() though, even if you're not setting any custom toolchain options. Maybe that explains it?
There was a problem hiding this comment.
@boegel Nope. Here is my current test (with a GCCcore/6.2.0 module in place):
def test_optarch_aarch64_heuristic(self):
"""Test whether AArch64 pre-GCC-6 optimal architecture flag heuristic works."""
st.get_cpu_architecture = lambda: st.AARCH64
st.get_cpu_model = lambda: 'ARM Cortex-A53'
st.get_cpu_vendor = lambda: st.ARM
tc = self.get_toolchain("GCC", version="4.7.2")
tc.prepare()
self.assertEqual(tc.COMPILER_OPTIMAL_ARCHITECTURE_OPTION[tc.arch], 'mcpu=cortex-a53')
st.get_cpu_model = lambda: 'ARM Cortex-A53 + Cortex-A72'
tc = self.get_toolchain("GCC", version="4.7.2")
tc.prepare()
self.assertEqual(tc.COMPILER_OPTIMAL_ARCHITECTURE_OPTION[tc.arch], 'mcpu=cortex-a72.cortex-a53')
tc = self.get_toolchain("GCCcore", version="6.2.0")
tc.prepare()
self.assertEqual(tc.COMPILER_OPTIMAL_ARCHITECTURE_OPTION[tc.arch], 'mcpu=native')
I've added
print(self.COMPILER_OPTIMAL_ARCHITECTURE_OPTION[systemtools.AARCH64])
to _set_optimal_architecture in easybuild-framework/toolchains/compiler/gcc.py right at the beginning after the check that we're on AArch64 (i.e., before modifying the default value). And it prints
mcpu=native
mcpu=cortex-a53
mcpu=cortex-a72.cortex-a53
Calling tc.set_options({}) right before tc.prepare() in the test doesn't change anything. So it keeps the modified value even when creating a new instance. What am I doing wrong here? I'm confused...
There was a problem hiding this comment.
@geimer Taking a look at this right now, I'll keep you posted.
There was a problem hiding this comment.
for completeness sake: this problem was fixed in geimer#12
| COMPILER_OPTIMAL_ARCHITECTURE_OPTION = { | ||
| systemtools.INTEL : 'xHost', | ||
| systemtools.AMD : 'xHost', | ||
| systemtools.X86_64: 'xHost', |
There was a problem hiding this comment.
@geimer hmm, I'm not so sure we want to drop the distinction between AMD and INTEL here... right now, it doesn't matter, but it may sometime in the future?
There was a problem hiding this comment.
@boegel Hmm... for ARM, one needs to distinguish by architecture (as both AArch32 and AArch64 are of CPU family ARM), for X64_64 you want a distinction by CPU family. I can see various ways to address this:
- Use the architecture as dictionary key (like now) and overwrite the value in
_set_optimal_architectureif needed - Do it exactly the other way around (family as key, overwrite on architecture)
- Use a tuple
(architecture, family)as dictionary key
If you really want to keep the family distinction, I'm leaning towards the last option. What do you think?
There was a problem hiding this comment.
Going with the latter makes sense to me!
|
@geimer Looks like this will have to be moved to the |
|
@boegel I still have no clue why this unit test is failing. I expect to get a fresh |
…dating class constant
|
Again... I don't know who added mcpu=generic-arch for POWER or why and I have no idea what GENERIC is supposed to be but --optarch=GENERIC on Power7 is broken in 3.0.... there is no "generic-arch" option for mcpu (see above). |
|
@JackPerdue yeah, the plan is to fix that in this PR as well, since we're fiddling in that area. Not quite sure what We could just spit out an error stating this is not supported on POWER. |
pass down default optarch value for AAarch64 & GCC 6.x rather than updating class constant
|
@JackPerdue, @boegel: Looking through the GCC manuals of various versions, these may be reasonable values for POWER:
As far as I can see, |
| tc = self.get_toolchain("GCCcore", version="6.2.0") | ||
| tc.set_options({}) | ||
| tc.prepare() | ||
| self.assertEqual(tc.COMPILER_OPTIMAL_ARCHITECTURE_OPTION[tc.arch], 'mcpu=native') |
There was a problem hiding this comment.
Is it OK to compare with tc.COMPILER_OPTIMAL_ARCHITECTURE_OPTION here? In the other cases we compare against tc.options.options_map['optarch'].
| tc = self.get_toolchain("GCCcore", version="6.2.0") | ||
| tc.set_options({}) | ||
| tc.prepare() | ||
| self.assertEqual(tc.COMPILER_OPTIMAL_ARCHITECTURE_OPTION[(tc.arch, tc.cpu_family)], 'mcpu=native') |
There was a problem hiding this comment.
Hmm... One should refresh the page before commenting ;-) So, again:
Is it OK to compare with tc.COMPILER_OPTIMAL_ARCHITECTURE_OPTION here? In the other cases we compare against tc.options.options_map['optarch'].
|
@boegel The failing test seems to be due to an inconsistent behavior of TCL modules. Any suggestions how to deal with this? Edit: Maybe run |
(currently not required due to the way it is implemented, but may cause subtle errors in the future if the implementation changes)
rather than tc.COMPILER_OPTIMAL_ARCHITECTURE_OPTION
|
@boegel I believe this PR is ready for another review. However, there is still three issues:
|
…vironmentModulesC modules tools
fix broken test_avail, GCCcore is only included in result for Lmod/EnvironmentModulesC modules tools
| (systemtools.AARCH64, systemtools.ARM): 'mcpu=native', # since GCC 6; implies -march=native and -mtune=native | ||
| (systemtools.POWER, systemtools.POWER): 'mcpu=native', # no support for -march on POWER; implies -mtune=native | ||
| (systemtools.X86_64, systemtools.AMD): 'march=native', # implies -mtune=native | ||
| (systemtools.X86_64, systemtools.INTEL): 'march=native', # implies -mtune=native |
There was a problem hiding this comment.
@geimer this makes me wonder, should we also be using -mcpu=native on x86_84, since that implies both -march=native and -mtune=native for AARCH32?
Which one is closer to Intel's -xHost, mcpu=native or -march=native?
There was a problem hiding this comment.
@boegel: These kind of options are a mess in GCC, as they are inconsistent between architectures... -march=native is the way to go on x86_64.
On x86_64, -march=native implies -mtune=native, but -mcpu is a deprecated synonym for -mtune. So using -mcpu=native would only tune for the host, but not enable all CPU features (as -march does).
There was a problem hiding this comment.
@geimer That sounds like a mess indeed, thanks for clarifying!
…t_optimal_architecture in gcc.py
| # On AArch64, -mcpu=native is not supported prior to GCC 6. In this case, try to optimize for the detected | ||
| # CPU model (vanilla ARM cores only). Note that this heuristic may fail if the CPU model is not supported | ||
| # by the GCC version being used. | ||
| gcc_version = self.get_software_version(self.COMPILER_MODULE_NAME)[0] |
There was a problem hiding this comment.
@geimer: did you ever test this with a GCC module that is actually a bundle of GCCcore and binutils?
As far as I can tell, grabbing the GCC version will fail in that case, as least with this current implementation (which only takes into account GCC as module name)...
There was a problem hiding this comment.
Ah, I see why this worked: we usually make sure in the GCC bundles that $EBROOTGCCCORE and $EBVERSIONGCCCORE are also defined for the GCCcore module, by defining altroot/altversion in the GCC bundle easyconfig, see https://github.com/hpcugent/easybuild-easyconfigs/blob/master/easybuild/easyconfigs/g/GCC/GCC-6.2.0-2.27.eb#L21
However, it would be better to take both GCCcore (first) and GCC (second)into account for determining the GCC version, like we already do for the installation prefix in_set_compiler_vars`.
Are you up for making that change @geimer?
edit: $EBROOTGCC and $EBVERSIONGCC => $EBROOTGCCCORE and $EBVERSIONGCCCORE
There was a problem hiding this comment.
@boegel: I have to admit that I don't remember whether I tested this with a GCC bundle. But...
As far as I can see, self.COMPILER_MODULE_NAME corresponds to GCC when using the GCC toolchain, and to GCCcore when using this toolchain. So it should actually do the right thing, no?
There was a problem hiding this comment.
Well, no, since when GCC is being used as a toolchain, the compiler is actually provided via GCCcore, so we should grab the version directly from there (without relying on a GCC bundle also defining $EBROOTGCCCORE...
There was a problem hiding this comment.
@boegel Just to clarify: You are proposing to use get_software_version from easybuild.tools.modules here rather than self.get_software_version, right? I don't think I understand the difference right now (it's been a long day...)
There was a problem hiding this comment.
No, I propose to try obtaining the version using GCCcore first, and fall back to use GCC if that doesn't work. Whether you use self.get_software_version or get_software_version doesn't really matter here, that's only relevant when multiple modules are in play (self.get_software_version has a way of dealing with that).
So, something like this:
gcc_ver = get_software_version('GCCcore')
if gcc_ver is None:
gcc_ver = get_software_version('GCC')(switching to get_software_version makes sense here since we know we'll never be dealing with multiple names at the same time here)
| cpu_vendor = systemtools.get_cpu_vendor() | ||
| cpu_model = systemtools.get_cpu_model() | ||
| if cpu_vendor == systemtools.ARM and cpu_model.startswith('ARM '): | ||
| self.log.debug("Determining architecture-specific compiler optimization flag for ARM (model: %s)", |
| # On AArch64, -mcpu=native is not supported prior to GCC 6. In this case, try to optimize for the detected | ||
| # CPU model (vanilla ARM cores only). Note that this heuristic may fail if the CPU model is not supported | ||
| # by the GCC version being used. | ||
| gcc_version = self.get_software_version(self.COMPILER_MODULE_NAME)[0] |
There was a problem hiding this comment.
Ah, I see why this worked: we usually make sure in the GCC bundles that $EBROOTGCCCORE and $EBVERSIONGCCCORE are also defined for the GCCcore module, by defining altroot/altversion in the GCC bundle easyconfig, see https://github.com/hpcugent/easybuild-easyconfigs/blob/master/easybuild/easyconfigs/g/GCC/GCC-6.2.0-2.27.eb#L21
However, it would be better to take both GCCcore (first) and GCC (second)into account for determining the GCC version, like we already do for the installation prefix in_set_compiler_vars`.
Are you up for making that change @geimer?
edit: $EBROOTGCC and $EBVERSIONGCC => $EBROOTGCCCORE and $EBVERSIONGCCCORE
flesh out _guess_arm_aarch64_default_optarch method to lighten up _set_optimal_architecture in gcc.py
|
@geimer I don't mind at all, I only really care about the test result of the last commit; but it's good to know Travis screws up if you push multiple commits quickly enough. Going in, thanks! |
This PR adds support for optimal/generic architecture compiler flags for the GCC compiler. This required basing the flag selection on the CPU architecture rather than the CPU family.
AArch32: The generic option assumes an ARMv7 platform. Using EasyBuild on older ARM architectures probably isn't of interest anyhow.
AArch64: Since the
-mcpu=nativeoption is only supported since GCC 6, a simple heuristic to target the detected CPU model for vanilla ARM cores has been implemented for older compilers. This is a "best effort" approach that may fail if the actual compiler does not (yet) support the corresponding CPU model. However, in this case the user still has the option to manually specify a reasonable option via--optarch=....