Skip to content

fix logic in make_module_dep w.r.t. excluding loads for toolchain & toolchain components#2140

Merged
boegel merged 6 commits intoeasybuilders:developfrom
boegel:fix_make_module_dep_excl_deps
Mar 6, 2017
Merged

fix logic in make_module_dep w.r.t. excluding loads for toolchain & toolchain components#2140
boegel merged 6 commits intoeasybuilders:developfrom
boegel:fix_make_module_dep_excl_deps

Conversation

@boegel
Copy link
Copy Markdown
Member

@boegel boegel commented Mar 5, 2017

This provides a proper fix for the problem that #2135 dances around, and basically undoes @bartoldeman's PR #2091 since it implements a better solution that also does the right thing when GCC (i.e. the bundle of GCCcore and binutils) is used as a toolchain.

Thanks to @rtmclay for helping out with figuring out the actual problem by looking at the Lmod debug output provided by @akesandgren where reloading the OpenMPI module failed because the binutils module couldn't be found anymore.

The problem was that load statements were still being inserted for GCCcore and binutils when something was being built with the GCC toolchain (e.g. OpenMPI) when a module naming scheme that enables expand_toolchain_load was being used (for example HierarchicalMNS).

The existing code that excludes dependencies that extend $MODULEPATH failed because GCC (i.e. the toolchain being used) was expanded into GCCcore + binutils before the list of dependencies that extend $MODULEPATH are determined via path_to_top_of_module_tree, and hence excluded_deps was an empty list while it should have included GCC.

This is only a problem when the GCC toolchain was being used, since both GCC and its GCCcore component extend $MODULEPATH.

By now passing both the toolchain (e.g. GCC) and the toolchain components (GCCcore + binutils) to path_to_top_of_module_tree, the GCC toolchain itself is correctly excluded as well (and hence so are the toolchain components).

The additional loop that excludes dependencies of dependencies that was added in #2091 by @bartoldeman is no longer needed now.

cc @geimer, @ocaisa, @damianam

@akesandgren
Copy link
Copy Markdown
Contributor

This works like a charm :-)

@ocaisa
Copy link
Copy Markdown
Member

ocaisa commented Mar 6, 2017

lgtm

boegel added a commit to boegel/easybuild-framework that referenced this pull request Mar 6, 2017
@bartoldeman
Copy link
Copy Markdown
Contributor

This certainly a more elegant solution! lgtm2

@boegel
Copy link
Copy Markdown
Member Author

boegel commented Mar 6, 2017

Thanks for the feedback everyone, going in to be included in EasyBuild v3.1.1.

@boegel boegel merged commit d29fa91 into easybuilders:develop Mar 6, 2017
@boegel boegel deleted the fix_make_module_dep_excl_deps branch March 6, 2017 20:48
@bartoldeman
Copy link
Copy Markdown
Contributor

I did not check properly... This fix is good for the compiler-only toolchain case where the toolchain used equals the MODULEPATH extender (in fact my hook for ec in expand_toolchain_load(ec=self.cfg) was another way to fix this because our module naming scheme uses that to check for compiler-only toolchains, but your fix is much better than this).

However the original problem fixed in #2091 is still there.
I can reproduce in the same way as in #2091 with HPL-2.1-intel-2016b.eb. It now includes again these two superfluous loads:

load("GCCcore/5.4.0")
 
load("binutils/2.26")

@bartoldeman
Copy link
Copy Markdown
Contributor

bartoldeman commented Mar 8, 2017

How does that work: HPL using intel-2016b expands the dependencies from intel-2016b.eb:

dependencies = [
    ('GCCcore', gccver),
    ('binutils', binutilsver, '-GCCcore-%s' % gccver),
    ('icc', compver, gccsuff),
    ('ifort', compver, gccsuff),
    ('impi', '5.1.3.181', '', ('iccifort', '%s%s' % (compver, gccsuff))),
    ('imkl', '11.3.3.210', '', ('iimpi', version)),
]

The modules icc, ifort, and impi are eliminated since they expand $MODULEPATH.
However, GCCcore and binutils, which are dependencies of icc and ifort are no longer eliminated.

@boegel
Copy link
Copy Markdown
Member Author

boegel commented Mar 8, 2017

@bartoldeman Hmm, ok, I indeed overlooked that... And since I also changed to test case, the tests didn't complain either.

Can you open a new PR to current develop, basically re-adding the changes you had in #2091?

Do also include test case from #2091 in the existing test_make_module_dep_hmns test, so we retain both test cases (i.e. the one from #2091 and the one I changed it to in this PR).

Do make sure that the original test from #2091 highlights the reintroduced bug; if not, we need a better test case.

@bartoldeman
Copy link
Copy Markdown
Contributor

bartoldeman commented Mar 8, 2017

I am a little worried about reintroducing #2091 because that also means adding #2135 again.
For the intel/2016b this issue can be avoided by simply stripping down intel-2016b.eb:

  dependencies = [
-   ('GCCcore', gccver),
-   ('binutils', binutilsver, '-GCCcore-%s' % gccver),
     ('icc', compver, gccsuff),
    ('ifort', compver, gccsuff),
    ('impi', '5.1.3.181', '', ('iccifort', '%s%s' % (compver, gccsuff))),
    ('imkl', '11.3.3.210', '', ('iimpi', version)),
]

since icc and ifort already depend on those. Now in my case I have a different issue since in our setup it's iccifort that expands $MODULEPATH, instead of icc and ifort.

However if I rewrite deps in intel-2016b.eb as follows:

dependencies = [
    ('iccifort', compver, gccsuff),
    ('impi', '5.1.3.181', '', ('iccifort', '%s%s' % (compver, gccsuff))),
    ('imkl', '11.3.3.210', '', ('iimpi', version)),
]

then I trigger an exception in _verify_toolchain(self) in easybuild/tools/toolchain/toolchain.py.

I think I will try to experiment with increasing depth in the

self.toolchain_dep_mods = dependencies_for(mod_name, self.modules_tool, depth=0)

call in toolchain.py since that way the icc and ifort dependencies are found (because they are deps of iccifort).

@boegel
Copy link
Copy Markdown
Member Author

boegel commented Mar 9, 2017

I am a little worried about reintroducing #2091 because that also means adding #2135 again.

No, it doesn't, #2135 was a workaround for modules that include load GCCcore while they shouldn't, while #2091 avoids inserting the load GCCcore statements in the first place.

So, as long as your modules have been regenerating with the fix included in #2091, you don't need to avoid reloading modules anymore like we do in #2135 (other than maybe to optimise things, but you'll gain very little from it).

@boegel
Copy link
Copy Markdown
Member Author

boegel commented Mar 10, 2017

fix was (re)implemented by @bartoldeman in #2152

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants