Make the scalapack easyblock capable of building in parallel. #1288
Just make the library first in parallel, delete it and restart the build in serial.
|
@akesandgren Can you clarify why you need to remove the library after building it in parallel? Your comment mentions it's broken after building in parallel, but I'm not sure what you mean there. Also, is this really worth the trouble? Does it significantly speed up the build process?
|
When building in parallel there is no guard on the actual $(AR) libscalapack.a xxx.o commands in the various sub-parts of ScaLAPACK, i.e., blacs, pblas, scalapack, so several of them can write to the archive at the same time and libscalapack.a gets garbled. And yes, it reduces build time by up to 80% on my dual-core laptop, and when using the Intel compiler the reduction in build time is significant.
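The approach described above (parallel build, delete the archive, serial rebuild) can be sketched in plain Python. This is only an illustration, not the easyblock code: the function name `build_parallel_then_serial`, its parameters, and the injectable `run` callable are hypothetical, and the make target `lib` is taken from the `buildopts` shown in the diff.

```python
import os
import subprocess


def build_parallel_then_serial(lib_path, parallel, target="lib", run=subprocess.call):
    """Two-pass build: parallel first (archive discarded), then serial."""
    # Pass 1: the parallel make compiles the object files quickly. Its exit code
    # is deliberately ignored, since unguarded concurrent `ar` calls may leave
    # the archive garbled or make `ranlib` fail.
    run(["make", "-j", str(parallel), target])

    # Throw away the possibly garbled archive; the object files stay in place.
    if os.path.exists(lib_path):
        os.remove(lib_path)

    # Pass 2: the serial make mostly just re-creates the archive. A genuine
    # compile error would show up again here, so this exit code is checked.
    exit_code = run(["make", "-j", "1", target])
    if exit_code != 0:
        raise RuntimeError("serial make failed with exit code %s" % exit_code)
    return exit_code
```

Because the serial pass finds the object files already built, most of the speedup of the parallel pass should be retained, while the archive itself is only ever written by one process at a time.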
|
Tested with one of the existing The speedup of the build process is quite significant too; with Requires changes to the ScaLAPACK easyconfigs (i.e. dropping the hardcoded Thanks a lot for the effort on this @akesandgren, I'm sure figuring this out wasn't as easy as it looks. :) |
|
Oh but it was easy, I've known this for years :-) |
    self.cfg.update('buildopts', 'lib')
    self.cfg.update('buildopts', ' '.join(extra_makeopts))

    super(EB_ScaLAPACK, self).build_step()
@akesandgren Every now and then, a ScaLAPACK build fails here with ranlib: ../../../libscalapack.a: Malformed archive, see test report in easybuilders/easybuild-easyconfigs#5331.
To remedy this, should we run the parallel make ourselves and simply ignore its exit code, rather than calling out to build_step of the parent here?
Ahh, I was looking at the failed reports but couldn't see any reason for the fail.
Ignoring the exit code might be bad in case some compile actually fails (in the future).
But one could patch the makefiles so that failures of ar/ranlib are ignored. And if so, it should probably be done in the easyblock.
Thoughts on that?
Well, even if the parallel make fails for a different reason, that's OK, since we'll run a make -j 1 anyway (in which case it'll fail again)?
Yes, that's true. So we could indeed ignore the first parallel make. How?
rather than doing the super call, you'll need to run make yourself, as follows:
# deliberately ignore exit code using log_ok=False
# creating libscalapack.a may fail in parallel, but should work fine with non-parallel make afterwards
run_cmd("make -j %s" % self.cfg['parallel'], log_ok=True)
Note: this ignores prebuildopts & buildopts; if you want to be able to control the make command via the easyconfig, you should make this a bit smarter, see the ConfigureMake.build_step implementation.
Strictly speaking, not doing so would be a regression...
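To illustrate the note about prebuildopts and buildopts, here is a minimal sketch of assembling the make command so that it stays controllable from the easyconfig. The helper name `assemble_make_cmd` and the use of a plain dict in place of `self.cfg` are assumptions made for illustration; the actual ConfigureMake.build_step implementation may differ in its details.

```python
def assemble_make_cmd(cfg, parallel=None):
    """Build a make command line from easyconfig-style options.

    `cfg` is a plain dict standing in for self.cfg (hypothetical stand-in).
    `parallel` overrides cfg['parallel'], e.g. 1 for the serial pass.
    """
    if parallel is None:
        parallel = cfg.get('parallel')
    # only add -j when a parallelism level is actually set
    paracmd = "-j %s" % parallel if parallel else ""
    parts = [cfg.get('prebuildopts', ''), "make", paracmd, cfg.get('buildopts', '')]
    # drop empty pieces so the command has no stray whitespace
    return ' '.join(part.strip() for part in parts if part and part.strip())


cfg = {'prebuildopts': 'export FOO=1 &&', 'buildopts': 'lib', 'parallel': 8}
parallel_cmd = assemble_make_cmd(cfg)
serial_cmd = assemble_make_cmd(cfg, parallel=1)
```

In the easyblock, the string for the parallel pass would then be handed to run_cmd with log_ok=False, the setting named in the code comment above as the one that ignores the exit code, while the serial pass keeps the default error checking.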
:-( I was hoping there would be an ignore_fails flag for that build step ...
Oh well, I'll see what I can come up with...
Well, you can enhance the existing ConfigureMake.build_step implementation to allow ignoring failures of the make command it runs, optionally of course.
It's a pretty specific use case though, you usually really want to check the exit code. :)
Copied code from ConfigureMake easyblock.
|
@boegel Changes done. I hope I got it right. Your comment above on how to do it is self-contradictory.
|
looks good, retested with all existing ScaLAPACK easyconfigs with the Thanks for the effort @akesandgren! |