Make the scalapack easyblock capable of building in parallel. #1288

Merged
boegel merged 2 commits into easybuilders:develop from
akesandgren:make-scalapack-build-in-parallel
Dec 7, 2017

@akesandgren
Contributor

Build the library in parallel first, then delete it and restart the build
serially.

@boegel
Member

boegel commented Nov 11, 2017

@akesandgren Can you clarify why you need to remove the library after building it in parallel? Your comment mentions it's broken after building in parallel, but I'm not sure what you mean there.

Also, is this really worth the trouble? Does it significantly speed up the build process?

@boegel added this to the 3.5.0 milestone Nov 11, 2017
@akesandgren
Contributor Author

When building in parallel, nothing guards the actual $(AR) libscalapack.a xxx.o commands in the various subparts of ScaLAPACK, i.e., blacs, pblas, scalapack, so libscalapack.a gets garbled.
Removing it and rerunning the make command serially creates libscalapack.a correctly. The executables can then be linked against a correctly built libscalapack.

And yes, it reduces build time by up to 80% on my dual-core laptop, and when using the Intel compilers the reduction in build time is significant.
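
The sequence described above can be sketched in plain Python. This is only an illustration under stated assumptions: `build_scalapack` and its `run` callback are hypothetical names, the `lib` target is taken from the diff in this PR, and the sketch does not inspect the exit code of the first phase. It is not the easyblock's actual code.

```python
import os
import subprocess


def build_scalapack(build_dir, jobs, run=subprocess.call):
    """Hypothetical sketch: parallel build, drop the archive, serial rebuild."""
    lib = os.path.join(build_dir, "libscalapack.a")

    # Phase 1: compile the objects in parallel. The archive created here may
    # be garbled, since the $(AR) calls are not guarded against each other.
    # This sketch does not inspect the exit code of this phase.
    run(["make", "-j", str(jobs), "lib"], cwd=build_dir)

    # Phase 2: remove the possibly garbled archive; object files stay in place.
    if os.path.exists(lib):
        os.remove(lib)

    # Phase 3: a serial make re-creates the archive from the existing objects
    # and links the executables. A failure here is treated as fatal.
    rc = run(["make", "-j", "1"], cwd=build_dir)
    if rc != 0:
        raise RuntimeError("serial make failed with exit code %d" % rc)
```

The `run` parameter exists only so the sequence can be exercised without a real ScaLAPACK source tree; by default it falls back to `subprocess.call`.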

Member

@boegel left a comment

lgtm

@boegel
Member

boegel commented Nov 11, 2017

Tested with one of the existing ScaLAPACK 2.0.2 easyconfigs, works fine.

The speedup of the build process is quite significant too; with ScaLAPACK-2.0.2-gompi-2017b-OpenBLAS-0.2.20.eb I'm seeing the total required time to install it go down from ~6m30s to ~3m20s on a 16-core Intel Sandy Bridge system.

The required change to the ScaLAPACK easyconfigs (i.e. dropping the hardcoded parallel = 1) is done in easybuilders/easybuild-easyconfigs#5331.

Thanks a lot for the effort on this @akesandgren, I'm sure figuring this out wasn't as easy as it looks. :)

@akesandgren
Contributor Author

Oh but it was easy, I've known this for years :-)

Comment thread easybuild/easyblocks/s/scalapack.py Outdated
self.cfg.update('buildopts', 'lib')
self.cfg.update('buildopts', ' '.join(extra_makeopts))

super(EB_ScaLAPACK, self).build_step()
Member

@akesandgren Every now and then, a ScaLAPACK build fails here with ranlib: ../../../libscalapack.a: Malformed archive, see test report in easybuilders/easybuild-easyconfigs#5331.

To remedy this, shouldn't we run the parallel make ourselves and simply ignore its exit code, rather than just calling the parent's build_step here?

Contributor Author

Ahh, I was looking at the failed reports but couldn't see any reason for the failure.
Ignoring the exit code might be bad in case some compile actually fails (in the future).
But one could patch the makefiles so that ar/ranlib failures are ignored. And if so, it should probably be done in the easyblock.

Thoughts on that?

Member

Well, even if the parallel make fails for a different reason, that's OK, since we'll run a make -j 1 anyway (in which case it'll fail again)?

Contributor Author

Yes, that's true. So we could indeed ignore the first parallel make. How?

Member

rather than doing the super call, you'll need to run make yourself, as follows:

# deliberately ignore exit code using log_ok=False
# creating libscalapack.a may fail in parallel, but should work fine with non-parallel make afterwards
run_cmd("make -j %s" % self.cfg['parallel'], log_ok=True)

Note: this ignores prebuildopts & buildopts. If you want to be able to control the make command via the easyconfig, you should make this a bit smarter; see the ConfigureMake.build_step implementation.

Strictly speaking, not doing so would be a regression...
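
A make command that still honours the easyconfig options could be assembled roughly as sketched here. This is a standalone illustration, not EasyBuild code: the plain dict `cfg` stands in for `self.cfg`, and `parallel_make_cmd` and `run_ignoring_failure` are hypothetical helpers that use `subprocess` in place of `run_cmd`. It follows the stated intent of the snippet's comment, which is to ignore the exit code of the parallel make.

```python
import subprocess


def parallel_make_cmd(cfg):
    """Assemble '<prebuildopts> make -j <N> <buildopts>' from a plain dict."""
    parts = [cfg.get("prebuildopts", ""), "make"]
    if cfg.get("parallel"):
        parts.append("-j %s" % cfg["parallel"])
    parts.append(cfg.get("buildopts", ""))
    # Drop empty pieces so the command has no stray whitespace.
    return " ".join(p.strip() for p in parts if p and p.strip())


def run_ignoring_failure(cmd, cwd=None):
    """Run a shell command; return (exit_code, output) without raising."""
    proc = subprocess.run(cmd, shell=True, cwd=cwd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.returncode, proc.stdout
```

For example, `parallel_make_cmd({"prebuildopts": "FOO=1 ", "parallel": 8, "buildopts": "lib"})` yields `FOO=1 make -j 8 lib`. A non-zero exit code from that command is returned to the caller rather than raised, which leaves the subsequent serial make as the step that actually reports a genuine build failure.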

Contributor Author

:-( I was hoping there would be an ignore_fails flag to that build step ...
Oh well, I'll see what I can come up with...

Member

Well, you can enhance the existing ConfigureMake.build_step implementation to allow ignoring failures of the make command it runs, optionally of course.

It's a pretty specific use case though, you usually really want to check the exit code. :)

@akesandgren
Contributor Author

@boegel Changes done. I hope I got it right. Your comment above on how to do it is self-contradictory.

@boegel
Member

boegel commented Dec 7, 2017

looks good, retested with all existing ScaLAPACK easyconfigs with the parallel = 1 removed (see easybuilders/easybuild-easyconfigs#5331), works great, so going in.

Thanks for the effort @akesandgren!
