add upstream patch for GCC 9.x, 10.x, 11.x to avoid spurious FPE on avx512 (affects UCX)#13628

Merged
boegel merged 1 commit into easybuilders:develop from bartoldeman:20210805155044_new_pr_GCCcore1010
Aug 6, 2021

@bartoldeman
Contributor

(created using eb --new-pr)

@branfosj branfosj added this to the next release (4.4.2?) milestone Aug 5, 2021
@boegel boegel changed the title from "Add upstream GCC patch to avoid spurious FPE on avx512 (affects UCX)" to "add upstream patch for GCC 9.x, 10.x, 11.x to avoid spurious FPE on avx512 (affects UCX)" Aug 5, 2021
@bartoldeman
Contributor Author

Test report by @bartoldeman
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
build-node.computecanada.ca - Linux centos linux 7.9.2009, x86_64, Intel Xeon Processor (Skylake, IBRS), Python 3.7.7
See https://gist.github.com/15d8aa863dd891939d63b68fe6a65956 for a full test report.

@bartoldeman
Contributor Author

I can also confirm this fixes the UCX-with-avx512 issue for us.

@bartoldeman
Contributor Author

Here's how to see the issue with UCX:

Fortran code:

  program main
  use mpi
  implicit none
  integer ierr,iproc,imol
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD,iproc,ierr)
  write(*,*) 'iproc', iproc
  ! only the root rank sets imol; the broadcast below delivers it to the others
  if (iproc == 0 ) then
     imol = 1
  end if
  call MPI_BCAST(imol, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
  write(*,*) 'iproc,imol',iproc,imol
  call MPI_FINALIZE(ierr)
  end

Compile with the foss toolchain (note: no arch option is necessary here, since the issue is not in the example itself):
mpifort -ffpe-trap=invalid mpibcast.f90 -o mpibcast
Run with:
srun --nodes=2 --ntasks-per-node=4 --time=0:05:00 ./mpibcast
This should give you something like:

 iproc           1
 iproc           5
 iproc           0
 iproc           2
 iproc           3
[blg8429:99624:0:99624] Caught signal 8 (Floating point exception: floating-point invalid operation)
[blg8429:99625:0:99625] Caught signal 8 (Floating point exception: floating-point invalid operation)
[blg8429:99626:0:99626] Caught signal 8 (Floating point exception: floating-point invalid operation)
[blg8429:99627:0:99627] Caught signal 8 (Floating point exception: floating-point invalid operation)
 iproc           4
 iproc           6
 iproc           7
==== backtrace (tid:  99624) ====
 0 0x000000000002078e ucs_debug_print_backtrace()  /tmp/ebuser/avx512/UCX/1.8.0/GCCcore-9.3.0/ucx-1.8.0/src/ucs/debug/debug.c:653
 1 0x00000000000130f0 __funlockfile()  :0
 2 0x000000000001ba9b ucp_ep_config_get_zcopy_auto_thresh()  /tmp/ebuser/avx512/UCX/1.8.0/GCCcore-9.3.0/ucx-1.8.0/src/ucp/core/ucp_ep.c:1953
...
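
The compile and run steps above could be wrapped in a small script that also reports whether the spurious FPE shows up. This is only a rough sketch: the log file name `mpibcast.log` and the helper `has_spurious_fpe` are made up for illustration, and the `srun` options are simply the ones from the commands above, so adjust them for your site.

```shell
#!/bin/sh
# Sketch: build and run the mpibcast reproducer, then check its output
# for the "Caught signal 8" message printed when the FPE is trapped.
# SRC/EXE match the commands above; LOG is a hypothetical file name.

SRC=mpibcast.f90
EXE=mpibcast
LOG=mpibcast.log

# Hypothetical helper: succeeds if the given log contains the FPE message.
has_spurious_fpe() {
    grep -qF 'Caught signal 8 (Floating point exception' "$1"
}

if command -v mpifort >/dev/null 2>&1 \
   && command -v srun >/dev/null 2>&1 \
   && [ -f "$SRC" ]; then
    if mpifort -ffpe-trap=invalid "$SRC" -o "$EXE"; then
        # Keep stdout and stderr together so the signal message is captured.
        srun --nodes=2 --ntasks-per-node=4 --time=0:05:00 "./$EXE" >"$LOG" 2>&1
        if has_spurious_fpe "$LOG"; then
            echo "affected: FPE caught, see $LOG"
        else
            echo "no FPE found in $LOG"
        fi
    else
        echo "compilation of $SRC failed"
    fi
else
    echo "mpifort, srun or $SRC not available; nothing was run"
fi
```

Running it once before and once after rebuilding GCC and UCX with the patch gives a quick before/after comparison.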

@branfosj
Member

branfosj commented Aug 5, 2021

Test report by @branfosj
SUCCESS
Build succeeded for 9 out of 9 (9 easyconfigs in total)
bear-pg0211u03a.bear.cluster - Linux RHEL 8.3, x86_64, Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz (cascadelake), Python 3.6.8
See https://gist.github.com/9be97944d126c9e88a6262110468c0e4 for a full test report.

Member

@branfosj branfosj left a comment

lgtm

@Micket
Contributor

Micket commented Aug 5, 2021

Test report by @Micket
SUCCESS
Build succeeded for 11 out of 11 (9 easyconfigs in total)
alvis-c1 - Linux centos linux 7.9.2009, x86_64, Intel Xeon Processor (Skylake), Python 3.6.8
See https://gist.github.com/1ae2d598201ec82f9c8d22d8a91392ad for a full test report.

@boegel
Member

boegel commented Aug 6, 2021

Test report by @boegel
SUCCESS
Build succeeded for 9 out of 9 (9 easyconfigs in total)
node2625.swalot.os - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz (haswell), Python 3.6.8
See https://gist.github.com/03c60a468a51604ae12e2ec61472d630 for a full test report.

@SebastianAchilles
Member

Test report by @SebastianAchilles
SUCCESS
Build succeeded for 9 out of 9 (9 easyconfigs in total)
rocky8-eb - Linux rocky linux 8.4, x86_64, Intel(R) Core(TM) i7-6900K CPU @ 3.20GHz (broadwell), Python 3.6.8
See https://gist.github.com/e10c42550afc4caa402bc2a8c9b05cc4 for a full test report.

@boegel
Member

boegel commented Aug 6, 2021

Going in, thanks @bartoldeman!

@boegel boegel merged commit b8b5209 into easybuilders:develop Aug 6, 2021