enhance custom easyblock for GCC to use with-arch option for nvptx with 13.1+#3396

Merged
boegel merged 1 commit into easybuilders:develop from Thyre:gcc-use-witharch-build-option
Sep 11, 2024

@Thyre
Collaborator

@Thyre Thyre commented Jul 26, 2024

Motivation

GCC, like other compilers, lets users offload code to accelerators, for example via OpenMP and OpenACC. While some compilers (e.g. Clang) require CUDA to be present for this, GCC does not need it to build and run an executable containing offloading code.

By default, however, GCC targets a very old architecture for NVIDIA GPUs. In GCC 12.3.0, this was sm_30. In GCC 13.3.0, the default is still the same, but recent nvptx-tools can bump this to sm_50 when CUDA is detected. With this, GCC can work around the removal of sm_3x in more recent CUDA versions, avoiding the following error message:

GCC 12.3.0

$ gcc -fopenmp -foffload=nvptx-none test.c
ptxas fatal   : Value 'sm_35' is not defined for option 'gpu-name'
nvptx-as: ptxas returned 255 exit status
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /p/software/fs/jurecadc/stages/2024/software/GCCcore/12.3.0/bin/../libexec/gcc/x86_64-pc-linux-gnu/12.3.0//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/p/software/jurecadc/stages/2024/software/binutils/2.40-GCCcore-12.3.0/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

GCC 13.3.0

$ gcc --verbose -fopenmp -foffload=nvptx-none test.c
[...]
/p/software/fs/jurecadc/stages/2025/software/GCCcore/13.3.0/bin/../libexec/gcc/x86_64-pc-linux-gnu/13.3.0/accel/nvptx-none/lto1 -quiet -dumpbase ./a.xnvptx-none.mkoffload -m64 -mgomp -misa=sm_30 -version -fno-openacc -fno-pie -fcf-protection=none -foffload-abi=lp64 -fopenmp @/tmp/ccZrKG87 -o /tmp/ccbOTp8K.s
[...]
Verifying sm_30 code with sm_50 code generation.
 ptxas -c -o /dev/null /tmp/cc7PheNR.o --gpu-name sm_50 -O0
[...]

However, this may break again as soon as NVIDIA removes support for sm_50, which has been deprecated since CUDA 11.0. Fortunately, GCC has added a configure option to override the default nvptx architecture. Beginning with GCC 13.1.0, one can pass --with-arch=sm_[x] to set the default, as long as GCC understands the given architecture.

In addition, choosing a newer architecture by default might bring performance improvements and access to additional features.
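As a rough illustration of how the --with-arch option could be gated on the GCC version, here is a minimal Python sketch. The helper name `nvptx_configure_opts` and its parameters are hypothetical and not taken from the easyblock; only the option name and the 13.1.0 threshold come from the description above.

```python
def nvptx_configure_opts(gcc_version, nvptx_enabled, default_arch):
    """Return extra configure options for an nvptx-enabled GCC build.

    Hypothetical helper for illustration only; not the easyblock's code.
    """
    # Pad to three components so that "13.1" compares like "13.1.0".
    parts = [int(part) for part in gcc_version.split(".")]
    version = tuple((parts + [0, 0, 0])[:3])

    # --with-arch for nvptx is only available starting with GCC 13.1.0.
    if nvptx_enabled and default_arch and version >= (13, 1, 0):
        return ["--with-arch=%s" % default_arch]

    # Older GCC, no nvptx offloading, or no architecture: keep GCC's default.
    return []
```

With these assumptions, a GCC 13.3.0 build with nvptx enabled and `sm_75` selected would get `--with-arch=sm_75`, while a GCC 12.3.0 build would get no extra option.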

Scope of this PR

This pull request adds the new option --with-arch=sm_[x] to GCC builds starting with GCC 13.1.0 if offloading support via nvptx is enabled. To choose which architecture is passed, a new function named map_nvptx_capability is implemented. This function retrieves cuda_compute_capabilities and matches them against the official GCC mappings used for the -march-map= argument (which can be found in ${GCC_SRC}/gcc/config/nvptx/nvptx.opt).

Since GCC only allows setting a single default architecture, I decided to use the lowest one available. For example, JURECA-DC sets both 7.5 and 8.0 for EasyBuild. Therefore, 7.5 would be chosen.
If parsing the architecture mappings fails, for example because the file layout changed or the file was moved, a warning is returned. In this case, we stick to the default of GCC. This is also the case if the architectures in cuda_compute_capabilities cannot be mapped at all. This makes the additions more resilient to upstream changes.
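The selection logic described above can be sketched roughly as follows. This is not the implementation of map_nvptx_capability from the PR: the function names are hypothetical, and the sample text only assumes that each -march-map= entry in nvptx.opt is followed by a line carrying an `Alias(misa=,sm_XX)` target. The order in which capabilities are tried (ascending, first match wins) is also an assumption.

```python
import re

# Illustrative excerpt only: the layout is an ASSUMPTION about how the
# -march-map= aliases appear in nvptx.opt, not a verbatim copy of the file.
SAMPLE_NVPTX_OPT = """\
march-map=sm_70
Target RejectNegative Alias(misa=,sm_70)

march-map=sm_75
Target RejectNegative Alias(misa=,sm_75)

march-map=sm_80
Target RejectNegative Alias(misa=,sm_80)

march-map=sm_86
Target RejectNegative Alias(misa=,sm_80)
"""

# Capture the requested architecture and the architecture GCC maps it to.
MAP_RE = re.compile(r"^march-map=(sm_\d+)\n.*?Alias\(misa=,\s*(sm_\d+)\)", re.M)


def parse_march_map(opt_text):
    """Return a dict such as {'sm_86': 'sm_80'}; empty if nothing matches."""
    return dict(MAP_RE.findall(opt_text))


def pick_default_arch(cuda_compute_capabilities, opt_text):
    """Pick the nvptx architecture for the lowest mappable capability.

    Returns None if the mappings cannot be parsed or no capability maps,
    in which case the caller would keep GCC's built-in default.
    """
    mapping = parse_march_map(opt_text)
    if not mapping:
        return None

    # Sort numerically so that "7.5" is tried before "8.0" (and "10.0" last).
    ordered = sorted(cuda_compute_capabilities,
                     key=lambda cc: tuple(int(p) for p in cc.split(".")))
    for cc in ordered:
        arch = mapping.get("sm_" + cc.replace(".", ""))
        if arch:
            return arch
    return None
```

For the JURECA-DC example, `pick_default_arch(["8.0", "7.5"], SAMPLE_NVPTX_OPT)` picks the entry for 7.5. Returning `None` instead of raising an error mirrors the fallback to GCC's default when parsing or mapping fails.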

Generally, this helps users, as they no longer need to pass an architecture manually every single time, as is currently the case with CUDA 12 + GCC 12.3.0. Here, one would need to pass -foffload-options=-misa=sm_80.

@Thyre Thyre force-pushed the gcc-use-witharch-build-option branch from c4c4e22 to b7308d2 Compare July 26, 2024 20:05
@SebastianAchilles SebastianAchilles added this to the release after 4.9.2 milestone Jul 30, 2024
@SebastianAchilles
Member

Test report by @SebastianAchilles

Overview of tested easyconfigs (in order)

  • SUCCESS GCCcore-12.3.0.eb
  • SUCCESS GCCcore-13.1.0.eb
  • SUCCESS GCCcore-13.2.0.eb
  • SUCCESS GCCcore-13.3.0.eb
  • SUCCESS GCCcore-14.1.0.eb

Build succeeded for 5 out of 5 (5 easyconfigs in total)
jscclxc1.int.jsc-clx.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, Intel Xeon Processor (Cascadelake) (cascadelake), Python 3.9.18
See https://gist.github.com/SebastianAchilles/6890d9cc1f0024fe7541e064ba5009f8 for a full test report.

7 comment threads on easybuild/easyblocks/g/gcc.py (all outdated)
@boegel boegel changed the title GCC: Use with-arch option for nvptx with 13.1+ enhance custom easyblock for GCC to use with-arch option for nvptx with 13.1+ Jul 31, 2024
@Thyre
Collaborator Author

Thyre commented Aug 1, 2024

Thanks a lot for the review. I agree with your comments and am working on adding them to the PR.

@Thyre Thyre force-pushed the gcc-use-witharch-build-option branch 2 times, most recently from 2756041 to 000c666 Compare August 1, 2024 10:44
@Thyre Thyre force-pushed the gcc-use-witharch-build-option branch from 000c666 to 5a86bb8 Compare August 14, 2024 07:51
@Thyre
Collaborator Author

Thyre commented Aug 14, 2024

Fixed the failed test workflow: https://github.com/easybuilders/easybuild-easyblocks/actions/runs/10196832799
I missed one f-string.

@Thyre Thyre requested a review from boegel August 14, 2024 08:33
@SebastianAchilles
Member

Test report by @SebastianAchilles

Overview of tested easyconfigs (in order)

  • SUCCESS GCCcore-12.3.0.eb
  • SUCCESS GCCcore-13.1.0.eb
  • SUCCESS GCCcore-13.2.0.eb
  • SUCCESS GCCcore-13.3.0.eb
  • SUCCESS GCCcore-14.1.0.eb

Build succeeded for 5 out of 5 (5 easyconfigs in total)
jscclxc1.int.jsc-clx.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, Intel Xeon Processor (Cascadelake) (cascadelake), Python 3.9.18
See https://gist.github.com/SebastianAchilles/2b79825a144baa828025421933035a68 for a full test report.

Comment thread easybuild/easyblocks/g/gcc.py
@Thyre Thyre force-pushed the gcc-use-witharch-build-option branch from 5a86bb8 to bce12fe Compare August 28, 2024 14:35
@Thyre Thyre requested a review from boegel August 28, 2024 14:49
@boegel
Member

boegel commented Sep 10, 2024

@boegelbot please test @ jsc-zen3
EB_ARGS="GCCcore-10.2.0.eb GCCcore-12.3.0.eb GCCcore-14.2.0.eb --installpath /tmp/$USER/pr-3396"

@boegelbot

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3396 EB_ARGS="GCCcore-10.2.0.eb GCCcore-12.3.0.eb GCCcore-14.2.0.eb --installpath /tmp/$USER/pr-3396" EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3396 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 4839

Test results coming soon (I hope)...

Details

- notification for comment with ID 2341789066 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member

boegel commented Sep 10, 2024

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS GCCcore-12.3.0.eb
  • SUCCESS GCCcore-13.2.0.eb
  • SUCCESS zlib-1.3.1.eb
  • SUCCESS binutils-2.42.eb
  • SUCCESS GCCcore-14.2.0.eb

Build succeeded for 5 out of 5 (3 easyconfigs in total)
node3900.accelgor.os - Linux RHEL 8.8, x86_64, AMD EPYC 7413 24-Core Processor, 1 x NVIDIA NVIDIA A100-SXM4-80GB, 545.23.08, Python 3.6.8
See https://gist.github.com/boegel/1dcd0f4c7656396fcdacc42dfa4f04f7 for a full test report.

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS GCCcore-10.2.0.eb
  • SUCCESS GCCcore-12.3.0.eb
  • SUCCESS GCCcore-14.2.0.eb

Build succeeded for 3 out of 3 (3 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/c33373ba82e48b22ebec6b3a5aa2dc71 for a full test report.

@boegel boegel merged commit 485a195 into easybuilders:develop Sep 11, 2024