Change 120f to 120 for cuda 12.8.0 #200

Open
casparvl wants to merge 2 commits into EESSI:main from casparvl:change_120f_to_120_for_cuda_1280

@casparvl
Contributor

@casparvl casparvl commented Apr 14, 2026

Fixes the following error, which occurs for the 120f target with CUDA toolkit 12.8.0:

nvcc fatal   : Unsupported gpu architecture 'compute_120f'
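
For illustration, the idea behind this fix can be sketched in Python as follows. This is not the code from this PR: the helper names `strip_family_suffix` and `gencode_flag` are hypothetical, and the cutoff used here (nvcc accepting the `f` family-specific suffix only from CUDA 12.9 onwards) is an assumption that should be checked against the CUDA release notes.

```python
def strip_family_suffix(cc, cuda_version, first_with_family=(12, 9)):
    """Drop a trailing 'f' from a compute capability for older toolkits.

    `first_with_family` is an ASSUMED cutoff (major, minor), not taken from this PR.
    """
    major_minor = tuple(int(part) for part in cuda_version.split(".")[:2])
    if cc.endswith("f") and major_minor < first_with_family:
        return cc[:-1]
    return cc


def gencode_flag(cc):
    """Build an nvcc -gencode flag from a capability string such as '12.0'."""
    arch = cc.replace(".", "")
    return f"-gencode=arch=compute_{arch},code=sm_{arch}"


if __name__ == "__main__":
    cc = strip_family_suffix("12.0f", "12.8.0")
    print(gencode_flag(cc))
```

With CUDA 12.8.0 the sketch falls back to the plain `12.0` capability, which matches the change described in the title of this PR.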

@casparvl casparvl changed the title from "Change 120f to 120 for cuda 1280" to "Change 120f to 120 for cuda 12.8.0" Apr 14, 2026
@casparvl
Contributor Author

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-rug for:arch=x86_64/amd/zen5,accel=nvidia/cc120

@casparvl casparvl requested a review from bedroge April 14, 2026 10:02
@casparvl
Contributor Author

I tested this locally. Without the hooks from this PR, the following fails:

eb --cuda-compute-capabilities=12.0f NCCL-2.27.7-GCCcore-14.2.0-CUDA-12.8.0.eb --cuda-sanity-check-accept-missing-ptx

With the hooks from this PR, it works:

eb --cuda-compute-capabilities=12.0f --hooks $PWD/eb_hooks.py NCCL-2.27.7-GCCcore-14.2.0-CUDA-12.8.0.eb --cuda-sanity-check-accept-missing-ptx

@casparvl
Contributor Author

You can also see on stdout that the hook takes effect (note the NVCC_GENCODE value being specified). For the original run, the output is:

  >> running shell command:
        make  -j 16  NVCC_GENCODE="-gencode=arch=compute_120f,code=sm_120f"  PREFIX=/home/casparl/eessi/versions/2025.06/software/linux/x86_64/intel/icelake/software/NCCL/2.27.7-GCCcore-14.2.0-CUDA-12.8.0

and after the fix in this PR:

== Running pre-run_shell_cmd hook...
  >> running shell command:
        make  -j 16  NVCC_GENCODE="-gencode=arch=compute_120,code=sm_120"  PREFIX=/home/casparl/eessi/versions/2025.06/software/linux/x86_64/intel/icelake/software/NCCL/2.27.7-GCCcore-14.2.0-CUDA-12.8.0
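
A minimal sketch of the kind of rewrite visible in the two commands above, assuming the hook simply edits the command string before it runs. The function name `drop_120f` and the regular expression are illustrative assumptions, not the contents of eb_hooks.py.

```python
import re

# Matches compute_120f or sm_120f as whole tokens (hypothetical pattern).
_FAMILY_ARCH = re.compile(r"\b(compute|sm)_120f\b")


def drop_120f(cmd):
    """Return cmd with compute_120f/sm_120f replaced by compute_120/sm_120."""
    return _FAMILY_ARCH.sub(r"\1_120", cmd)


if __name__ == "__main__":
    original = 'make  -j 16  NVCC_GENCODE="-gencode=arch=compute_120f,code=sm_120f"'
    print(drop_120f(original))
```

A function like this could be called from the pre-run_shell_cmd hook that the output above shows running, so that commands without a 120f target pass through unchanged.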

@casparvl
Contributor Author

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen2
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen2

@eessi-bot-aws

eessi-bot-aws bot commented Apr 14, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2023.06-software
Building on: amd-zen2
Building for: x86_64/amd/zen2
Job dir: /project/def-users/SHARED/jobs/2026.04/pr_200/147820

date | job status | comment
Apr 14 11:29:06 UTC 2026 | submitted | job id 147820 awaits release by job manager
Apr 14 11:29:17 UTC 2026 | released | job awaits launch by Slurm scheduler
Apr 14 11:29:30 UTC 2026 | finished | job id 147820 was cancelled
Apr 14 11:30:20 UTC 2026 | finished | 🤷 UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job147820.result does not exist in job directory or reading it failed.
  • No artefacts were found/reported.
Apr 14 11:30:20 UTC 2026 | test result | 🤷 UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job147820.test does not exist in job directory or reading it failed.

@eessi-bot-aws

eessi-bot-aws bot commented Apr 14, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen2
Building for: x86_64/amd/zen2
Job dir: /project/def-users/SHARED/jobs/2026.04/pr_200/147821

date | job status | comment
Apr 14 11:29:11 UTC 2026 | submitted | job id 147821 awaits release by job manager
Apr 14 11:29:15 UTC 2026 | released | job awaits launch by Slurm scheduler
Apr 14 11:29:57 UTC 2026 | finished | job id 147821 was cancelled
Apr 14 11:30:18 UTC 2026 | finished | 🤷 UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job147821.result does not exist in job directory or reading it failed.
  • No artefacts were found/reported.
Apr 14 11:30:18 UTC 2026 | test result | 🤷 UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job147821.test does not exist in job directory or reading it failed.

@casparvl
Contributor Author

bot:cancel job:147820

@casparvl
Contributor Author

bot:cancel job:147821

We should prove it works first and remove the easystack file before building...

@casparvl
Contributor Author

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-rug for:arch=x86_64/amd/zen5,accel=nvidia/cc120

@eessi-bot-rug

eessi-bot-rug bot commented Apr 14, 2026

New job on instance eessi-bot-rug for repository eessi.io-2025.06-software
Building on: amd-zen5 and accelerator nvidia/cc120
Building for: x86_64/amd/zen5 and accelerator nvidia/cc120
Job dir: /scratch/hb-eessibot/SHARED/jobs/2026.04/pr_200/28468357

date | job status | comment
Apr 14 11:30:08 UTC 2026 | submitted | job id 28468357 awaits release by job manager
Apr 14 11:30:27 UTC 2026 | released | job awaits launch by Slurm scheduler
Apr 14 11:32:31 UTC 2026 | running | job 28468357 is running
Apr 14 11:34:32 UTC 2026 | finished | 😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-28468357.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen5-accel-nvidia-cc120-17761663540.tar.zst
size: 0 MiB (860900 bytes)
entries: 53
modules under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/modules/all
UCX-CUDA/1.18.0-GCCcore-14.2.0-CUDA-12.8.0.lua
software under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/software
UCX-CUDA/1.18.0-GCCcore-14.2.0-CUDA-12.8.0
reprod directories under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/reprod
UCX-CUDA/1.18.0-GCCcore-14.2.0-CUDA-12.8.0/20260414_113228UTC
other under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120
2025.06/init/easybuild/eb_hooks.py
Apr 14 11:34:33 UTC 2026 | test result | 😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node %device_type=gpu /6d7a17a9 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node %device_type=gpu /e9b09ad8 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (3/4) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node /a102bba0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (4/4) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node /d58e51e9 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ PASSED ] Ran 0/4 test case(s) from 4 check(s) (0 failure(s), 4 skipped, 0 aborted)
Details
✅ job output file slurm-28468357.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
