add patch to fix using wrong OpenMP library in PyTorch 2.3.0 w/ foss/2023b + CUDA 12.4.0 #25542

Merged: boegel merged 2 commits into easybuilders:develop from Flamefire:20260311115536_new_pr_PyTorch230 on Apr 8, 2026

@Flamefire
Contributor

(created using eb --new-pr)

@boegel added the bug fix label and removed the change label on Mar 11, 2026
@boegel added this to the next release (5.2.2?) milestone on Mar 11, 2026
@boegel
Member

boegel commented Mar 11, 2026

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr25544"

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=25542 EB_ARGS="--installpath /tmp/$USER/pr25544" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_25542 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 10010

Test results coming soon (I hope)...

Details

- notification for comment with ID 4040509279 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@Flamefire
Contributor Author

Flamefire commented Mar 12, 2026

Test report by @Flamefire
FAILED
Build succeeded for 0 out of 1 (total: 33 hours 15 mins 20 secs) (1 easyconfigs in total)
i8004 - Linux Rocky Linux 9.6, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 580.65.06, Python 3.9.21
See https://gist.github.com/Flamefire/23bde1ee832587f29e00861c5196a286 for a full test report.

The failures are triggered by my read-only ~/.triton folder: Triton doesn't support TRITON_HOME until 3.2.
I added that support as a patch to the existing EC in #25557 and will rebuild that PR first, then this one. Apart from that, the results look ok-ish.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (total: 21 hours 35 mins 12 secs) (1 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.7, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 590.48.01, Python 3.9.25
See https://gist.github.com/boegelbot/b57e778bde0c09719fb6c7bd0b4c62ee for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (total: 32 hours 43 mins 44 secs) (1 easyconfigs in total)
i8009 - Linux Rocky Linux 9.6, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 580.65.06, Python 3.9.21
See https://gist.github.com/Flamefire/877cdb44fc97ec995a53835fb5a8ca18 for a full test report.

@boegel changed the title from "Fix using wrong OpenMP library in PyTorch-2.3.0-foss-2023b-CUDA-12.4.0" to "add patch to fix using wrong OpenMP library in PyTorch 2.3.0 w/ foss/2023b + CUDA 12.4.0" on Apr 8, 2026
Member

@boegel left a comment

lgtm

@boegel
Member

boegel commented Apr 8, 2026

Going in, thanks @Flamefire!

@boegel merged commit 4242696 into easybuilders:develop on Apr 8, 2026
6 checks passed
@Flamefire deleted the 20260311115536_new_pr_PyTorch230 branch on April 9, 2026 06:32