fix PyTorch-1.12.1-foss-2022a-CUDA-11.7.0.eb for Linux 6+ #20178

Merged
casparvl merged 3 commits into easybuilders:develop from Flamefire:20240321123703_new_pr_PyTorch1121
Mar 25, 2024

Conversation

@Flamefire
Contributor

(created using eb --new-pr)

@Flamefire Flamefire force-pushed the 20240321123703_new_pr_PyTorch1121 branch from 30dd58a to 0b47908 Compare March 22, 2024 09:17
@casparvl
Contributor

@boegelbot please test @ jsc-zen3
CORE_CNT=16

@boegelbot
Collaborator

@casparvl: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=20178 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_20178 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3848

Test results coming soon (I hope)...

Details

- notification for comment with ID 2015608741 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3c3.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.3, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/fe0fc10d6054bcf5cf8efae3c706ec3b for a full test report.

@casparvl
Contributor

@boegelbot please test @ generoso
CORE_CNT=16

@boegelbot
Collaborator

@casparvl: Request for testing this PR well received on login1

PR test command 'EB_PR=20178 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_20178 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 13196

Test results coming soon (I hope)...

Details

- notification for comment with ID 2016794012 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
cnx3 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/72046101ba1bf6b52bd831975c997a5f for a full test report.

@casparvl
Contributor

Test report by @casparvl
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
gcn6.local.snellius.surf.nl - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 4 x NVIDIA NVIDIA A100-SXM4-40GB, 545.23.08, Python 3.6.8
See https://gist.github.com/casparvl/63f58e2785718229557101277bbcdfae for a full test report.

@Flamefire
Contributor Author

@casparvl I allowed a few tests to fail. There were only 2 failures (test_set_and_get_default_rpc_timeout, test_streams_and_events), which, judging from their names, might be caused by timing issues. So allowing 4 failures should be safe here.
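
For illustration only, a minimal sketch of what allowing a few failing tests could look like in the easyconfig. It assumes the change was made through the `max_failed_tests` parameter of the PyTorch easyblock; the actual diff is not shown in this conversation, so treat the line below as an assumption rather than the exact change.

```python
# Easyconfig fragment (easyconfigs use Python syntax).
# Assumption: the PyTorch easyblock's custom parameter `max_failed_tests`
# is what was raised; the value 4 comes from the comment above.
# 2 failures were observed (test_set_and_get_default_rpc_timeout,
# test_streams_and_events), so 4 leaves some headroom for flaky tests.
max_failed_tests = 4
```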

@casparvl
Contributor

Yeah, makes sense. I saw you did the same in the non-CUDA equivalent (#20179). I'll give it one more test run just to be sure; I'm assuming that'll pass now :)

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
i8004 - Linux Rocky Linux 8.7 (Green Obsidian), x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 545.23.08, Python 3.8.13
See https://gist.github.com/Flamefire/55c032e30b8e50a573ff4f681b2b6348 for a full test report.

@casparvl
Contributor

Test report by @casparvl
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
gcn6.local.snellius.surf.nl - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 4 x NVIDIA NVIDIA A100-SXM4-40GB, 545.23.08, Python 3.6.8
See https://gist.github.com/casparvl/82877f279b6934a8f50406e3590c6e96 for a full test report.

@casparvl
Contributor

Disk quota exceeded. OK, I give up on testing here. You already submitted a new test report after your final easyconfig change, and that's enough for me. I'll force the merge :D

@casparvl casparvl added this to the release after 4.9.0 milestone Mar 25, 2024
@casparvl
Contributor

Going in, thanks @Flamefire!

@casparvl casparvl merged commit d341d89 into easybuilders:develop Mar 25, 2024
@Flamefire Flamefire deleted the 20240321123703_new_pr_PyTorch1121 branch March 26, 2024 07:53