{ai}[foss/2023a] MACE v0.3.8 w/ CUDA 12.1.1#23210

Merged
laraPPr merged 6 commits into easybuilders:develop from pavelToman:20250625140408_new_pr_MACE038
Aug 13, 2025

@pavelToman
Collaborator

@pavelToman pavelToman commented Jun 25, 2025

(created using eb --new-pr)
resolves vscentrum/vsc-software-stack#574

@pavelToman
Collaborator Author

pavelToman commented Jun 25, 2025

@boegelbot please test @ jsc-zen3-a100

EDIT: This will fail - have to ask to test on GPU

@boegelbot
Collaborator

@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23210 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23210 --ntasks=8 --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 7004

Test results coming soon (I hope)...

Details

- notification for comment with ID 3004622202 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@pavelToman
Collaborator Author

Test report by @pavelToman
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node4010.donphan.os - Linux RHEL 9.4, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 1 x NVIDIA NVIDIA A2, 570.133.20, Python 3.9.18
See https://gist.github.com/pavelToman/e08ce7c0d539fafbbc39bb28cf18c4be for a full test report.

@pavelToman
Collaborator Author

Test report by @pavelToman
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node4304.litleo.os - Linux RHEL 9.4, x86_64, AMD EPYC 9454P 48-Core Processor, 1 x NVIDIA NVIDIA H100 NVL, 570.133.20, Python 3.9.18
See https://gist.github.com/pavelToman/4765af746e908afe535d85e024e75960 for a full test report.

@github-actions

Updated software e3nn-0.4.4-foss-2023a-CUDA-12.1.1.eb

Diff against e3nn-0.3.3-foss-2023a-CUDA-12.1.1.eb

easybuild/easyconfigs/e/e3nn/e3nn-0.3.3-foss-2023a-CUDA-12.1.1.eb

diff --git a/easybuild/easyconfigs/e/e3nn/e3nn-0.3.3-foss-2023a-CUDA-12.1.1.eb b/easybuild/easyconfigs/e/e3nn/e3nn-0.4.4-foss-2023a-CUDA-12.1.1.eb
index e13cc9525e..c06f9436ea 100644
--- a/easybuild/easyconfigs/e/e3nn/e3nn-0.3.3-foss-2023a-CUDA-12.1.1.eb
+++ b/easybuild/easyconfigs/e/e3nn/e3nn-0.4.4-foss-2023a-CUDA-12.1.1.eb
@@ -1,7 +1,7 @@
 easyblock = 'PythonBundle'
 
 name = 'e3nn'
-version = '0.3.3'
+version = '0.4.4'
 versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://e3nn.org/'
@@ -29,7 +29,7 @@ exts_list = [
         'checksums': ['7eeb7f91ecb70be65e6179c106ea7f64fc1db6319e3d1289a4518b384f81e74f'],
     }),
     (name, version, {
-        'checksums': ['532b34a5644153659253c59943fe4224cd9c3c46ce8a79f1dc7c00afccb44ecb'],
+        'checksums': ['51c91a84c1fb72e7e3600000958fa8caad48f8270937090fb8d0f8bfffbb3525'],
     }),
 ]
 
Diff against e3nn-0.3.3-foss-2022a-PyTorch-1.13.1-CUDA-11.7.0.eb

easybuild/easyconfigs/e/e3nn/e3nn-0.3.3-foss-2022a-PyTorch-1.13.1-CUDA-11.7.0.eb

diff --git a/easybuild/easyconfigs/e/e3nn/e3nn-0.3.3-foss-2022a-PyTorch-1.13.1-CUDA-11.7.0.eb b/easybuild/easyconfigs/e/e3nn/e3nn-0.4.4-foss-2023a-CUDA-12.1.1.eb
index bdd008be27..c06f9436ea 100644
--- a/easybuild/easyconfigs/e/e3nn/e3nn-0.3.3-foss-2022a-PyTorch-1.13.1-CUDA-11.7.0.eb
+++ b/easybuild/easyconfigs/e/e3nn/e3nn-0.4.4-foss-2023a-CUDA-12.1.1.eb
@@ -1,9 +1,8 @@
 easyblock = 'PythonBundle'
 
 name = 'e3nn'
-version = '0.3.3'
-local_pytorch_version = '1.13.1'
-versionsuffix = '-PyTorch-' + local_pytorch_version + '-CUDA-%(cudaver)s'
+version = '0.4.4'
+versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://e3nn.org/'
 description = """
@@ -11,25 +10,26 @@ Euclidean neural networks (e3nn) is a python library based on pytorch to create
 neural networks for the group O(3).
 """
 
-toolchain = {'name': 'foss', 'version': '2022a'}
+toolchain = {'name': 'foss', 'version': '2023a'}
 
 dependencies = [
-    ('Python', '3.10.4'),
-    ('SciPy-bundle', '2022.05'),
-    ('CUDA', '11.7.0', '', SYSTEM),
-    ('PyTorch', local_pytorch_version, '-CUDA-%(cudaver)s'),
-    ('sympy', '1.10.1'),
+    ('CUDA', '12.1.1', '', SYSTEM),
+    ('Python', '3.11.3'),
+    ('SciPy-bundle', '2023.07'),
+    ('PyTorch', '2.1.2', versionsuffix),
+    ('sympy', '1.12'),
 ]
 
 exts_list = [
-    ('opt_einsum', '3.3.0', {
+    ('opt-einsum', '3.3.0', {
+        'source_tmpl': 'opt_einsum-%(version)s.tar.gz',
         'checksums': ['59f6475f77bbc37dcf7cd748519c0ec60722e91e63ca114e68821c0c54a46549'],
     }),
     ('opt_einsum_fx', '0.1.4', {
         'checksums': ['7eeb7f91ecb70be65e6179c106ea7f64fc1db6319e3d1289a4518b384f81e74f'],
     }),
     (name, version, {
-        'checksums': ['532b34a5644153659253c59943fe4224cd9c3c46ce8a79f1dc7c00afccb44ecb'],
+        'checksums': ['51c91a84c1fb72e7e3600000958fa8caad48f8270937090fb8d0f8bfffbb3525'],
     }),
 ]
 
Diff against e3nn-0.3.3-foss-2022a-CUDA-11.7.0.eb

easybuild/easyconfigs/e/e3nn/e3nn-0.3.3-foss-2022a-CUDA-11.7.0.eb

diff --git a/easybuild/easyconfigs/e/e3nn/e3nn-0.3.3-foss-2022a-CUDA-11.7.0.eb b/easybuild/easyconfigs/e/e3nn/e3nn-0.4.4-foss-2023a-CUDA-12.1.1.eb
index 5594cc7c1b..c06f9436ea 100644
--- a/easybuild/easyconfigs/e/e3nn/e3nn-0.3.3-foss-2022a-CUDA-11.7.0.eb
+++ b/easybuild/easyconfigs/e/e3nn/e3nn-0.4.4-foss-2023a-CUDA-12.1.1.eb
@@ -1,7 +1,7 @@
 easyblock = 'PythonBundle'
 
 name = 'e3nn'
-version = '0.3.3'
+version = '0.4.4'
 versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://e3nn.org/'
@@ -10,25 +10,26 @@ Euclidean neural networks (e3nn) is a python library based on pytorch to create
 neural networks for the group O(3).
 """
 
-toolchain = {'name': 'foss', 'version': '2022a'}
+toolchain = {'name': 'foss', 'version': '2023a'}
 
 dependencies = [
-    ('CUDA', '11.7.0', '', SYSTEM),
-    ('Python', '3.10.4'),
-    ('SciPy-bundle', '2022.05'),
-    ('PyTorch', '1.12.0', versionsuffix),
-    ('sympy', '1.10.1'),
+    ('CUDA', '12.1.1', '', SYSTEM),
+    ('Python', '3.11.3'),
+    ('SciPy-bundle', '2023.07'),
+    ('PyTorch', '2.1.2', versionsuffix),
+    ('sympy', '1.12'),
 ]
 
 exts_list = [
-    ('opt_einsum', '3.3.0', {
+    ('opt-einsum', '3.3.0', {
+        'source_tmpl': 'opt_einsum-%(version)s.tar.gz',
         'checksums': ['59f6475f77bbc37dcf7cd748519c0ec60722e91e63ca114e68821c0c54a46549'],
     }),
     ('opt_einsum_fx', '0.1.4', {
         'checksums': ['7eeb7f91ecb70be65e6179c106ea7f64fc1db6319e3d1289a4518b384f81e74f'],
     }),
     (name, version, {
-        'checksums': ['532b34a5644153659253c59943fe4224cd9c3c46ce8a79f1dc7c00afccb44ecb'],
+        'checksums': ['51c91a84c1fb72e7e3600000958fa8caad48f8270937090fb8d0f8bfffbb3525'],
     }),
 ]
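
For readers less familiar with the templates used in these diffs, here is a minimal sketch of how `%(cudaver)s` and `%(version)s` would expand for the new easyconfig. It uses plain Python `%`-formatting as a stand-in for EasyBuild's own template resolution, so the helper name `resolve` is hypothetical and the mechanics are an assumption; only the template strings and version numbers come from the diff.

```python
# Illustrative only: plain Python %-formatting standing in for EasyBuild's
# template resolution. The helper name `resolve` is hypothetical.

def resolve(template: str, values: dict) -> str:
    """Fill %(name)s placeholders from a dict of template values."""
    return template % values

# Template strings and versions taken from the updated easyconfig in the diff.
cuda_suffix = resolve('-CUDA-%(cudaver)s', {'cudaver': '12.1.1'})
source_file = resolve('opt_einsum-%(version)s.tar.gz', {'version': '3.3.0'})

print(cuda_suffix)   # versionsuffix, also reused for the PyTorch dependency
print(source_file)   # tarball name kept with an underscore via source_tmpl
```

The explicit `source_tmpl` appears to be needed because the extension is now named `opt-einsum` while the source tarball keeps the `opt_einsum` spelling.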
 

@pavelToman
Collaborator Author

@boegelbot please test @ jsc-zen3-a100

@boegelbot
Collaborator

@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23210 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23210 --ntasks=8 --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 7005

Test results coming soon (I hope)...

Details

- notification for comment with ID 3004813222 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 1 out of 3 (2 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.21
See https://gist.github.com/boegelbot/2810df8bfda8a9f67836025a3e4eb10c for a full test report.

@pavelToman
Collaborator Author

Test report by @pavelToman
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node4304.litleo.os - Linux RHEL 9.4, x86_64, AMD EPYC 9454P 48-Core Processor, 1 x NVIDIA NVIDIA H100 NVL, 570.133.20, Python 3.9.18
See https://gist.github.com/pavelToman/a1175a805dc120e31524d24127086118 for a full test report.

@pavelToman
Collaborator Author

pavelToman commented Jun 25, 2025

Test report by @boegelbot
FAILED
Build succeeded for 1 out of 3 (2 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.21
See https://gist.github.com/boegelbot/2810df8bfda8a9f67836025a3e4eb10c for a full test report.

The failure is in the pip check of orjson-3.9.15-GCCcore-12.3.0, which is not part of this PR:
setuptools-rust 1.11.1 requires semantic-version, which is not installed.
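
When scanning a long build log for this kind of failure, a small filter can help. The sketch below assumes the "requires ..., which is not installed." line format quoted above; the function name `missing_requirements` and the regular expression are illustrative, not part of EasyBuild or pip.

```python
import re

# Matches the "missing requirement" line format quoted in the report above.
MISSING_RE = re.compile(
    r"^(?P<pkg>\S+) (?P<ver>\S+) requires (?P<dep>[^,]+), which is not installed\.$"
)

def missing_requirements(pip_check_output: str):
    """Return (package, version, missing_dependency) tuples found in the text."""
    found = []
    for line in pip_check_output.splitlines():
        match = MISSING_RE.match(line.strip())
        if match:
            found.append((match['pkg'], match['ver'], match['dep']))
    return found

sample = "setuptools-rust 1.11.1 requires semantic-version, which is not installed."
print(missing_requirements(sample))
```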

@pavelToman
Collaborator Author

Test report by @pavelToman
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node4010.donphan.os - Linux RHEL 9.4, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 1 x NVIDIA NVIDIA A2, 570.133.20, Python 3.9.18
See https://gist.github.com/pavelToman/818e89bb392a7df5bfe3e5f50c7653d3 for a full test report.

@pavelToman
Collaborator Author

Test report by @pavelToman
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node4304.litleo.os - Linux RHEL 9.4, x86_64, AMD EPYC 9454P 48-Core Processor, 1 x NVIDIA NVIDIA H100 NVL, 570.133.20, Python 3.9.18
See https://gist.github.com/pavelToman/e267a4ce1189c9bff758a3eb83c17f23 for a full test report.

@pavelToman
Collaborator Author

Test report by @pavelToman
SUCCESS
Build succeeded for 3 out of 3 (2 easyconfigs in total)
node3305.joltik.os - Linux RHEL 9.4, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz, Python 3.9.18
See https://gist.github.com/pavelToman/be60183109922a81268576be7df6a2a8 for a full test report.

@pavelToman
Collaborator Author

@boegelbot please test @ jsc-zen3-a100

@boegelbot
Collaborator

@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23210 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23210 --ntasks=8 --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 7224

Test results coming soon (I hope)...

Details

- notification for comment with ID 3061346396 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 3 out of 3 (2 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.21
See https://gist.github.com/boegelbot/64b85f9083a99d8cb62574b5747cc874 for a full test report.

Contributor

@laraPPr laraPPr left a comment

lgtm

@laraPPr laraPPr added this to the 5.x milestone Aug 13, 2025
@laraPPr
Contributor

laraPPr commented Aug 13, 2025

Going in, thanks @pavelToman!

@laraPPr laraPPr merged commit 3b79f22 into easybuilders:develop Aug 13, 2025
8 checks passed
@laraPPr laraPPr removed this from the 5.x milestone Aug 13, 2025
@laraPPr laraPPr added this to the next release (5.1.2) milestone Aug 13, 2025
@smoors
Contributor

smoors commented Aug 13, 2025

this broke the test suite, i think you should add a -ASE-3.24.0 versionsuffix. not sure how it slipped through, though.

FAIL: test_dep_versions_per_toolchain (test.easyconfigs.easyconfigs.EasyConfigTest)
Check whether there's only one dependency version per toolchain actively used.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/test/easyconfigs/easyconfigs.py", line 910, in test_dep_versions_per_toolchain
    self.fail("Should not have multi-variant dependencies in easyconfigs:\n%s" % multi_dep_vars_msg)
AssertionError: Should not have multi-variant dependencies in easyconfigs:
Found 2 variants of 'ASE' dependency in easyconfigs using '2023a' toolchain generation
* version: 3.22.1; versionsuffix:  as dep for {'PyTorch-Geometric-2.5.0-foss-2023a-PyTorch-2.1.2-CUDA-12.1.1.eb', 'GPAW-24.1.0-intel-2023a.eb', 'nglview-3.1.2-foss-2023a.eb', 'ASAP3-3.13.3-foss-2023a.eb', 'OpenForceField-Toolkit-0.16.0-foss-2023a.eb', 'ASAP3-3.13.2-foss-2023a.eb', 'pyiron-0.5.1-foss-2023a.eb', 'ASAP3-3.13.3-intel-2023a.eb', 'GPAW-23.9.1-foss-2023a.eb', 'PyTorch-Geometric-2.5.0-foss-2023a-PyTorch-2.1.2.eb', 'GPAW-24.1.0-foss-2023a.eb'}
  * version: 3.24.0; versionsuffix:  as dep for {'MACE-0.3.8-foss-2023a-CUDA-12.1.1.eb', 'ASAP3-3.13.7-foss-2023a-ASE-3.24.0.eb', 'GPAW-25.1.0-intel-2023a-ASE-3.24.0.eb', 'GPAW-25.1.0-foss-2023a-ASE-3.24.0.eb', 'ASAP3-3.13.7-intel-2023a-ASE-3.24.0.eb'}
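
To make the failure easier to read, here is a rough sketch of the kind of grouping the message reflects: collect which easyconfigs use which (version, versionsuffix) variant of a dependency, and complain when more than one variant exists. This is not the code from test/easyconfigs/easyconfigs.py; the function name `group_dep_variants` and the data layout are assumptions, and only a subset of the easyconfigs from the message is used.

```python
from collections import defaultdict

def group_dep_variants(dep_usage):
    """Map each (version, versionsuffix) variant to the easyconfigs using it."""
    variants = defaultdict(set)
    for easyconfig, variant in dep_usage.items():
        variants[variant].add(easyconfig)
    return dict(variants)

# Subset of the ASE usage listed in the failure message above.
ase_usage = {
    'GPAW-24.1.0-foss-2023a.eb': ('3.22.1', ''),
    'nglview-3.1.2-foss-2023a.eb': ('3.22.1', ''),
    'MACE-0.3.8-foss-2023a-CUDA-12.1.1.eb': ('3.24.0', ''),
    'GPAW-25.1.0-foss-2023a-ASE-3.24.0.eb': ('3.24.0', ''),
}

variants = group_dep_variants(ase_usage)
if len(variants) > 1:
    print("Found %d variants of 'ASE' dependency" % len(variants))
    for (version, suffix), users in sorted(variants.items()):
        print("* version: %s; versionsuffix: %s as dep for %s"
              % (version, suffix, sorted(users)))
```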

@laraPPr
Contributor

laraPPr commented Aug 13, 2025

Was this not supposed to be resolved by the changes that were made to test/easyconfigs/easyconfigs.py in this pr? This is why it passed the test-suite in this pr I think.

@smoors
Contributor

smoors commented Aug 13, 2025

not sure, i've updated the versionsuffix here, let's see what it gives:

@smoors
Contributor

smoors commented Aug 13, 2025

> Was this not supposed to be resolved by the changes that were made to test/easyconfigs/easyconfigs.py in this pr? This is why it passed the test-suite in this pr I think.

we now have ASE as an exception both in versionsuffix_deps (for all easyconfigs) and in alt_dep_versions (for this specific MACE easyconfig only); the second entry is redundant given the first. i still don't know why the tests are suddenly failing, but in any case we should probably remove it from alt_dep_versions and use a versionsuffix.

edit: done in #23660
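
As a quick way to see which easyconfig stands out, the sketch below checks which of the ASE 3.24.0 users from the failure message lack an `-ASE-3.24.0` tag in their filename. It assumes that the versionsuffix exception is decided by looking for that tag in the easyconfig name, which is a guess about the test's behaviour rather than something taken from test/easyconfigs/easyconfigs.py; the function name `lacks_dep_suffix` is hypothetical.

```python
def lacks_dep_suffix(easyconfigs, dep_name, dep_version):
    """Return easyconfig filenames that do not carry -<dep_name>-<dep_version>."""
    tag = '-%s-%s' % (dep_name, dep_version)
    return sorted(ec for ec in easyconfigs if tag not in ec)

# Easyconfigs reported as depending on ASE 3.24.0 in the failure message.
ase_3240_users = [
    'MACE-0.3.8-foss-2023a-CUDA-12.1.1.eb',
    'ASAP3-3.13.7-foss-2023a-ASE-3.24.0.eb',
    'GPAW-25.1.0-intel-2023a-ASE-3.24.0.eb',
    'GPAW-25.1.0-foss-2023a-ASE-3.24.0.eb',
    'ASAP3-3.13.7-intel-2023a-ASE-3.24.0.eb',
]

print(lacks_dep_suffix(ase_3240_users, 'ASE', '3.24.0'))
```

Under that assumption, the MACE easyconfig is the only one without the tag, which matches the suggestion to give it an -ASE-3.24.0 versionsuffix.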

@orbsmiv orbsmiv mentioned this pull request Sep 5, 2025
2 tasks