
{numlib}[foss/2023b] Ginkgo v1.9.0 #22719

Merged
ocaisa merged 12 commits into easybuilders:develop from
pratikvn:20250407102344_new_pr_ginkgo190
May 27, 2025

Conversation

@pratikvn
Contributor

@pratikvn pratikvn commented Apr 7, 2025

(created using eb --new-pr)

@github-actions github-actions Bot added the new label Apr 7, 2025
Comment thread easybuild/easyconfigs/g/ginkgo/ginkgo-1.9.0-foss-2023b.eb Outdated
@pratikvn
Contributor Author

pratikvn commented Apr 7, 2025

@boegel, sorry: the name change also created a new file, so the earlier review comments are no longer visible, but I think I have addressed all of them.

@pratikvn pratikvn requested a review from boegel April 8, 2025 07:41
@boegel boegel changed the title {numlib}[foss/2023b] ginkgo v1.9.0 {numlib}[foss/2023b] Ginkgo v1.9.0 Apr 8, 2025
@boegel
Member

boegel commented Apr 8, 2025

@boegelbot please test @ jsc-zen3

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=22719 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_22719 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 6131

Test results coming soon (I hope)...

Details

- notification for comment with ID 2785825266 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/3828d481eec5dc0f6fef56e39a5eafbd for a full test report.

boegel
boegel previously requested changes Apr 8, 2025
Comment thread easybuild/easyconfigs/g/Ginkgo/Ginkgo-1.9.0-foss-2023b.eb Outdated
@pratikvn pratikvn requested a review from boegel April 9, 2025 06:38
@github-actions github-actions Bot added update and removed new labels Apr 9, 2025
@github-actions

github-actions Bot commented Apr 9, 2025

Updated software Ginkgo-1.9.0-gompi-2023b-CUDA-12.8.0.eb

Diff against Ginkgo-1.9.0-gompi-2023b.eb

easybuild/easyconfigs/g/Ginkgo/Ginkgo-1.9.0-gompi-2023b.eb

diff --git a/easybuild/easyconfigs/g/Ginkgo/Ginkgo-1.9.0-gompi-2023b.eb b/easybuild/easyconfigs/g/Ginkgo/Ginkgo-1.9.0-gompi-2023b-CUDA-12.8.0.eb
index 59ab916573..0c84a2c558 100644
--- a/easybuild/easyconfigs/g/Ginkgo/Ginkgo-1.9.0-gompi-2023b.eb
+++ b/easybuild/easyconfigs/g/Ginkgo/Ginkgo-1.9.0-gompi-2023b-CUDA-12.8.0.eb
@@ -2,6 +2,7 @@ easyblock = 'CMakeMake'
 
 name = 'Ginkgo'
 version = '1.9.0'
+versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://github.com/ginkgo-project/ginkgo'
 description = """Ginkgo is a high-performance numerical linear algebra library with
@@ -24,12 +25,15 @@ builddependencies = [
     ('CMake', '3.27.6')
 ]
 
-configopts = '-DGINKGO_BUILD_MPI=ON -DGINKGO_BUILD_OMP=ON -DGINKGO_FAST_TESTS=ON'
+dependencies = [('CUDA', '12.8.0', '', SYSTEM)]
+
+configopts = '-DGINKGO_BUILD_MPI=ON -DGINKGO_BUILD_CUDA=ON -DGINKGO_BUILD_OMP=ON -DGINKGO_FAST_TESTS=ON'
 
 build_shared_libs = True
 
 pretestopts = "export OMP_NUM_THREADS=1 && "
 runtest = True
+testopts = '-E "test/mpi/solver/solver_cuda"'
 
 sanity_check_paths = {
     'files': ['lib/libginkgo.%s' % SHLIB_EXT, 'lib/libginkgo_device.%s' % SHLIB_EXT],

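The only naming difference between the two easyconfigs is the `versionsuffix` line in the diff above. A minimal sketch of how `'-CUDA-%(cudaver)s'` turns into the second file name, assuming EasyBuild resolves `%(cudaver)s` to the version of the CUDA dependency (emulated here with plain Python `%`-formatting; `expand_template` and `easyconfig_filename` are hypothetical helpers, not EasyBuild API):

```python
# Hypothetical helpers emulating EasyBuild's template expansion and the
# <name>-<version>-<toolchain>-<toolchain version><versionsuffix>.eb naming.

def expand_template(template: str, values: dict) -> str:
    """Fill %(name)s placeholders from a dict, like Python's % operator."""
    return template % values


def easyconfig_filename(name: str, version: str, toolchain: str,
                        toolchain_version: str, versionsuffix: str = '') -> str:
    """Build the conventional easyconfig file name."""
    return '%s-%s-%s-%s%s.eb' % (name, version, toolchain, toolchain_version, versionsuffix)


if __name__ == '__main__':
    # Assumption: %(cudaver)s resolves to the CUDA dependency version 12.8.0
    suffix = expand_template('-CUDA-%(cudaver)s', {'cudaver': '12.8.0'})
    print(suffix)  # -CUDA-12.8.0
    print(easyconfig_filename('Ginkgo', '1.9.0', 'gompi', '2023b', suffix))
    # Ginkgo-1.9.0-gompi-2023b-CUDA-12.8.0.eb
    print(easyconfig_filename('Ginkgo', '1.9.0', 'gompi', '2023b'))
    # Ginkgo-1.9.0-gompi-2023b.eb
```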
Updated software Ginkgo-1.9.0-gompi-2023b.eb

Diff against Ginkgo-1.9.0-gompi-2023b-CUDA-12.8.0.eb

easybuild/easyconfigs/g/Ginkgo/Ginkgo-1.9.0-gompi-2023b-CUDA-12.8.0.eb

diff --git a/easybuild/easyconfigs/g/Ginkgo/Ginkgo-1.9.0-gompi-2023b-CUDA-12.8.0.eb b/easybuild/easyconfigs/g/Ginkgo/Ginkgo-1.9.0-gompi-2023b.eb
index 0c84a2c558..59ab916573 100644
--- a/easybuild/easyconfigs/g/Ginkgo/Ginkgo-1.9.0-gompi-2023b-CUDA-12.8.0.eb
+++ b/easybuild/easyconfigs/g/Ginkgo/Ginkgo-1.9.0-gompi-2023b.eb
@@ -2,7 +2,6 @@ easyblock = 'CMakeMake'
 
 name = 'Ginkgo'
 version = '1.9.0'
-versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://github.com/ginkgo-project/ginkgo'
 description = """Ginkgo is a high-performance numerical linear algebra library with
@@ -25,15 +24,12 @@ builddependencies = [
     ('CMake', '3.27.6')
 ]
 
-dependencies = [('CUDA', '12.8.0', '', SYSTEM)]
-
-configopts = '-DGINKGO_BUILD_MPI=ON -DGINKGO_BUILD_CUDA=ON -DGINKGO_BUILD_OMP=ON -DGINKGO_FAST_TESTS=ON'
+configopts = '-DGINKGO_BUILD_MPI=ON -DGINKGO_BUILD_OMP=ON -DGINKGO_FAST_TESTS=ON'
 
 build_shared_libs = True
 
 pretestopts = "export OMP_NUM_THREADS=1 && "
 runtest = True
-testopts = '-E "test/mpi/solver/solver_cuda"'
 
 sanity_check_paths = {
     'files': ['lib/libginkgo.%s' % SHLIB_EXT, 'lib/libginkgo_device.%s' % SHLIB_EXT],

@boegel
Member

boegel commented Apr 10, 2025

@boegelbot please test @ jsc-zen3-a100

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=22719 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_22719 --ntasks=8 --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 6156

Test results coming soon (I hope)...

Details

- notification for comment with ID 2793185627 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 0 out of 2 (2 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.21
See https://gist.github.com/boegelbot/901738bd001b625ab4320a2e80ad9468 for a full test report.

@boegel
Member

boegel commented Apr 10, 2025

@pratikvn Does this ring a bell? This was in a test environment with 8 cores available:

The following tests FAILED:
	357 - test/mpi/solver/solver_omp (Timeout)
	358 - test/mpi/solver/solver_cuda (Timeout)
289/375 Test #289: test/mpi/solver/solver_omp ....................................***Timeout 1500.15 sec
Running without CTest ctest_resource configuration
Rank 0: Rank 1: Rank 2: OmpExecutor (8 threads)
OmpExecutor (8 threads)
OmpExecutor (8 threads)
[==========] Running 48 tests from 12 test suites.
[----------] Global test environment set-up.
[----------] 4 tests from Solver/Cg, where TypeParam = Cg
[ RUN      ] Solver/Cg.ApplyIsEquivalentToRef
[       OK ] Solver/Cg.ApplyIsEquivalentToRef (203404 ms)
[ RUN      ] Solver/Cg.AdvancedApplyIsEquivalentToRef
[       OK ] Solver/Cg.AdvancedApplyIsEquivalentToRef (215733 ms)
[ RUN      ] Solver/Cg.MixedApplyIsEquivalentToRef
[       OK ] Solver/Cg.MixedApplyIsEquivalentToRef (208652 ms)
[ RUN      ] Solver/Cg.MixedAdvancedApplyIsEquivalentToRef
[       OK ] Solver/Cg.MixedAdvancedApplyIsEquivalentToRef (212706 ms)
[----------] 4 tests from Solver/Cg (840519 ms total)

[----------] 4 tests from Solver/CgWithMg, where TypeParam = CgWithMg
[ RUN      ] Solver/CgWithMg.ApplyIsEquivalentToRef
        Start 358: test/mpi/solver/solver_cuda
358/495 Test #358: test/mpi/solver/solver_cuda ...................................***Timeout 1500.12 sec
Running without CTest ctest_resource configuration
Rank 0: CudaExecutor on device 0 (NVIDIA A100 80GB PCIe) with host ReferenceExecutor
Rank 1: CudaExecutor on device 0 (NVIDIA A100 80GB PCIe) with host ReferenceExecutor
Rank 2: CudaExecutor on device 0 (NVIDIA A100 80GB PCIe) with host ReferenceExecutor
[==========] Running 48 tests from 12 test suites.
[----------] Global test environment set-up.
[----------] 4 tests from Solver/Cg, where TypeParam = Cg
[ RUN      ] Solver/Cg.ApplyIsEquivalentToRef
[       OK ] Solver/Cg.ApplyIsEquivalentToRef (3444 ms)
[ RUN      ] Solver/Cg.AdvancedApplyIsEquivalentToRef
[       OK ] Solver/Cg.AdvancedApplyIsEquivalentToRef (3516 ms)
[ RUN      ] Solver/Cg.MixedApplyIsEquivalentToRef
[       OK ] Solver/Cg.MixedApplyIsEquivalentToRef (3352 ms)
[ RUN      ] Solver/Cg.MixedAdvancedApplyIsEquivalentToRef
[       OK ] Solver/Cg.MixedAdvancedApplyIsEquivalentToRef (3337 ms)
[----------] 4 tests from Solver/Cg (13652 ms total)

[----------] 4 tests from Solver/CgWithMg, where TypeParam = CgWithMg
[ RUN      ] Solver/CgWithMg.ApplyIsEquivalentToRef
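The subtotals in this log can be turned into a rough projection. The sketch below assumes (the log does not say this) that each of the 48 tests takes about as long as the average of the four Solver/Cg tests that completed; `projected_suite_ms` is a hypothetical helper used only for this arithmetic.

```python
# Back-of-the-envelope projection from the CTest log excerpt above.
# Assumption: all 48 tests take roughly the Solver/Cg average.

TIMEOUT_MS = 1500 * 1000   # CTest timeout seen in the log (~1500 s)
TOTAL_TESTS = 48           # "Running 48 tests from 12 test suites"


def projected_suite_ms(cg_total_ms: int, cg_tests: int = 4,
                       total_tests: int = TOTAL_TESTS) -> float:
    """Extrapolate the full-suite runtime from the Solver/Cg subtotal."""
    return cg_total_ms / cg_tests * total_tests


omp_ms = projected_suite_ms(840519)   # solver_omp: 4 Cg tests took 840519 ms
cuda_ms = projected_suite_ms(13652)   # solver_cuda: 4 Cg tests took 13652 ms

print(f"solver_omp : ~{omp_ms / 1000:.0f} s projected vs {TIMEOUT_MS / 1000:.0f} s timeout")
print(f"solver_cuda: ~{cuda_ms / 1000:.0f} s projected vs {TIMEOUT_MS / 1000:.0f} s timeout")
```

Under that assumption the OpenMP run would need roughly 10086 s, far beyond the timeout even without a hang, whereas the CUDA run should fit in about 164 s. So the CUDA timeout more likely comes from a stall in a later suite (the log stops in Solver/CgWithMg) than from uniformly slow tests.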

@pratikvn
Contributor Author

pratikvn commented Apr 11, 2025

@boegel, I think this is an issue with MPI oversubscribing the cores, combined with OpenMP using OMP_NUM_THREADS=8 in each rank. Setting OMP_NUM_THREADS=1 might resolve the issue, since MPI+OpenMP runs with threads can suffer from oversubscription and/or thread-to-core mapping problems.
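A quick check of the oversubscription argument, assuming the 3 MPI ranks shown in the log (Rank 0 to Rank 2), each with an OmpExecutor, on the 8-core test environment mentioned above. `oversubscription` is a hypothetical helper, and the ratio ignores any threads MPI itself may spawn.

```python
# Ratio of runnable compute threads to cores; > 1.0 means oversubscribed.
# Assumptions: 3 ranks (from the log), 8 cores (from the test environment).

def oversubscription(ranks: int, threads_per_rank: int, cores: int) -> float:
    """Threads requested by all ranks divided by available cores."""
    return ranks * threads_per_rank / cores


print(oversubscription(ranks=3, threads_per_rank=8, cores=8))  # 3.0: 24 threads on 8 cores
print(oversubscription(ranks=3, threads_per_rank=1, cores=8))  # 0.375: fits on the node
```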

@pratikvn
Contributor Author

pratikvn commented Apr 11, 2025

Unfortunately, the same oversubscription issue also causes long runtimes for the CUDA tests, since multiple ranks use the same GPU. Others have observed this oversubscription issue as well, and it seems to be a particular problem with OpenMPI.

So I propose that we skip this particular test by running ctest -E "test/mpi/solver/solver_cuda", which runs all tests except the one that stalls due to oversubscription. I think it would also be better to set the number of OpenMP threads to 1 for the CUDA tests.
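`ctest -E` takes a regular expression and skips every test whose name matches it. The sketch below emulates that selection on the two test names from the failure log, assuming Python's `re.search` behaves like CMake's regex for this simple pattern; `selected` is a hypothetical helper.

```python
import re

# Emulates which tests `ctest -E <regex>` would still run.
# Assumption: Python's re.search matches like CMake's regex here.
EXCLUDE = r"test/mpi/solver/solver_cuda"

# Test names taken from the failure log above.
tests = ["test/mpi/solver/solver_omp", "test/mpi/solver/solver_cuda"]


def selected(names, exclude_regex):
    """Return the test names that are not excluded by the regex."""
    pattern = re.compile(exclude_regex)
    return [n for n in names if not pattern.search(n)]


print(selected(tests, EXCLUDE))  # ['test/mpi/solver/solver_omp']
```

With this pattern test/mpi/solver/solver_omp still runs, so that test has to stay within the timeout on its own, which is where OMP_NUM_THREADS=1 comes in. A broader pattern such as test/mpi/solver/ would skip both.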

@pratikvn
Copy link
Copy Markdown
Contributor Author

pratikvn commented May 9, 2025

@boegel, do you have any suggestions or recommendations on how to resolve these test issues?

@boegel
Member

boegel commented May 27, 2025

@pratikvn Sorry for the radio silence on this one, I lost track of it.

To control the environment for the test step, you can use pretestopts in the easyconfig file, for example:

pretestopts = "export OMP_NUM_THREADS=1 && "

To pass options to the make test command being run, you can use testopts:

testopts = "EXAMPLE=example"

That would result in running make test EXAMPLE=example rather than just make test.

If there's a better way to run the tests than make test, we can specify test_cmd (and drop runtest):

test_cmd = "ctest ..."

I'm not sure whether the latter makes sense, since https://github.com/ginkgo-project/ginkgo/blob/develop/TESTING.md seems to suggest that running ctest also builds Ginkgo, which we don't want (it has already been built by the time we reach the test step).
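To make the three settings concrete, here is a simplified model of how they end up in one shell command, following the description above. This is an illustration only, not EasyBuild's actual easyblock code; `compose_test_command` is hypothetical, and whether `pretestopts` is also prepended to a custom `test_cmd` is an assumption.

```python
# Simplified, hypothetical model of how pretestopts, the test command,
# the make target and testopts are joined into the test-step command.

def compose_test_command(pretestopts: str = '', test_cmd: str = 'make',
                         target: str = 'test', testopts: str = '') -> str:
    """Join the non-empty pieces with single spaces."""
    parts = (pretestopts, test_cmd, target, testopts)
    return ' '.join(p.strip() for p in parts if p and p.strip())


# make test EXAMPLE=example, with the OpenMP thread count pinned
print(compose_test_command(pretestopts='export OMP_NUM_THREADS=1 && ',
                           testopts='EXAMPLE=example'))
# export OMP_NUM_THREADS=1 && make test EXAMPLE=example

# a custom test command replacing make test (assumed to keep pretestopts)
print(compose_test_command(pretestopts='export OMP_NUM_THREADS=1 && ',
                           test_cmd='ctest', target='',
                           testopts='-E "test/mpi/solver/solver_cuda"'))
# export OMP_NUM_THREADS=1 && ctest -E "test/mpi/solver/solver_cuda"
```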

Comment thread easybuild/easyconfigs/g/Ginkgo/Ginkgo-1.9.0-gompi-2023b-CUDA-12.8.0.eb Outdated
@pratikvn
Contributor Author

@boegel and @ocaisa, thank you for your feedback. I think test_cmd is the better option, and I believe it will not build Ginkgo again. Setting OMP_NUM_THREADS=1 will probably slow some tests down, but hopefully not significantly.

Please let me know in case there are new issues when the tests have finished running.
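Putting the suggestions from this thread together, the test-step settings could look roughly like the easyconfig fragment below. This is a hypothetical sketch, not necessarily what was merged; runtest is dropped because test_cmd replaces make test, and the -E filter only has an effect in the CUDA variant, where a solver_cuda test exists.

```python
# Hypothetical test-step settings for the Ginkgo easyconfigs (easyconfigs
# are Python syntax); not necessarily identical to the merged files.

# Run every test with a single OpenMP thread per MPI rank to avoid
# oversubscribing the cores.
pretestopts = "export OMP_NUM_THREADS=1 && "

# Run CTest directly and skip the MPI solver test that stalls on a shared GPU.
test_cmd = 'ctest -E "test/mpi/solver/solver_cuda"'
```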

Comment thread easybuild/easyconfigs/g/Ginkgo/Ginkgo-1.9.0-gompi-2023b-CUDA-12.8.0.eb Outdated
Comment thread easybuild/easyconfigs/g/Ginkgo/Ginkgo-1.9.0-gompi-2023b.eb Outdated
Member

@ocaisa ocaisa left a comment

--from-pr is not working due to a malformed github_account:

https://github.com/['ginkgo-project']/ginkgo/archive/v1.9.0.tar.gz
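The bracketed account name is what Python string formatting produces when a list is substituted where a string is expected. A minimal reproduction, assuming github_account had been set to a list; `URL_TEMPLATE` is a hypothetical stand-in for the source URL template, not the actual easyconfig constant.

```python
# Reproduces the malformed URL above. Assumption: github_account was a list,
# so %s inserted the list's repr instead of the account name.
URL_TEMPLATE = 'https://github.com/%(github_account)s/ginkgo/archive/v%(version)s.tar.gz'

bad = URL_TEMPLATE % {'github_account': ['ginkgo-project'], 'version': '1.9.0'}
good = URL_TEMPLATE % {'github_account': 'ginkgo-project', 'version': '1.9.0'}

print(bad)   # https://github.com/['ginkgo-project']/ginkgo/archive/v1.9.0.tar.gz
print(good)  # https://github.com/ginkgo-project/ginkgo/archive/v1.9.0.tar.gz
```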

Comment thread easybuild/easyconfigs/g/Ginkgo/Ginkgo-1.9.0-gompi-2023b-CUDA-12.8.0.eb Outdated
Comment thread easybuild/easyconfigs/g/Ginkgo/Ginkgo-1.9.0-gompi-2023b.eb Outdated
@ocaisa
Member

ocaisa commented May 27, 2025

@boegelbot please test @ jsc-zen3-a100

@boegelbot
Collaborator

@ocaisa: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=22719 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_22719 --ntasks=8 --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 6492

Test results coming soon (I hope)...

Details

- notification for comment with ID 2912261109 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.21
See https://gist.github.com/boegelbot/c863dbd3eaaab6827566f38e16cadfc2 for a full test report.

Member

@ocaisa ocaisa left a comment

LGTM, thanks @pratikvn

@ocaisa ocaisa dismissed boegel’s stale review May 27, 2025 13:36

Review comments addressed

@ocaisa ocaisa merged commit a4fbfab into easybuilders:develop May 27, 2025
8 checks passed
@boegel boegel added this to the next release (5.1.1?) milestone Jun 16, 2025

4 participants