fix PAPI test step hanging on some systems #19372

Merged
jfgrimm merged 2 commits into easybuilders:develop from Flamefire:20231207092912_new_pr_PAPI543 on Dec 7, 2023
@Flamefire
Contributor

@Flamefire Flamefire commented Dec 7, 2023

(created using eb --new-pr)

On some systems, the fulltest hangs, leaving a defunct make (zombie process) and a T+ state on various tests whose names contain "attach". This was observed especially on AMD EPYC CPUs.
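To check for the symptom described above, one option is to filter `ps -o pid,stat,args` output for state codes starting with `Z` (zombie) or `T` (stopped); a trailing `+` marks a process in the foreground process group. The following is only a minimal sketch: the helper name `find_stuck`, the sample PIDs, and the test binary name are hypothetical and not taken from an actual hanging build.

```python
def find_stuck(ps_output):
    """Return (pid, stat, command) for zombie (Z) or stopped (T) processes.

    Expects text in the layout printed by `ps -o pid,stat,args`:
    one header line, then one process per line.
    """
    stuck = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)  # pid, stat, rest of the command line
        if len(parts) < 3:
            continue
        pid, stat, command = parts
        if stat[0] in ("Z", "T"):
            stuck.append((int(pid), stat, command))
    return stuck


# Hypothetical sample output; PIDs and command names are made up for illustration.
sample = """\
  PID STAT COMMAND
 1001 Ss   bash
 1002 Z    [make] <defunct>
 1003 T+   ./attach_example
 1004 R+   ps -o pid,stat,args
"""

for pid, stat, command in find_stuck(sample):
    print(pid, stat, command)
```

On a real system the function could be fed the captured output of `ps -o pid,stat,args` while the test step appears to hang.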

Seemingly, parallel = 1 was used at some point to avoid this (without success).

This changes the test target to "test" instead of "fulltest", which seems to work. It also makes the ECs consistent by removing the now unnecessary "parallel=1" and by adding a configopt that was used in only one of the 6.0.0.1 ECs and was seemingly forgotten in the other.

If wanted, we can also add this configopt to the 6.0.0 ECs, but none of them had it, so I just made the two 6.0.0.1 ones consistent.
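As a rough illustration of the change described in this comment, an affected easyconfig might end up looking like the fragment below. This is a sketch and not the merged file: the `ConfigureMake` easyblock is an assumption, and the configopt mentioned above is not named in this conversation, so it is left out rather than guessed.

```python
# Hypothetical PAPI easyconfig fragment after this change (illustrative only).
easyblock = 'ConfigureMake'  # assumption: generic configure/make easyblock

name = 'PAPI'
version = '6.0.0.1'

# 'parallel = 1' is no longer set: it was apparently meant to avoid the hang
# but did not help.

# The configopt added for consistency between the two 6.0.0.1 ECs is not named
# in the PR description, so no 'configopts' line is shown here.

# Use the "test" make target instead of "fulltest", which hangs on some
# systems (especially AMD EPYC) in tests with "attach" in their name.
runtest = 'test'
```

With a `ConfigureMake`-style easyblock, the `runtest` string is used as the make target for the test step, so this runs `make test` rather than `make fulltest`.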

@jfgrimm
Member

jfgrimm commented Dec 7, 2023

Might be worth adding a comment to the ECs mentioning why we're not running fulltest

@Flamefire
Contributor Author

Makes sense I guess. Added

@jfgrimm
Member

jfgrimm commented Dec 7, 2023

@boegelbot please test @ generoso
CORE_CNT=16

@jfgrimm jfgrimm added the bug fix label Dec 7, 2023
@jfgrimm jfgrimm added this to the next release (4.9.0?) milestone Dec 7, 2023
@boegelbot
Collaborator

@jfgrimm: Request for testing this PR well received on login1

PR test command 'EB_PR=19372 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19372 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12338

Test results coming soon (I hope)...

Details

- notification for comment with ID 1845115202 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@jfgrimm
Member

jfgrimm commented Dec 7, 2023

Test report by @jfgrimm
SUCCESS
Build succeeded for 8 out of 8 (8 easyconfigs in total)
node020.viking2.yor.alces.network - Linux Rocky Linux 8.8, x86_64, AMD EPYC 7643 48-Core Processor, Python 3.6.8
See https://gist.github.com/jfgrimm/b758ec62992af68637185d673157e6ad for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 8 out of 8 (8 easyconfigs in total)
cnx1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/3965795429719b8a3a47d7b5dbc8f86e for a full test report.

@jfgrimm
Member

jfgrimm commented Dec 7, 2023

@boegelbot please test @ jsc-zen2
CORE_CNT=16

Member

@jfgrimm jfgrimm left a comment

lgtm
as usual, thanks for taking a closer look and raising a PR :)

@boegelbot
Collaborator

@jfgrimm: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=19372 EB_ARGS= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --mem-per-cpu=4000M --job-name test_PR_19372 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3850

Test results coming soon (I hope)...

Details

- notification for comment with ID 1845153014 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 8 out of 8 (8 easyconfigs in total)
jsczen2c1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/boegelbot/aead5b2060a7d46ad7ca271a45e41dce for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 8 out of 8 (8 easyconfigs in total)
i8015 - Linux Rocky Linux 8.7, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 545.23.08, Python 3.6.8
See https://gist.github.com/Flamefire/ad2e2a88250f149dff289405f24aaefd for a full test report.

@jfgrimm
Member

jfgrimm commented Dec 7, 2023

Going in, thanks @Flamefire!

@jfgrimm jfgrimm merged commit a16f454 into easybuilders:develop Dec 7, 2023
@Flamefire Flamefire deleted the 20231207092912_new_pr_PAPI543 branch December 7, 2023 15:20
@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 8 out of 8 (8 easyconfigs in total)
n1432 - Linux RHEL 8.7 (Ootpa), x86_64, Intel(R) Xeon(R) Platinum 8470 (icelake), Python 3.8.13
See https://gist.github.com/Flamefire/acc4bc511a0db303b8599ceafe95a814 for a full test report.
