
fix failing RPATH sanity check for recent dorado easyconfigs using foss/2023a toolchain#21673

Merged
Micket merged 1 commit into easybuilders:develop from boegel:20241016104618_new_pr_dorado061
Oct 21, 2024

Conversation

@boegel
Member

@boegel boegel commented Oct 16, 2024

(created using eb --new-pr)
requires fix for patchelf to avoid "ELF load command address/offset not properly aligned" (cfr. NixOS/patchelf#492):

This fix is important in preparation for the EasyBuild 5.0 release, where RPATH is enabled by default. Since it's a bug fix that is also relevant for EasyBuild 4.x, I've deliberately targeted the develop branch here.

@boegel boegel added bug fix EasyBuild-5.0 EasyBuild 5.0 labels Oct 16, 2024
@boegel boegel added this to the release after 4.9.4 milestone Oct 16, 2024
@boegel
Member Author

boegel commented Oct 16, 2024

@boegelbot please test @ jsc-zen3

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21673 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21673 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5083

Test results coming soon (I hope)...

Details

- notification for comment with ID 2416148631 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel boegel changed the title fix failing RPATH sanity check for recent dorado easyconfigs fix failing RPATH sanity check for recent dorado easyconfigs using foss/2023a toolchain Oct 16, 2024
@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/435ae6ba6cbb5a7698d3da7f2b100e92 for a full test report.

@boegel
Member Author

boegel commented Oct 16, 2024

@boegelbot please test @ jsc-zen3
EB_ARGS="--rpath --installpath /tmp/$USER/pr21673-rpath"

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21673 EB_ARGS="--rpath --installpath /tmp/$USER/pr21673-rpath" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21673 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5085

Test results coming soon (I hope)...

Details

- notification for comment with ID 2416215161 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 0 out of 3 (3 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/d9d1be9a44f0812fe95331bd559710b8 for a full test report.

@boegel
Member Author

boegel commented Oct 16, 2024

The RPATH sanity check is now failing on other libraries that get copied into the dorado installation, because those dependencies were not built with RPATH linking enabled. So installing dorado in a "mixed" software stack, where some installations were done without RPATH and dorado with RPATH, won't work:

No '(RPATH)' found in 'readelf -d' output for /tmp/boegelbot/pr21673-rpath/software/dorado/0.6.1-foss-2023a-CUDA-12.1.1/lib64/libsz.so.2.0.1
...
No '(RPATH)' found in 'readelf -d' output for /tmp/boegelbot/pr21673-rpath/software/dorado/0.6.1-foss-2023a-CUDA-12.1.1/lib64/libaec.so.0.0.12
...
No '(RPATH)' found in 'readelf -d' output for /tmp/boegelbot/pr21673-rpath/software/dorado/0.6.1-foss-2023a-CUDA-12.1.1/lib64/libzstd.so.1.5.5
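
To make the failure above easier to reproduce by hand, here is a minimal sketch of the kind of check involved: it scans `readelf -d` output for `(RPATH)` and `(RUNPATH)` entries. This is not EasyBuild's actual implementation; the function names `parse_dynamic_paths`, `has_rpath` and `check_library` are hypothetical, and the sketch assumes the usual GNU `readelf` line layout (`(RPATH)  Library rpath: [...]`).

```python
import re
import subprocess

# Matches dynamic-section lines as printed by GNU readelf, for example:
#  0x000000000000000f (RPATH)              Library rpath: [/some/dir]
#  0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN]
_TAG_RE = re.compile(r"\((RPATH|RUNPATH)\)\s+Library \w+: \[(.*)\]")


def parse_dynamic_paths(readelf_output):
    """Collect RPATH and RUNPATH directories from 'readelf -d' text (hypothetical helper)."""
    found = {"RPATH": [], "RUNPATH": []}
    for line in readelf_output.splitlines():
        match = _TAG_RE.search(line)
        if match:
            tag, paths = match.groups()
            # entries are colon-separated; skip empty components
            found[tag].extend(p for p in paths.split(":") if p)
    return found


def has_rpath(readelf_output):
    """True if at least one non-empty RPATH directory is present."""
    return bool(parse_dynamic_paths(readelf_output)["RPATH"])


def check_library(path):
    """Run 'readelf -d' on a file and report whether it carries an RPATH entry."""
    result = subprocess.run(
        ["readelf", "-d", path], capture_output=True, text=True, check=True
    )
    return has_rpath(result.stdout)
```

A library that only carries a `RUNPATH` entry (or no entry at all) is reported as lacking RPATH, which matches the messages shown for `libsz`, `libaec` and `libzstd` above.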

@boegel
Member Author

boegel commented Oct 16, 2024

@boegelbot please test @ jsc-zen3
EB_ARGS="--rpath --installpath /tmp/$USER/pr21673-rpath libaec-1.0.6-GCCcore-12.3.0.eb zstd-1.5.5-GCCcore-12.3.0.eb dorado-0.6.1-foss-2023a-CUDA-12.1.1.eb dorado-0.7.3-foss-2023a-CUDA-12.1.1.eb dorado-0.8.0-foss-2023a-CUDA-12.1.1.eb"

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21673 EB_ARGS="--rpath --installpath /tmp/$USER/pr21673-rpath libaec-1.0.6-GCCcore-12.3.0.eb zstd-1.5.5-GCCcore-12.3.0.eb dorado-0.6.1-foss-2023a-CUDA-12.1.1.eb dorado-0.7.3-foss-2023a-CUDA-12.1.1.eb dorado-0.8.0-foss-2023a-CUDA-12.1.1.eb" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21673 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5087

Test results coming soon (I hope)...

Details

- notification for comment with ID 2416323749 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 2 out of 5 (5 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/9fea5822d08a329f2d021e6236afaca2 for a full test report.

@boegel
Copy link
Copy Markdown
Member Author

boegel commented Oct 16, 2024

Ah, the PyTorch module files need to be re-generated to pick up on the fix from easybuilders/easybuild-easyblocks#3488...

I'll do that (manually) via --module-only in the bot accounts

@boegel
Member Author

boegel commented Oct 16, 2024

@boegelbot please test @ jsc-zen3
EB_ARGS="--rpath --installpath /tmp/$USER/pr21673-rpath libaec-1.0.6-GCCcore-12.3.0.eb zstd-1.5.5-GCCcore-12.3.0.eb dorado-0.6.1-foss-2023a-CUDA-12.1.1.eb dorado-0.7.3-foss-2023a-CUDA-12.1.1.eb dorado-0.8.0-foss-2023a-CUDA-12.1.1.eb"

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21673 EB_ARGS="--rpath --installpath /tmp/$USER/pr21673-rpath libaec-1.0.6-GCCcore-12.3.0.eb zstd-1.5.5-GCCcore-12.3.0.eb dorado-0.6.1-foss-2023a-CUDA-12.1.1.eb dorado-0.7.3-foss-2023a-CUDA-12.1.1.eb dorado-0.8.0-foss-2023a-CUDA-12.1.1.eb" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21673 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5099

Test results coming soon (I hope)...

Details

- notification for comment with ID 2417609507 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 5 out of 5 (5 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/4bc78dbef8d72f72d727e7a1fea2d7cf for a full test report.


# disable CMake fiddling with RPATH when EasyBuild is configured to use RPATH linking
configopts += "$(if %(rpath_enabled)s; then "
configopts += "echo '-DCMAKE_SKIP_INSTALL_RPATH=YES -DCMAKE_SKIP_RPATH=YES'; fi) "
Contributor


We have no nicer way to check build_options in easyconfigs (yet?), but this one is a bit cryptic.

I was thinking that including these config options might not cause any harm even when RPATH isn't enabled? I wouldn't expect CMAKE_SKIP_RPATH=YES to unintentionally also enable RPATH.
Maybe it's fine to just include them in copts?

Member Author


I'm not sure about that...

By default, they're injecting $ORIGIN into the RUNPATH section of several libraries they copy, to make sure those libraries can "find" each other.

That's done at install time, after copying library files, so it totally escapes our RPATH compiler/linker wrappers.
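
Since the conditional `configopts` lines under review were called cryptic, here is a small sketch of how they are expected to resolve. It assumes that the `%(rpath_enabled)s` template is replaced by the literal string `true` or `false`, so that the shell runs `if true; then echo ...; fi` or `if false; ...`. The helper names `resolve` and `expected_cmake_flags` are hypothetical and only mimic the template substitution and the shell's behaviour; they are not part of EasyBuild.

```python
# The two configopts fragments from the easyconfig under review, concatenated.
CONFIGOPTS_TEMPLATE = (
    "$(if %(rpath_enabled)s; then "
    "echo '-DCMAKE_SKIP_INSTALL_RPATH=YES -DCMAKE_SKIP_RPATH=YES'; fi) "
)


def resolve(rpath_enabled):
    """Mimic the template substitution (assumption: value is 'true' or 'false')."""
    value = "true" if rpath_enabled else "false"
    return CONFIGOPTS_TEMPLATE % {"rpath_enabled": value}


def expected_cmake_flags(rpath_enabled):
    """Flags the shell command substitution would emit for the given setting."""
    if rpath_enabled:
        # 'if true; then echo ...; fi' prints both flags
        return ["-DCMAKE_SKIP_INSTALL_RPATH=YES", "-DCMAKE_SKIP_RPATH=YES"]
    # 'if false; then ...; fi' prints nothing, so no extra flags reach CMake
    return []


if __name__ == "__main__":
    for enabled in (True, False):
        print(repr(resolve(enabled)), "->", expected_cmake_flags(enabled))
```

Under that assumption, the extra CMake options are only passed when EasyBuild is configured with RPATH linking, and builds without RPATH keep the project's default install-time handling.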

Contributor

@Micket Micket left a comment


somewhat reluctant merge on this i guess

@Micket Micket merged commit 069208e into easybuilders:develop Oct 21, 2024
@laraPPr laraPPr modified the milestones: release after 4.9.4, 5.0 Oct 28, 2024
@boegel boegel deleted the 20241016104618_new_pr_dorado061 branch November 6, 2024 08:50
@boegel boegel modified the milestones: release after 4.9.4, 5.0.0 Mar 18, 2025