correctly check whether --with-ucx is used as OpenMPI configure option (taking into account --with-ucx=no) #2501

Merged
boegel merged 1 commit into easybuilders:develop from Flamefire:ompi_dep_handling
Jul 2, 2021

Flamefire commented Jul 2, 2021

We want to disable verbs only when UCX is enabled, so --with-ucx=no must also be treated as UCX being disabled (done in the previous PR, #2500).

# this is required to avoid "error initializing an OpenFabrics device" warnings,
# see also https://www.open-mpi.org/faq/?category=all#ofa-device-error
if LooseVersion(self.version) >= LooseVersion('4.0.0') and '--with-ucx' in self.cfg['configopts']:
is_ucx_enabled = ('--with-ucx' in self.cfg['configopts'] and
Contributor

Since we now enforce "--with-ucx=path", does this really work correctly?

Contributor Author

Yes. --with-ucx=path means enabled, just like a bare --with-ucx, but --with-ucx=no means the opposite.
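
A minimal sketch of the three cases described above, assuming the configure options are available as a single string. The helper name `ucx_enabled` and the example path are hypothetical and are not taken from the easyblock. The `--without-ucx` branch relies on the standard autoconf convention that `--without-X` is equivalent to `--with-X=no`.

```python
import shlex


def ucx_enabled(configopts):
    """Return True if the given configure options enable UCX.

    Hypothetical helper for illustration only, not the easyblock's code.
    The last UCX-related option on the command line wins.
    """
    enabled = False
    for opt in shlex.split(configopts):
        if opt == '--with-ucx':
            # bare flag: enabled
            enabled = True
        elif opt.startswith('--with-ucx='):
            # explicit value: anything except "no" counts as enabled
            enabled = opt.split('=', 1)[1] != 'no'
        elif opt == '--without-ucx':
            # autoconf shorthand for --with-ucx=no
            enabled = False
    return enabled


for opts in ('--with-ucx', '--with-ucx=/path/to/ucx', '--with-ucx=no', '--with-slurm'):
    print(opts, '->', ucx_enabled(opts))
```

Splitting the string into tokens first avoids matching unrelated options that merely start with the same prefix.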

Contributor

What I mainly mean is: does '--with-ucx' in self.cfg['configopts'] return True if we have --with-ucx=path in configopts?

I can never remember how the "in" operator works...
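
For reference, a short sketch of how Python's `in` operator behaves here, assuming `self.cfg['configopts']` is a plain string at this point (the option string below is a made-up placeholder, not one from this PR). On a string, `in` is a substring test, so `--with-ucx=path` does match `'--with-ucx'`. On a list, it compares whole elements and would not match.

```python
# Hypothetical option string; the path is a placeholder, not one from this PR.
configopts = '--with-slurm --with-ucx=/path/to/ucx --without-verbs'

# On a str, "in" is a substring test, so the "=path" suffix does not matter.
found_in_string = '--with-ucx' in configopts

# On a list, "in" compares whole elements, so the bare flag is not found.
found_in_list = '--with-ucx' in configopts.split()

print(found_in_string, found_in_list)
```

The same substring behaviour is why `--with-ucx=no` also matches `'--with-ucx'` and has to be excluded explicitly.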

Flamefire force-pushed the ompi_dep_handling branch from a6b8b93 to 4f670be on July 2, 2021 15:25
@Flamefire
Contributor Author

Tests:

  • ./configure --prefix=/tmp/software/OpenMPI/4.0.1-GCC-8.3.0-2.32 --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --with-slurm --with-pmi=/usr --with-pmi-libdir=/usr/lib64 --with-knem=/opt/knem-1.1.3.90mlnx1 --enable-mpirun-prefix-by-default --enable-shared --with-cuda=no --with-hwloc=/scratch/ws/1/s3248973-EasyBuild/easybuild-haswell/software/hwloc/2.0.3-GCCcore-8.3.0 --with-libevent=internal --with-ofi=no --with-pmix=internal --with-ucx=no --with-verbs
  • ./configure --prefix=/tmp/software/OpenMPI/4.1.1-GCC-10.3.0 --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --with-slurm --with-pmi=/usr --with-pmi-libdir=/usr/lib64 --with-knem=/opt/knem-1.1.3.90mlnx1 --enable-mpirun-prefix-by-default --enable-shared --with-cuda=no --with-hwloc=/scratch/ws/1/s3248973-EasyBuild/easybuild-haswell/software/hwloc/2.4.1-GCCcore-10.3.0 --with-libevent=/scratch/ws/1/s3248973-EasyBuild/easybuild-haswell/software/libevent/2.1.12-GCCcore-10.3.0 --with-ofi=/scratch/ws/1/s3248973-EasyBuild/easybuild-haswell/software/libfabric/1.12.1-GCCcore-10.3.0 --with-pmix=/scratch/ws/1/s3248973-EasyBuild/easybuild-haswell/software/PMIx/3.2.3-GCCcore-10.3.0 --with-ucx=/scratch/ws/1/s3248973-EasyBuild/easybuild-haswell/software/UCX/1.10.0-GCCcore-10.3.0 --without-verbs
    --> Works
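
The two configure lines above differ in their UCX and verbs settings. A rough sketch of the decision they exercise, under stated assumptions: `parse_version` and `should_disable_verbs` are hypothetical names, the version comparison uses plain integer tuples instead of the `LooseVersion` call seen in the snippet, and the option strings are shortened with a placeholder path. It only models "disable verbs when OpenMPI is at least 4.0.0 and UCX is enabled", not whatever else the easyblock does to pick `--with-verbs`.

```python
def parse_version(version):
    """Turn a dotted version such as '4.0.1' into a tuple of ints (illustrative only)."""
    return tuple(int(part) for part in version.split('.'))


def should_disable_verbs(version, configopts):
    """Return True if verbs should be disabled for this version and option string.

    Hypothetical helper: UCX counts as enabled when --with-ucx appears
    (with or without a path) and --with-ucx=no does not.
    """
    ucx_enabled = '--with-ucx' in configopts and '--with-ucx=no' not in configopts
    return parse_version(version) >= (4, 0, 0) and ucx_enabled


# Shortened stand-ins for the two tested builds; the UCX path is a placeholder.
print(should_disable_verbs('4.0.1', '--with-slurm --with-ucx=no'))
print(should_disable_verbs('4.1.1', '--with-slurm --with-ucx=/path/to/UCX'))
```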

boegel changed the title from "Fix verbs handling in OMPI" to "correctly check whether --with-ucx is used as OpenMPI configure option (taking into account --with-ucx=no)" on Jul 2, 2021
boegel added the bug fix label on Jul 2, 2021
boegel added this to the next release (4.4.1) milestone on Jul 2, 2021
Member

boegel left a comment

lgtm

Member

boegel commented Jul 2, 2021

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS OpenMPI-4.1.1-GCC-10.3.0.eb
  • SUCCESS OpenMPI-4.0.1-GCC-8.3.0-2.32.eb

Build succeeded for 2 out of 2 (2 easyconfigs in total)
node2669.swalot.os - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz (haswell), Python 3.6.8
See https://gist.github.com/5440588d27a8588f80cc9b2c6aa6fcbe for a full test report.

boegel merged commit fadc344 into easybuilders:develop on Jul 2, 2021
Member

boegel commented Jul 2, 2021

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS OpenMPI-4.1.1-GCC-10.3.0.eb
  • SUCCESS OpenMPI-4.0.1-GCC-8.3.0-2.32.eb

Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3599.doduo.os - Linux RHEL 8.2, x86_64, AMD EPYC 7552 48-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/cd58fa821d9d328e574a889d72e1b354 for a full test report.
