avoid test failures for netCDF with iimpi toolchain, by setting $I_MPI_HYDRA_BOOTSTRAP to ssh #24735
Test report by @jfgrimm
Edit: only one failed (locks):
@jfgrimm Existing locks
Test report by @jfgrimm
@boegelbot: please test @ jsc-zen3
@jfgrimm: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de
PR test command '
Test results coming soon (I hope)...
- notification for comment with ID 3607809364 processed
Message to humans: this is just bookkeeping information for me,
Test report by @boegelbot
@jfgrimm One test failure, which looks weird:
It could be a temporary OS issue.
Test report by @Flamefire
Test report by @Flamefire
Still running into the timeouts with:
Seems to be an issue with HDF5, see #15959 (comment). Also, it only happens on our ROME cluster. Running out of ideas.
Test report by @Flamefire
I see the hang with this simple MPI program:

    #include <stdio.h>
    #include <mpi.h>

    /* Print a progress message to stderr, prefixed with the rank
     * (-1 until MPI_Comm_rank has run). */
    #define PRINT(s) fprintf(stderr, "[%d] %s", rank, (s))

    int main(int argc, char **argv) {
        int rank = -1, res;
        MPI_File fh;

        PRINT("Init...\n");
        MPI_Init(&argc, &argv);
        PRINT("Rank...\n");
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Collectively create and open the file via MPI-IO. */
        PRINT("Create...\n");
        res = MPI_File_open(MPI_COMM_WORLD, "test_file.h5",
                            MPI_MODE_RDWR | MPI_MODE_CREATE, MPI_INFO_NULL, &fh);
        if (res != MPI_SUCCESS) { PRINT("ERROR\n"); MPI_Abort(MPI_COMM_WORLD, 1); }

        PRINT("Closing...\n");
        res = MPI_File_close(&fh);
        if (res != MPI_SUCCESS) { PRINT("ERROR\n"); MPI_Abort(MPI_COMM_WORLD, 1); }

        PRINT("SUCCESS\n");
        MPI_Finalize();
        return 0;
    }

It hangs only with impi/2022a and impi/2022b.
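A possible diagnostic to pair with the reproducer: when comparing runs inside and outside a SLURM job, it can help to log the bootstrap-related variables each rank actually sees. This is a hedged sketch; `env_or_unset` and `report_bootstrap_env` are hypothetical helper names (not part of Intel MPI or EasyBuild), and the variable list is simply the one named in this PR's commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Variables named in this PR's commit message. Which of them Intel MPI's
 * mpirun actually honours, and in which order, is not checked here. */
static const char *const BOOTSTRAP_VARS[] = {
    "SLURM_JOBID",
    "I_MPI_HYDRA_BOOTSTRAP",
    "HYDRA_BOOTSTRAP",
};

/* Return the value of `name`, or "(unset)" if it is not in the environment. */
static const char *env_or_unset(const char *name) {
    assert(name != NULL);
    const char *value = getenv(name);
    return value != NULL ? value : "(unset)";
}

/* Print each variable to stderr prefixed with the rank, in the same style as
 * the reproducer's PRINT macro; e.g. call it right after MPI_Comm_rank(). */
static void report_bootstrap_env(int rank) {
    size_t n = sizeof BOOTSTRAP_VARS / sizeof BOOTSTRAP_VARS[0];
    for (size_t i = 0; i < n; i++) {
        fprintf(stderr, "[%d] %s=%s\n", rank, BOOTSTRAP_VARS[i],
                env_or_unset(BOOTSTRAP_VARS[i]));
    }
}
```

The helper only reads the environment, so it adds no MPI calls of its own and cannot change whether the MPI_File_open step hangs.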
Test report by @Flamefire
Even with a generic "Tuning" file set, it times out in another test, although most now succeed. It hangs in …
IMO this PR can be merged, as it improves things and has no downsides I can see.
@boegelbot: please test @ jsc-zen3
@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de
PR test command '
Test results coming soon (I hope)...
- notification for comment with ID 3678577760 processed
Message to humans: this is just bookkeeping information for me,
Test report by @boegelbot |
|
@Flamefire Conflict resolution needed.
Thanks, rebased.
@boegelbot: please test @ jsc-zen3
@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de
PR test command '
Test results coming soon (I hope)...
- notification for comment with ID 4208701324 processed
Message to humans: this is just bookkeeping information for me,
Test report by @boegelbot
Going in, thanks @Flamefire!
Copied from cb01b35
This happens when running EasyBuild inside a SLURM job, which causes I_MPI_HYDRA_BOOTSTRAP and HYDRA_BOOTSTRAP to be set; the bootstrap is also auto-detected by mpirun when SLURM_JOBID is set.
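A minimal sketch of the effect of this fix, assuming a C launcher that prepares the environment before the tests start. `force_ssh_bootstrap` is a hypothetical helper, not EasyBuild code; it only touches I_MPI_HYDRA_BOOTSTRAP, as the PR title describes, and does not attempt to decide whether HYDRA_BOOTSTRAP also needs adjusting.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: make Intel MPI's Hydra launcher use ssh instead of
 * whatever it would otherwise pick up inside a SLURM job. Only
 * I_MPI_HYDRA_BOOTSTRAP is changed; HYDRA_BOOTSTRAP is left as-is.
 * Returns 1 if SLURM_JOBID was set, 0 if not, -1 if setenv() failed. */
static int force_ssh_bootstrap(void) {
    int in_slurm_job = getenv("SLURM_JOBID") != NULL;

    /* overwrite = 1: replace any value inherited from the job environment */
    if (setenv("I_MPI_HYDRA_BOOTSTRAP", "ssh", 1) != 0) {
        perror("setenv(I_MPI_HYDRA_BOOTSTRAP)");
        return -1;
    }

    /* Sanity check: the new value is what child processes will inherit. */
    const char *value = getenv("I_MPI_HYDRA_BOOTSTRAP");
    assert(value != NULL && strcmp(value, "ssh") == 0);
    return in_slurm_job;
}
```

Because setenv() changes the calling process's environment, anything it later starts with fork()/exec() (for example mpirun) inherits I_MPI_HYDRA_BOOTSTRAP=ssh. From a shell, `export I_MPI_HYDRA_BOOTSTRAP=ssh` before running the tests has the same effect on the environment; this sketch does not reproduce how the PR applies the setting in the easyconfigs.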