
add support for using Slurm as backend for --job#2642

Merged
akesandgren merged 12 commits into easybuilders:develop from boegel:pyslurm
Nov 4, 2018

Conversation

@boegel
Member

@boegel boegel commented Oct 29, 2018

With this in place, you can configure EasyBuild to submit jobs to a SLURM cluster (using --job-backend=PySlurm), as long as a sufficiently recent version of PySlurm is installed (https://pypi.org/project/pyslurm/).

@boegel boegel added this to the 3.8.0 milestone Oct 29, 2018
@boegel boegel requested a review from akesandgren October 29, 2018 09:40

@houndci-bot houndci-bot left a comment

Some files could not be reviewed due to errors:

Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/linters/.local/bin/flake8", line 7, in 
    from flake8.main.cli import main
ModuleNotFoundError: No module named 'flake8'

Comment thread easybuild/tools/job/pyslurm_backend.py Outdated
Comment thread easybuild/tools/job/pyslurm_backend.py Outdated
Comment thread easybuild/tools/job/pyslurm_backend.py Outdated
Comment thread easybuild/tools/job/pyslurm_backend.py Outdated
@boegel
Member Author

boegel commented Oct 31, 2018

@akesandgren Via --job-deps-type, you can now specify what type of dependency must be used.

The default depends on the job backend (to retain backward compatibility): --job-deps-type=always_run is the default for PbsPython (equivalent to using afterany deps), while --job-deps-type=abort_on_error is the default for both GC3Pie and the new PySlurm job backend.
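
To make the defaults described above concrete, here is a minimal sketch of how a backend-dependent default could be resolved and turned into a Slurm dependency string. All function and constant names are hypothetical and not taken from the EasyBuild code base. The mapping of always_run to `afterany` comes from the comment above; mapping abort_on_error to Slurm's `afterok` is an assumption.

```python
# Hypothetical illustration; none of these names come from the EasyBuild code base.
ALWAYS_RUN = "always_run"
ABORT_ON_ERROR = "abort_on_error"

# Default dependency type per job backend, as described in the comment above.
DEFAULT_DEPS_TYPE = {
    "PbsPython": ALWAYS_RUN,
    "GC3Pie": ABORT_ON_ERROR,
    "PySlurm": ABORT_ON_ERROR,
}

# 'afterany' is stated in the comment above; 'afterok' is an assumption.
SLURM_DEP_KEYWORD = {
    ALWAYS_RUN: "afterany",
    ABORT_ON_ERROR: "afterok",
}


def resolve_deps_type(backend, requested=None):
    """Return the explicitly requested type, else the default for the backend."""
    if requested is None:
        return DEFAULT_DEPS_TYPE[backend]
    if requested not in SLURM_DEP_KEYWORD:
        raise ValueError("unknown job dependency type: %s" % requested)
    return requested


def slurm_dependency_spec(deps_type, job_ids):
    """Build a value for sbatch --dependency, e.g. 'afterok:1:2'; None if no deps."""
    if not job_ids:
        return None
    return "%s:%s" % (SLURM_DEP_KEYWORD[deps_type], ":".join(str(j) for j in job_ids))
```

An explicit `requested` value always wins over the backend default, which is what keeps existing PbsPython setups behaving as before.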

akesandgren
akesandgren previously approved these changes Oct 31, 2018
Contributor

@akesandgren akesandgren left a comment

LGTM, should we go ahead and merge this now or do you think there is anything more to do?

@boegel
Member Author

boegel commented Oct 31, 2018

@akesandgren I hope to add some tests for the PySlurm backend as well...

Any chance that you can give this a spin yourself to see if --job-backend=PySlurm --job works for you?

@akesandgren
Contributor

create_job in parallelbuild sets up easybuild_vars to contain only
PYTHONPATH, MODULEPATH and all variables starting with EASYBUILD.
That won't fly... it should not touch the environment at all really (in my opinion at least).
One needs to be able to pass in all sorts of environment variables, like SBATCH_ACCOUNT, a bunch of other Slurm-related things, and other things unrelated to EB.
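
For readers unfamiliar with the behaviour being criticised, the filtering described above can be sketched roughly as follows. This is an illustration of the idea only, not the actual `create_job` code. The function name and the `extra_prefixes` parameter are hypothetical and merely show one way variables such as SBATCH_ACCOUNT could be let through.

```python
import os


def select_job_env(environ=None, extra_prefixes=()):
    """Keep PYTHONPATH, MODULEPATH and EASYBUILD* variables (plus optional extra prefixes).

    Hypothetical sketch; 'extra_prefixes' is not an existing EasyBuild option.
    """
    if environ is None:
        environ = os.environ
    keep_names = ("PYTHONPATH", "MODULEPATH")
    prefixes = ("EASYBUILD",) + tuple(extra_prefixes)
    # str.startswith accepts a tuple of prefixes
    return {
        name: value
        for name, value in environ.items()
        if name in keep_names or name.startswith(prefixes)
    }
```

With the default arguments, SBATCH_ACCOUNT is dropped, which is exactly the problem raised in the comment above.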

@boegel
Member Author

boegel commented Oct 31, 2018

@akesandgren That's orthogonal to this PR I think, see the discussion in #2632

@akesandgren
Contributor

akesandgren commented Oct 31, 2018

There is something wrong.
--job-max-walltime=3 doesn't get passed down into pyslurm in the way it wants.
I get this back (when bypassing the account problem):
python2: error: must supply a time (-t ..)

SlurmJob was passing the wrong parameter to pyslurm. It expects 'time_limit', not 'time', and it expects an integer number of minutes.
I'll try to whip up a PR for the things I can figure out.
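
The fix described above amounts to a unit conversion plus a key rename. A minimal sketch, assuming that --job-max-walltime is expressed in hours (not stated in this thread) and using a hypothetical helper name; it only builds the dictionary entry and does not call pyslurm itself:

```python
def pyslurm_time_limit(walltime_hours):
    """Return the walltime as a {'time_limit': <integer minutes>} entry.

    Assumption: the walltime is given in hours. Helper name is hypothetical.
    """
    minutes = int(round(walltime_hours * 60))
    if minutes <= 0:
        raise ValueError("walltime must be positive, got %r" % (walltime_hours,))
    # The key is 'time_limit' (not 'time') and the value is an integer in minutes.
    return {"time_limit": minutes}
```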

akesandgren and others added 2 commits October 31, 2018 18:46
And the value is an integer of minutes.
pyslurm expects time_limit, not 'time'.
@boegel boegel changed the title from "add support for using PySlurm as backend for --job" to "add support for using PySlurm as backend for --job (WIP)" Nov 1, 2018
@boegel
Member Author

boegel commented Nov 1, 2018

@akesandgren I've just pushed support for using Slurm as a job backend, which doesn't require PySlurm at all; it basically just calls sbatch rather than the equivalent method provided by PySlurm.

I'm wondering whether this is a better way forward: it doesn't require having PySlurm as a dependency, and it probably integrates better with site policies since it doesn't talk directly to the SLURM API...
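
As a rough illustration of the "just call sbatch" approach, the sketch below assembles an sbatch command line and parses the job ID from its output. It is not the code added in this PR; the function names are hypothetical, and the walltime is again assumed to be given in hours. The sbatch options used (`--job-name`, `--time`, `--dependency`, `--wrap`) are standard Slurm options.

```python
import re


def build_sbatch_command(script, job_name, walltime_hours=None, dependency=None):
    """Assemble an sbatch command as an argument list (hypothetical helper)."""
    cmd = ["sbatch", "--job-name=%s" % job_name]
    if walltime_hours is not None:
        # sbatch accepts a plain number of minutes for --time
        cmd.append("--time=%d" % int(round(walltime_hours * 60)))
    if dependency:
        cmd.append("--dependency=%s" % dependency)
    # --wrap lets sbatch generate the submit script around the given command
    cmd.append("--wrap=%s" % script)
    return cmd


def parse_job_id(sbatch_output):
    """Extract the job ID from sbatch's 'Submitted batch job <id>' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("could not find job ID in: %r" % sbatch_output)
    return int(match.group(1))
```

The resulting list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` on a machine where sbatch is available, with the captured stdout passed to `parse_job_id`.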

Comment thread test/framework/parallelbuild.py Outdated
Comment thread easybuild/tools/job/slurm.py
@boegel boegel changed the title from "add support for using PySlurm as backend for --job (WIP)" to "add support for using Slurm as backend for --job" Nov 1, 2018
@akesandgren
Contributor

akesandgren commented Nov 1, 2018

There needs to be a config option for choosing what to put in the "#!/bin/sh" line of the submit script. We require /bin/bash for things to work. But it seems that line is generated by sbatch itself...
Might be better to write an actual submit file to use instead of --wrap...

Not sure whether that is the problem I ran into just now, though.

Nah, that wasn't the problem. Ignore this comment.

Comment thread easybuild/tools/job/slurm.py
@akesandgren
Contributor

With my latest PR to this one I can get stuff running.
(Although it failed since I was using --from-pr and the submitted jobs of course didn't have my OAuth token...)

Will try something local later this evening.

Comment thread easybuild/tools/job/slurm.py Outdated
Comment thread test/framework/parallelbuild.py
Comment thread test/framework/parallelbuild.py
@akesandgren
Contributor

@boegel Anything else you plan on doing with this?

@boegel
Member Author

boegel commented Nov 3, 2018

@akesandgren @migueldiascosta is testing this too, doesn't hurt to get some more feedback on it.

But it's good to go imho.

@akesandgren akesandgren merged commit 5040647 into easybuilders:develop Nov 4, 2018
@boegel boegel deleted the pyslurm branch November 4, 2018 15:04
@boegel
Member Author

boegel commented Nov 4, 2018

@akesandgren docs update @ easybuilders/easybuild#467
