On Slurm, some pipeline jobs fail with a memory error even though enough memory is allocated #862

@cokelaer

Jobs fail on Slurm because the memory limit is exceeded, even though enough memory was requested.

Example of error message:

Job needs mem_mib=6430 but only mem_mib=4591 are available. This is likely because 
two jobs are connected via a pipe or a service output and have to run simultaneously. Consider 
providing more resources (e.g. via --cores).

Here I requested 32G, so why is only about 4.5 GB (mem_mib=4591) available?

This may be a Snakemake issue related to snakemake/snakemake#2774.

What you think might have happened

In snakemake.rules, the resources dictionary is populated from the user's fields. In Sequana pipelines, we use the mem field for memory. Snakemake also accepts memory through the mem_mb field.

Internally, a third option is populated by snakemake: mem_mib. We can play with this example:

rule all:
    input: ["test1.out"]

rule a:
    output:
        "test1.out",
    resources:
        mem="5G",
    shell:
        """
        echo {resources.mem_mb} {resources.mem_mib} {resources} > {output}
        """

and test it with

snakemake -s Snakefile -c1 -p --forceall --cluster "sbatch " -j1

You will see this line in the Snakemake output:

resources: mem_mb=5000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, mem=5G

Clearly this is wrong: mem_mb is 5000, but mem_mib is 954, which corresponds to about 1G (the Snakemake default) rather than the requested 5G.
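The mismatch can be checked with a few lines of arithmetic. This is a minimal sketch, not Snakemake's own code: the helper name mb_to_mib is hypothetical, and rounding up is an assumption chosen because it reproduces the disk_mb=1000 / disk_mib=954 pair in the output above.

```python
import math

def mb_to_mib(mb):
    """Convert megabytes (10**6 bytes) to mebibytes (2**20 bytes), rounding up.

    Hypothetical helper; rounding up is an assumption, not taken from Snakemake.
    """
    return math.ceil(mb * 10**6 / 2**20)

default_mib = mb_to_mib(1000)    # what the 1G default gives
requested_mib = mb_to_mib(5000)  # what mem="5G" should give

print(default_mib, requested_mib)
```

Under these assumptions, 1000 MB gives 954 MiB (the value seen in the output) whereas 5000 MB gives 4769 MiB, so the reported mem_mib is consistent with the default and not with the 5G request.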

What is happening in snakemake.rules is that the resources are stored as a dictionary. Here it contains mem (set by the user) and mem_mb, which was populated with the default of 1G.

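Then, in snakemake.rules.expand_resources, we iterate over the resources dictionary and populate mem_mib. Because the entries are processed in dictionary order, either mem is handled first (it updates mem_mb, which then updates mem_mib), or mem_mb is handled first (it sets mem_mib from the 1G default before mem is applied). So sometimes it works and sometimes it does not.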
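The order dependence described above can be reproduced with a toy model. This sketch is not Snakemake's implementation: expand, parse_size_mb and mb_to_mib are hypothetical names, and the single-pass loop is only an assumption about the kind of logic that would produce the observed output.

```python
import math

def mb_to_mib(mb):
    # Hypothetical helper: MB -> MiB, rounded up (assumption).
    return math.ceil(mb * 10**6 / 2**20)

def parse_size_mb(text):
    # Toy parser for strings such as "5G" or "500M"; returns megabytes.
    units = {"M": 1, "G": 1000}
    return int(text[:-1]) * units[text[-1]]

def expand(resources):
    """Toy single-pass expansion: entries are handled in dictionary order."""
    out = dict(resources)
    for name, value in list(resources.items()):
        if name == "mem":
            # A human-readable size overrides mem_mb.
            out["mem_mb"] = parse_size_mb(value)
        elif name == "mem_mb":
            # mem_mib is derived from whatever mem_mb holds at this point.
            out["mem_mib"] = mb_to_mib(out["mem_mb"])
    return out

mem_first = expand({"mem": "5G", "mem_mb": 1000})
mem_mb_first = expand({"mem_mb": 1000, "mem": "5G"})

print(mem_first["mem_mb"], mem_first["mem_mib"])
print(mem_mb_first["mem_mb"], mem_mb_first["mem_mib"])
```

In this toy model, both orderings end with mem_mb=5000, but only the first yields mem_mib=4769. The second yields mem_mib=954, which matches the inconsistent pair reported in the Snakemake output.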

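FIX: add these 2 lines in the snakemake.rules.expand_resources function, just before the existing loop over the resources:

# added: handle "mem" first so that mem_mb is up to date
if "mem" in self.resources:
    infer_resources("mem", self.resources["mem"], self.resources)

# existing loop (unchanged)
for name, res in list(self.resources.items()):
    if name != "_cores":
        value = apply(name, res, threads=threads)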
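The idea behind the fix can be illustrated on a toy model: resolve mem before the loop, so that the result no longer depends on dictionary order. This is only a sketch under stated assumptions: expand_fixed, parse_size_mb and mb_to_mib are hypothetical names and do not reproduce Snakemake's infer_resources or apply.

```python
import math

def mb_to_mib(mb):
    # Hypothetical helper: MB -> MiB, rounded up (assumption).
    return math.ceil(mb * 10**6 / 2**20)

def parse_size_mb(text):
    # Toy parser for strings such as "5G" or "500M"; returns megabytes.
    units = {"M": 1, "G": 1000}
    return int(text[:-1]) * units[text[-1]]

def expand_fixed(resources):
    """Toy expansion with a pre-pass: "mem" is resolved before the loop."""
    out = dict(resources)
    if "mem" in out:
        # Pre-pass: make sure mem_mb reflects the user's request first.
        out["mem_mb"] = parse_size_mb(out["mem"])
    for name in list(resources):
        if name == "mem_mb":
            # mem_mib is now always derived from the updated mem_mb.
            out["mem_mib"] = mb_to_mib(out["mem_mb"])
    return out

order_a = expand_fixed({"mem": "5G", "mem_mb": 1000})
order_b = expand_fixed({"mem_mb": 1000, "mem": "5G"})

print(order_a["mem_mib"], order_b["mem_mib"])
```

With the pre-pass, both orderings give mem_mb=5000 and mem_mib=4769 in this toy model.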

We should open a PR on the Snakemake GitHub repository.
