Global Stats Normalization Fails with DDP #2953

@helleuch

Description

Describe the bug

I'm encountering a RuntimeError: No backend type associated with device type cpu when fine-tuning a BEST-RQ model using distributed data parallelism (DDP).

This is a new error: the same training script worked correctly in previous SpeechBrain (SB) versions.

The error occurs during the forward pass when calling self.modules.normalize(...), which internally calls ddp_all_reduce(...) in speechbrain/processing/features.py.

Expected behaviour

For training to run as it did before.

To Reproduce

A fine-tuning script for BEST-RQ using torchrun (even with a single GPU).

Environment Details

SpeechBrain version: dev branch (latest as of July 2025)

PyTorch version: 2.1.2

Python version: 3.11

Platform: Jean-Zay HPC

Cluster setup: SLURM, 1 node, 2x H100 GPUs

Backend: NCCL

The code runs under SLURM using torchrun.

Relevant Log Output

Traceback (most recent call last):
  File ".../train_bestrq.py", line 372, in <module>
    did_brain.fit(
  File ".../speechbrain/core.py", line 1585, in fit
    self._fit_train(train_set=train_set, epoch=epoch, enable=enable)
  File ".../speechbrain/core.py", line 1410, in _fit_train
    loss = self.fit_batch(batch)
  File ".../speechbrain/core.py", line 1209, in fit_batch
    outputs = self.compute_forward(batch, sb.Stage.TRAIN)
  File ".../train_bestrq.py", line 29, in compute_forward
    feats = self.modules.normalize(feats, wav_lens)
  File ".../torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File ".../torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File ".../speechbrain/processing/features.py", line 1430, in forward
    self._update_global_stats(x, mask)
  File ".../speechbrain/processing/features.py", line 1468, in _update_global_stats
    self.count, self.glob_mean, self.glob_std = mean_std_update(
  File ".../speechbrain/processing/features.py", line 1250, in mean_std_update
    new_statistics = combine_gaussian_statistics_distributed(
  File ".../speechbrain/processing/features.py", line 1176, in combine_gaussian_statistics_distributed
    global_count = ddp_all_reduce(torch.tensor(local_count), ReduceOp.SUM)
  File ".../speechbrain/utils/distributed.py", line 254, in ddp_all_reduce
    torch.distributed.all_reduce(communication_object, op=reduce_op)
  File ".../torch/distributed/c10d_logger.py", line 81, in wrapper
    return func(*args, **kwargs)
  File ".../torch/distributed/distributed_c10d.py", line 2810, in all_reduce
    work = group.allreduce([tensor], opts)
RuntimeError: No backend type associated with device type cpu

Additional Context

This might be related to this issue.
I will try the proposed fix and update this issue if it resolves the problem.
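For context, the failure can be reproduced in isolation: with the NCCL backend, torch.distributed.all_reduce rejects CPU tensors, and the traceback shows ddp_all_reduce receiving torch.tensor(local_count), which is created on the CPU. A minimal single-process sketch (using the gloo backend here so it runs without GPUs; the count value and the commented CUDA workaround are illustrative, not SpeechBrain's actual fix):

```python
import os
import torch
import torch.distributed as dist

# Illustrative single-process setup; under torchrun these env vars are set for you.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

# gloo supports CPU tensors; nccl does not, which is what triggers
# "RuntimeError: No backend type associated with device type cpu".
dist.init_process_group(backend="gloo", rank=0, world_size=1)

local_count = 123
# With nccl, the tensor would first have to live on this rank's CUDA device, e.g.:
#   t = torch.tensor(local_count, device=torch.device("cuda", local_rank))
t = torch.tensor(local_count)  # CPU tensor: fine for gloo, fails with nccl
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(t.item())  # world_size == 1, so the sum equals local_count

dist.destroy_process_group()
```

So a plausible fix on the SpeechBrain side would be either to move the statistics tensor onto the active CUDA device before the all_reduce, or to additionally initialize a gloo group for CPU-side communication.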
