Global Stats Normalization Fails with DDP #2953

@helleuch

Description

Describe the bug

I'm encountering a RuntimeError: No backend type associated with device type cpu when fine-tuning a BEST-RQ model using distributed data parallelism (DDP).

This is a new error: the same training script worked correctly in previous SpeechBrain (SB) versions.

The error occurs during the forward pass when calling self.modules.normalize(...), which internally calls ddp_all_reduce(...) in speechbrain/processing/features.py.

Expected behaviour

For training to run as it did before.

To Reproduce

A fine-tuning script for BEST-RQ using torchrun (even with a single GPU).

Environment Details

SpeechBrain version: dev branch (latest as of July 2025)

PyTorch version: 2.1.2

Python version: 3.11

Platform: Jean-Zay HPC

Cluster setup: SLURM, 1 node, 2x H100 GPUs

Backend: NCCL

The code runs under SLURM using torchrun.

Relevant Log Output

Traceback (most recent call last):
  File ".../train_bestrq.py", line 372, in <module>
    did_brain.fit(
  File ".../speechbrain/core.py", line 1585, in fit
    self._fit_train(train_set=train_set, epoch=epoch, enable=enable)
  File ".../speechbrain/core.py", line 1410, in _fit_train
    loss = self.fit_batch(batch)
  File ".../speechbrain/core.py", line 1209, in fit_batch
    outputs = self.compute_forward(batch, sb.Stage.TRAIN)
  File ".../train_bestrq.py", line 29, in compute_forward
    feats = self.modules.normalize(feats, wav_lens)
  File ".../torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File ".../torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File ".../speechbrain/processing/features.py", line 1430, in forward
    self._update_global_stats(x, mask)
  File ".../speechbrain/processing/features.py", line 1468, in _update_global_stats
    self.count, self.glob_mean, self.glob_std = mean_std_update(
  File ".../speechbrain/processing/features.py", line 1250, in mean_std_update
    new_statistics = combine_gaussian_statistics_distributed(
  File ".../speechbrain/processing/features.py", line 1176, in combine_gaussian_statistics_distributed
    global_count = ddp_all_reduce(torch.tensor(local_count), ReduceOp.SUM)
  File ".../speechbrain/utils/distributed.py", line 254, in ddp_all_reduce
    torch.distributed.all_reduce(communication_object, op=reduce_op)
  File ".../torch/distributed/c10d_logger.py", line 81, in wrapper
    return func(*args, **kwargs)
  File ".../torch/distributed/distributed_c10d.py", line 2810, in all_reduce
    work = group.allreduce([tensor], opts)
RuntimeError: No backend type associated with device type cpu

Additional Context

This might be related to this issue.
I will try the proposed fix and update this issue if it resolves the problem.
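For context, the failure can be reproduced in isolation: with the NCCL backend, torch.distributed.all_reduce rejects CPU tensors, and the traceback shows ddp_all_reduce receiving torch.tensor(local_count), which is created on the CPU. A minimal single-process sketch (using the gloo backend here so it runs without GPUs; the count value and the commented CUDA workaround are illustrative, not SpeechBrain's actual fix):

```python
import os
import torch
import torch.distributed as dist

# Illustrative single-process setup; under torchrun these env vars are set for you.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

# gloo supports CPU tensors; nccl does not, which is what triggers
# "RuntimeError: No backend type associated with device type cpu".
dist.init_process_group(backend="gloo", rank=0, world_size=1)

local_count = 123
# With nccl, the tensor would first have to live on this rank's CUDA device, e.g.:
#   t = torch.tensor(local_count, device=torch.device("cuda", local_rank))
t = torch.tensor(local_count)  # CPU tensor: fine for gloo, fails with nccl
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(t.item())  # world_size == 1, so the sum equals local_count

dist.destroy_process_group()
```

So a plausible fix on the SpeechBrain side would be either to move the statistics tensor onto the active CUDA device before the all_reduce, or to additionally initialize a gloo group for CPU-side communication.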
