Global Stats Normalization Fails with DDP #2953
Describe the bug
I'm encountering a RuntimeError: No backend type associated with device type cpu when fine-tuning a BEST-RQ model with Distributed Data Parallel (DDP).
This is a new error: the same training script worked correctly in previous SpeechBrain versions.
The error occurs during the forward pass when calling self.modules.normalize(...), which internally triggers a call to ddp_all_reduce(...) in speechbrain/processing/features.py.
Expected behaviour
Training should proceed as it did in previous SpeechBrain versions.
To Reproduce
Run a fine-tuning script for BEST-RQ under torchrun (the error occurs even with a single GPU).
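To make the failing pattern concrete outside SpeechBrain: combine_gaussian_statistics_distributed calls ddp_all_reduce(torch.tensor(local_count), ReduceOp.SUM), and torch.tensor(...) creates the tensor on the CPU, so an NCCL-only process group has no backend for the collective. The sketch below is my own illustration, not code from the repo: device_safe_all_reduce is a hypothetical helper showing the pattern of placing the scalar on the compute device before the collective. It uses the gloo backend so it runs on CPU; with backend="nccl", the CPU-tensor call is exactly what raises the error above.

```python
import os
import torch
import torch.distributed as dist

def device_safe_all_reduce(value, device):
    # Hypothetical helper mirroring what ddp_all_reduce would need to do:
    # materialize the scalar on the backend's compute device before the
    # collective, instead of handing a CPU tensor to an NCCL-only group.
    t = torch.tensor(value, device=device)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    return t.item()

# Single-process group using gloo so this sketch runs on CPU. With
# backend="nccl", dist.all_reduce(torch.tensor(local_count)) on a CPU
# tensor raises "No backend type associated with device type cpu".
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device_safe_all_reduce(123, device))  # world_size == 1, so the sum is 123
```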
Environment Details
SpeechBrain version: dev branch (latest as of July 2025)
PyTorch version: 2.1.2
Python version: 3.11
Platform: Jean-Zay HPC
Cluster setup: SLURM, 1 node, 2x H100 GPUs
Backend: NCCL
Code is running under SLURM using torchrun
Relevant Log Output
Traceback (most recent call last):
File ".../train_bestrq.py", line 372, in <module>
did_brain.fit(
File ".../speechbrain/core.py", line 1585, in fit
self._fit_train(train_set=train_set, epoch=epoch, enable=enable)
File ".../speechbrain/core.py", line 1410, in _fit_train
loss = self.fit_batch(batch)
File ".../speechbrain/core.py", line 1209, in fit_batch
outputs = self.compute_forward(batch, sb.Stage.TRAIN)
File ".../train_bestrq.py", line 29, in compute_forward
feats = self.modules.normalize(feats, wav_lens)
File ".../torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File ".../torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File ".../speechbrain/processing/features.py", line 1430, in forward
self._update_global_stats(x, mask)
File ".../speechbrain/processing/features.py", line 1468, in _update_global_stats
self.count, self.glob_mean, self.glob_std = mean_std_update(
File ".../speechbrain/processing/features.py", line 1250, in mean_std_update
new_statistics = combine_gaussian_statistics_distributed(
File ".../speechbrain/processing/features.py", line 1176, in combine_gaussian_statistics_distributed
global_count = ddp_all_reduce(torch.tensor(local_count), ReduceOp.SUM)
File ".../speechbrain/utils/distributed.py", line 254, in ddp_all_reduce
torch.distributed.all_reduce(communication_object, op=reduce_op)
File ".../torch/distributed/c10d_logger.py", line 81, in wrapper
return func(*args, **kwargs)
File ".../torch/distributed/distributed_c10d.py", line 2810, in all_reduce
work = group.allreduce([tensor], opts)
RuntimeError: No backend type associated with device type cpu
Additional Context
This might be related to this issue.
I will try the proposed fix and update this issue if it resolves the problem.
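In the meantime, one possible workaround on the user side (an assumption on my part, not a confirmed fix, and it depends on being able to control how the process group is initialized): PyTorch >= 2.0 accepts a per-device backend string in init_process_group, so registering gloo for CPU tensors alongside NCCL gives CPU-side collectives like the one in combine_gaussian_statistics_distributed a backend to run on. Sketch (falls back to plain gloo on a CPU-only machine):

```python
import os
import torch
import torch.distributed as dist

# Possible workaround (assumption, not a confirmed fix): register a CPU
# backend alongside NCCL so that collectives on CPU tensors still have a
# backend. PyTorch >= 2.0 accepts a per-device backend string of the
# form "cpu:gloo,cuda:nccl".
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
backend = "cpu:gloo,cuda:nccl" if torch.cuda.is_available() else "gloo"
dist.init_process_group(backend=backend, rank=0, world_size=1)

# A CPU tensor now routes through gloo instead of failing with
# "No backend type associated with device type cpu".
t = torch.tensor(1)
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(t.item())  # world_size == 1, so the sum is 1
```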