Fix issue 'GlobalNorm with DDP' by svecjan · Pull Request #2934 · speechbrain/speechbrain

svecjan · 2025-06-05T09:33:50Z

What does this PR do?

This pull request fixes a crash when training in DDP mode with GlobalNorm: the all-reduce of the global statistics fails with the error below.

Error:

  File "sb1/recipes/LibriSpeech/self-supervised-learning/BEST-RQ/train.py", line 355, in <module>
    main()
  File "sb1/recipes/LibriSpeech/self-supervised-learning/BEST-RQ/train.py", line 345, in main
    brain.fit(
  File "sb1/speechbrain/core.py", line 1585, in fit
    self._fit_train(train_set=train_set, epoch=epoch, enable=enable)
  File "sb1/speechbrain/core.py", line 1410, in _fit_train
    loss = self.fit_batch(batch)
           ^^^^^^^^^^^^^^^^^^^^^
  File "sb1/speechbrain/core.py", line 1209, in fit_batch
    outputs = self.compute_forward(batch, sb.Stage.TRAIN)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "sb1/recipes/LibriSpeech/self-supervised-learning/BEST-RQ/train.py", line 54, in compute_forward
    feats = self.modules.normalize(feats, wav_lens, epoch=current_epoch)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/matylda3/isvecjan/miniconda3/envs/sb1/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/matylda3/isvecjan/miniconda3/envs/sb1/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "sb1/speechbrain/processing/features.py", line 1430, in forward
    self._update_global_stats(x, mask)
  File "sb1/speechbrain/processing/features.py", line 1468, in _update_global_stats
    self.count, self.glob_mean, self.glob_std = mean_std_update(
                                                ^^^^^^^^^^^^^^^^
  File "sb1/speechbrain/processing/features.py", line 1250, in mean_std_update
    new_statistics = combine_gaussian_statistics_distributed(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "sb1/speechbrain/processing/features.py", line 1176, in combine_gaussian_statistics_distributed
    global_count = ddp_all_reduce(torch.tensor(local_count), ReduceOp.SUM)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "sb1/speechbrain/utils/distributed.py", line 254, in ddp_all_reduce
    torch.distributed.all_reduce(communication_object, op=reduce_op)
  File "/mnt/matylda3/isvecjan/miniconda3/envs/sb1/lib/python3.12/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/matylda3/isvecjan/miniconda3/envs/sb1/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 2810, in all_reduce
    work = group.allreduce([tensor], opts)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: No backend type associated with device type cpu

I encountered this issue with the vanilla recipes:

recipes/LibriSpeech/self-supervised-learning/BEST-RQ/hparams/BEST-RQ.yaml 
(change: normalize.norm_type from "sentence" to "global")

recipes/LibriSpeech/ASR/CTC/hparams/conformer_large.yaml
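The traceback ends in `ddp_all_reduce(torch.tensor(local_count), ReduceOp.SUM)`. `torch.tensor(...)` without a `device` argument allocates on the default device, which is normally the CPU, and a process group that only has a GPU backend (NCCL is assumed here; the PR does not name the backend) cannot reduce CPU tensors. The actual diff is not reproduced on this page, so the following is only a minimal sketch of the kind of change that avoids the error. The helper name `all_reduce_count` and its `device` parameter are hypothetical and are not part of SpeechBrain.

```python
import torch
import torch.distributed as dist


def all_reduce_count(local_count, device):
    """Sum a per-process count across ranks (hypothetical helper).

    Assumption: `device` is the device of the features being normalized,
    e.g. `x.device` inside the normalization module's forward pass.
    """
    # Build the tensor on the same device as the features rather than on
    # the default (CPU) device, so a GPU-only backend can reduce it.
    count = torch.as_tensor(local_count, device=device)

    # Only communicate when a process group is actually running;
    # in single-process training the local count is already the global one.
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(count, op=dist.ReduceOp.SUM)
    return count
```

In a single-process run, `all_reduce_count(5, torch.device("cpu"))` simply returns the count as a tensor on the requested device; under DDP, each rank would pass its own features' device so the reduction runs on the backend associated with that device.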

Adel-Moumen

LGTM! Thanks for catching this bug.

Fix issue 'GlobalNorm with DDP'

6482e17

Adel-Moumen approved these changes Jun 5, 2025

TParcollet merged commit c75ab54 into speechbrain:develop Jun 10, 2025
6 of 9 checks passed

svecjan deleted the norm_ddp branch June 13, 2025 09:23

Adel-Moumen mentioned this pull request Jul 21, 2025

Global Stats Normalization Fails with DDP #2953

Closed
