
SpeechLLM LibriSpeech recipe #2885

Merged
Adel-Moumen merged 89 commits into develop from speechllm_librispeech
Feb 7, 2026

Conversation

@Adel-Moumen (Collaborator) commented Apr 11, 2025

What does this PR do?

This PR adds support for SpeechLLM-based ASR on LibriSpeech. Feature extraction, training, greedy search, and inference scripts are provided.

Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@TParcollet TParcollet added this to the v1.1.0 milestone Oct 9, 2025
@Adel-Moumen (Collaborator, Author)

Hi guys @pplantinga @TParcollet @mravanelli, I think the PR is now ready. I went through the comments and added a tutorial for the caching feature. I also improved the ASR SpeechLLM baseline, which now gets more competitive results (2.72% WER on LibriSpeech test-clean and 5.34% on test-other).

I also created the model card and will upload the required files so that we can display this example for the community. The model is based on Llama 3.2 1B with LoRA adapters and WavLM-Large.

Happy to consider other remarks/requests.

:)

@pplantinga (Collaborator) left a comment


Looks like it's getting close, thanks Adel! Just a few small comments before merge.

Collaborator

I get that sometimes a class may be needed and sometimes a function, but torch has both a class, torch.nn.GELU, and a function, torch.nn.functional.gelu, so this shouldn't be needed here, right?

Collaborator

If you need the default argument changed, you can always use !name to change it, e.g.

activation: !name:torch.nn.GELU {approximate: tanh}

which returns a constructor that will use tanh by default.
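For illustration, the !name tag behaves roughly like functools.partial: it binds keyword arguments onto a constructor without calling it. A minimal pure-Python sketch of that semantics, using a stand-in class rather than the real torch.nn.GELU:

```python
from functools import partial

class GELU:
    """Stand-in for torch.nn.GELU, used only to illustrate the pattern."""
    def __init__(self, approximate="none"):
        self.approximate = approximate

# Roughly what `activation: !name:GELU {approximate: tanh}` yields:
# a constructor with the keyword pre-bound, not an instance.
activation = partial(GELU, approximate="tanh")

# The recipe can instantiate it later; tanh is then the default.
act = activation()
```

The bound keyword can still be overridden at call time, e.g. activation(approximate="none"), which is what makes the constructor-returning form flexible.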

Comment on lines +460 to +465
self.raw_modules = (
self.modules.module
if hasattr(self.modules, "module")
else self.modules
)

Collaborator

I'm not sure about this... is this handling the extra layer from DDP? I'm wondering whether this should just be handled by the recipe, or, if we do need a more general solution, whether we need something more robust here, like a function you can call to get the modules appropriately.

Collaborator (Author)

The point is exactly to remove the extra DDP layering (similar to here: https://github.com/karpathy/nanoGPT/blob/3adf61e154c3fe3fca428ad6bc3818b27a3b8291/train.py#L253). That way, we can systematically access methods such as get_input_embedding (e.g. self.raw_modules.get_input_embedding), whereas before we would have to write something like x = self.modules.module.get_input_embedding if hasattr(self.modules, "module") else ... — i.e. still unwrap the DDP container ourselves to reach the target module. Here, we just keep a pointer that does the unwrapping once instead of doing it manually everywhere. I have seen some recipes that had to do an ugly 'if', and I think this would solve the problem.

Having a function that unwraps would work too, but I think this is simpler and more modular. Happy to consider something else if you think this is not the way to go.
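The idea above can be sketched as follows (FakeDDP and Model are stand-ins, not SpeechBrain's actual classes): unwrap the DDP container once at init time and keep a pointer, so no call site needs its own 'if'.

```python
class FakeDDP:
    """Stand-in for torch.nn.parallel.DistributedDataParallel,
    which exposes the wrapped model via its .module attribute."""
    def __init__(self, module):
        self.module = module

class Model:
    def get_input_embedding(self):
        return "embedding-table"

def unwrap(modules):
    # Return the underlying module whether or not DDP wrapped it.
    return modules.module if hasattr(modules, "module") else modules

# Unwrap once; afterwards raw_modules works the same in both cases.
raw_modules = unwrap(FakeDDP(Model()))
emb = raw_modules.get_input_embedding()
```

The same unwrap call is a no-op on a plain Model(), which is what lets single-GPU and DDP runs share one code path.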

Collaborator

Well, I'm happy to go with this for now if needed, but maybe the longer-term solution is to actually store the DDP module in a separate container. I mean this part (simplified):

from torch.nn import SyncBatchNorm
from torch.nn.parallel import DistributedDataParallel as DDP

for name, module in self.modules.items():
    # Only wrap modules that have trainable parameters
    if any(p.requires_grad for p in module.parameters()):
        module = SyncBatchNorm.convert_sync_batchnorm(module)
        ddp_module = DDP(
            module,
            device_ids=[self.device],
            find_unused_parameters=self.find_unused_parameters,
        )
        self.modules[name] = ddp_module

change the last line to self.ddp_modules[name] = ddp_module so we don't overwrite the old one.
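A hedged sketch of that suggestion (again with stand-in classes, not SpeechBrain's actual API): keep the wrapped copies in a separate dict, so the originals stay directly reachable for method access while forward passes go through the DDP wrappers.

```python
class FakeDDP:
    """Stand-in for DistributedDataParallel's wrapping."""
    def __init__(self, module):
        self.module = module

class Encoder:
    def get_input_embedding(self):
        return "embedding-table"

modules = {"encoder": Encoder()}

# Wrap into a *separate* container instead of overwriting modules[name]:
ddp_modules = {name: FakeDDP(m) for name, m in modules.items()}

# Forward passes would use ddp_modules; direct method access stays simple:
emb = modules["encoder"].get_input_embedding()
```

With this layout, no unwrapping pointer is needed at all, since modules[name] is never replaced by its DDP wrapper.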

@Adel-Moumen (Collaborator, Author)

> Looks like it's getting close, thanks Adel! Just a few small comments before merge.

I think I addressed all the comments! Fixed the notebook and the HDF5 writer (we were never passing the compression arg to the constructor), removed the activations, and added back the authors.

@pplantinga (Collaborator) left a comment

LGTM!

@Adel-Moumen Adel-Moumen merged commit 39ef358 into develop Feb 7, 2026
5 checks passed
@Adel-Moumen Adel-Moumen deleted the speechllm_librispeech branch February 7, 2026 15:08