Conversation
Hi guys @pplantinga @TParcollet @mravanelli, I think the PR is now ready. I went through the comments and added a tutorial for the caching feature. I also improved the ASR SpeechLLM baseline, which now gets more competitive results (2.72% WER on LibriSpeech test-clean and 5.34% on test-other). I also created the model card and will upload the required files so that we can display this example for the community. This is based on Llama 3.2 1B with LoRA adapters and WavLM-Large. Happy to consider other remarks/requests. :)
pplantinga
left a comment
Looks like it's getting close, thanks Adel! Just a few small comments before merge.
I get that sometimes a class may be needed and sometimes a function, but torch has both a class, torch.nn.GELU, and a function, torch.nn.functional.gelu, so this shouldn't be needed here, right?
If you need the default argument changed you can always use "name" to change it, e.g.

activation: !name:torch.nn.GELU {approximate: tanh}

which returns a constructor that will use tanh by default.
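As a rough illustration of why this works: HyperPyYAML's !name: tag with overrides behaves much like functools.partial, returning a constructor with the new default baked in. A minimal pure-Python sketch (the GELU class here is a hypothetical stand-in for torch.nn.GELU, used only to keep the example dependency-free):

```python
from functools import partial

# Hypothetical stand-in for torch.nn.GELU: a callable whose default
# argument we want to change without instantiating it yet.
class GELU:
    def __init__(self, approximate="none"):
        self.approximate = approximate

# `!name:torch.nn.GELU {approximate: tanh}` behaves like a partial:
# calling it later constructs the object with the overridden default.
activation = partial(GELU, approximate="tanh")

layer_act = activation()
print(layer_act.approximate)  # -> tanh

# Callers can still override the default explicitly.
print(activation(approximate="none").approximate)  # -> none
```

The recipe YAML stays declarative, and the layer that eventually calls activation() gets the tanh approximation by default.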
self.raw_modules = (
    self.modules.module
    if hasattr(self.modules, "module")
    else self.modules
)
I'm not sure about this... is this handling the extra layer from DDP? I'm wondering whether this should just be handled by the recipe, or, if we do need a more general solution, whether we need something more robust here, like a function you can call to get the modules appropriately.
The point is exactly to remove the extra DDP layering (similar to here: https://github.com/karpathy/nanoGPT/blob/3adf61e154c3fe3fca428ad6bc3818b27a3b8291/train.py#L253). That way, we can systematically access methods such as get_input_embedding that wouldn't otherwise be easily accessible (e.g. self.raw_modules.get_input_embedding), whereas before we would have to do something like x = self.modules.module.get_input_embedding if 'module' in self.modules else ..., so you would still need to unwrap the DDP container to get the target module before calling the method. Here, we just keep a pointer that does the unwrapping once instead of doing it manually at every call site. I have seen some recipes that had to do an ugly 'if', and I think this would solve the problem.
Having a function that unwraps would work too, but I think this is simpler and more modular. Happy to consider something else if you think this is not the way to go.
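To make the pointer idea concrete, here is a minimal sketch. DDPWrapper is a dummy stand-in for torch.nn.parallel.DistributedDataParallel (which exposes the wrapped network as .module), and get_input_embedding follows the method name from the discussion above:

```python
class DDPWrapper:
    """Stand-in for DistributedDataParallel: exposes the wrapped
    network as `.module`, adding one level of indirection."""
    def __init__(self, module):
        self.module = module

class SpeechLLMModules:
    """Stand-in for the recipe's module container."""
    def get_input_embedding(self):
        return "embedding"

modules = DDPWrapper(SpeechLLMModules())

# One-time unwrap: keep a pointer to the underlying modules so every
# call site can use `raw_modules` without checking for `.module`.
raw_modules = modules.module if hasattr(modules, "module") else modules

print(raw_modules.get_input_embedding())  # -> embedding
```

The same line works unchanged in the single-GPU case, where modules is not wrapped and hasattr(modules, "module") is False.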
Well, I'm happy to go with this for now if needed, but maybe the longer-term solution is to actually store the DDP module in a separate container. I mean, for this part (simplified):
for name, module in self.modules.items():
    if any(p.requires_grad for p in module.parameters()):
        module = SyncBatchNorm.convert_sync_batchnorm(module)
        ddp_module = DDP(
            module,
            device_ids=[self.device],
            find_unused_parameters=self.find_unused_parameters,
        )
        self.modules[name] = ddp_module
change the last line to self.ddp_modules[name] = ddp_module so we don't overwrite the old one.
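A rough sketch of that alternative, with a dummy FakeDDP in place of the real DDP wrapper and plain dicts standing in for the module containers (the ddp_modules name is illustrative, not existing SpeechBrain API):

```python
class FakeDDP:
    """Stand-in for DistributedDataParallel, keeping a reference to
    the original module under `.module`."""
    def __init__(self, module):
        self.module = module

# Stand-ins for the recipe's trainable modules.
modules = {"llm": object(), "encoder": object()}
ddp_modules = {}

for name, module in modules.items():
    # Wrap for distributed training WITHOUT overwriting the original:
    # `modules` keeps the raw objects, `ddp_modules` the wrapped ones.
    ddp_modules[name] = FakeDDP(module)

# Forward passes would go through ddp_modules; custom methods
# (e.g. get_input_embedding) stay reachable on the raw modules.
assert modules["llm"] is ddp_modules["llm"].module
```

With this layout no unwrapping logic is needed anywhere, at the cost of keeping the two containers in sync.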
I think I addressed all the comments! Fixed the notebook and HDF5 (we were never passing the
What does this PR do?
This PR adds support for SpeechLLMs for ASR on LibriSpeech. Feature extraction, training, greedy search, and inference scripts are provided.
Before submitting
PR review
Reviewer checklist