
Neural Architectures
====================

.. toctree::
   :hidden:

   nn/using-wav2vec-2.0-hubert-wavlm-and-whisper-from-huggingface-with-speechbrain.ipynb
   nn/neural-network-adapters.ipynb
   nn/complex-and-quaternion-neural-networks.ipynb
   nn/recurrent-neural-networks-and-speechbrain.ipynb
   nn/conformer-streaming-asr.ipynb


🔗 Fine-tuning or using Whisper, wav2vec2, HuBERT and others with SpeechBrain and HuggingFace
---------------------------------------------------------------------------------------------

Parcollet T. & Moumen A. · Dec. 2022 · Difficulty: medium · Time: 20 min · 🔗 Google Colab

This tutorial describes how to use and fine-tune pretrained models from HuggingFace. Any wav2vec 2.0, HuBERT, WavLM, or Whisper model integrated into the HuggingFace ``transformers`` interface can then be plugged into SpeechBrain to tackle speech-related tasks: automatic speech recognition, speaker recognition, spoken language understanding, and more.
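In a SpeechBrain recipe, such a pretrained encoder is usually declared in the HyperPyYAML hyperparameters file. The fragment below is a hedged sketch following the pattern of SpeechBrain's LibriSpeech recipes; the exact module path and arguments vary across SpeechBrain versions, and the model name and ``save_folder`` reference are illustrative placeholders:

.. code-block:: yaml

   # Sketch of a hparams entry wrapping a HuggingFace wav2vec 2.0 encoder.
   # Module path follows the speechbrain.lobes.models.huggingface_transformers
   # layout of recent SpeechBrain releases; check your installed version.
   wav2vec2: !new:speechbrain.lobes.models.huggingface_transformers.wav2vec2.Wav2Vec2
       source: facebook/wav2vec2-base-960h
       save_path: !ref <save_folder>/wav2vec2_checkpoint
       freeze: False        # set True to use the encoder as a frozen feature extractor
       output_norm: True

With ``freeze: False``, the encoder's weights are updated together with the downstream layers during training, which is what the tutorial means by fine-tuning.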

🔗 Neural Network Adapters for faster low-memory fine-tuning
------------------------------------------------------------

Plantinga P. · Sept. 2024 · Difficulty: easy · Time: 20 min · 🔗 Google Colab

This tutorial covers the SpeechBrain implementation of adapters such as LoRA, including how to integrate SpeechBrain-implemented adapters, custom adapters, or adapters from libraries such as PEFT into a pre-trained model.
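To illustrate the idea behind such adapters, here is a minimal LoRA sketch in plain PyTorch (not SpeechBrain's or PEFT's actual API): the pretrained linear layer is frozen, and only a small low-rank update is trained.

.. code-block:: python

   import torch
   import torch.nn as nn

   class LoRALinear(nn.Module):
       """Minimal LoRA sketch: a frozen base Linear plus a trainable low-rank update."""

       def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
           super().__init__()
           self.base = base
           for p in self.base.parameters():
               p.requires_grad = False  # freeze the pretrained weights
           # Low-rank factors: B @ A has shape (out_features, in_features)
           self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
           self.B = nn.Parameter(torch.zeros(base.out_features, rank))
           self.scale = alpha / rank

       def forward(self, x):
           # Base output plus the scaled low-rank correction
           return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

   layer = LoRALinear(nn.Linear(512, 512), rank=8)
   out = layer(torch.randn(2, 10, 512))
   print(out.shape)  # torch.Size([2, 10, 512])

Because ``B`` is initialized to zero, the adapted layer starts out exactly equal to the pretrained one, and only 2 × 8 × 512 parameters are trainable instead of the full 512 × 512 weight matrix.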

🔗 Complex and Quaternion Neural Networks
-----------------------------------------

Parcollet T. · Feb. 2021 · Difficulty: medium · Time: 30 min · 🔗 Google Colab

This tutorial demonstrates how to use the SpeechBrain implementation of complex-valued and quaternion-valued neural networks for speech technologies. It covers the basics of high-dimensional representations and the associated neural layers: Linear, Convolution, Recurrent, and Normalisation.
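As a taste of what a complex-valued layer does, here is a minimal sketch in plain PyTorch (not SpeechBrain's actual modules), representing a complex tensor as a pair of real tensors:

.. code-block:: python

   import torch
   import torch.nn as nn

   class ComplexLinear(nn.Module):
       """Sketch of a complex-valued linear layer via two real weight matrices."""

       def __init__(self, in_features: int, out_features: int):
           super().__init__()
           self.Wr = nn.Linear(in_features, out_features, bias=False)  # real part
           self.Wi = nn.Linear(in_features, out_features, bias=False)  # imaginary part

       def forward(self, xr, xi):
           # (Wr + i Wi)(xr + i xi) = (Wr xr - Wi xi) + i (Wr xi + Wi xr)
           return self.Wr(xr) - self.Wi(xi), self.Wr(xi) + self.Wi(xr)

   xr, xi = torch.randn(3, 64), torch.randn(3, 64)
   yr, yi = ComplexLinear(64, 32)(xr, xi)

Quaternion layers generalize the same idea to four components with the Hamilton product, which is what gives them their parameter-sharing benefits.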

🔗 Recurrent Neural Networks
----------------------------

Ravanelli M. · Feb. 2021 · Difficulty: easy · Time: 30 min · 🔗 Google Colab

Recurrent Neural Networks (RNNs) offer a natural way to process sequences. This tutorial demonstrates how to use the SpeechBrain implementations of RNNs, including LSTM, GRU, vanilla RNN, and LiGRU, a recurrent cell designed specifically for speech-related tasks. RNNs are at the core of many sequence-to-sequence models.
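The basic usage pattern looks like this in plain PyTorch (SpeechBrain wraps comparable modules, including LiGRU, under ``speechbrain.nnet.RNN``); the feature dimensions here are illustrative:

.. code-block:: python

   import torch
   import torch.nn as nn

   # A bidirectional LSTM over a batch of feature sequences,
   # shaped (batch, time, features) as is typical for speech frontends.
   rnn = nn.LSTM(input_size=40, hidden_size=128, num_layers=2,
                 bidirectional=True, batch_first=True)

   feats = torch.randn(4, 100, 40)  # e.g. 100 frames of 40-dim filterbanks
   out, (h, c) = rnn(feats)
   print(out.shape)  # torch.Size([4, 100, 256]) — 2 * hidden_size (bidirectional)

The output keeps one vector per time step, which is exactly what a downstream sequence-to-sequence decoder or CTC head consumes.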

🔗 Streaming Speech Recognition with Conformers
-----------------------------------------------

de Langen S. · Sep. 2024 · Difficulty: medium · Time: 60 min+ · 🔗 Google Colab

Automatic Speech Recognition (ASR) models are often designed to transcribe an entire chunk of audio at once, which makes them unsuitable for use cases like live stream transcription, where low-latency, long-form transcription is required.

This tutorial introduces the Dynamic Chunk Training approach and the architectural changes needed to make the Conformer model streamable. It also covers the tooling for training and inference that SpeechBrain provides. This is a good starting point if you are interested in training and understanding your own streaming models, or if you want to explore improved streaming architectures.
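The core architectural change is restricting self-attention so each frame only sees its own chunk plus a limited left context. A hedged sketch of such a chunked attention mask (an illustration of the idea, not SpeechBrain's exact API):

.. code-block:: python

   import torch

   def chunk_attention_mask(seq_len: int, chunk_size: int, left_chunks: int):
       """Boolean mask where True means 'query frame may attend to key frame'.

       Each frame attends within its own chunk and up to `left_chunks`
       previous chunks — the restriction used in dynamic-chunk style
       streaming attention (sketch; not SpeechBrain's exact implementation).
       """
       chunk_idx = torch.arange(seq_len) // chunk_size
       q = chunk_idx.unsqueeze(1)  # chunk index of each query frame
       k = chunk_idx.unsqueeze(0)  # chunk index of each key frame
       return (k <= q) & (k >= q - left_chunks)

   # 8 frames, chunks of 2, one chunk of left context:
   mask = chunk_attention_mask(8, chunk_size=2, left_chunks=1)

Because no frame attends to future chunks, the model can emit output as each chunk arrives, and varying ``chunk_size`` during training is what lets a single model serve multiple latency budgets at inference time.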