v0.3.0 #803

Merged
erogol merged 41 commits into main from dev on Sep 13, 2021
@erogol erogol commented Sep 13, 2021

🐸 v0.3.0

New ForwardTTS implementation.

This version implements a new ForwardTTS interface that can be configured as any feed-forward TTS model that uses a duration predictor at inference time. Currently, we provide 3 pre-configured models and plan to implement one more.

  1. SpeedySpeech
  2. FastSpeech
  3. FastPitch
  4. FastSpeech 2 (TODO)

Through this API, any model can be trained in one of two ways: either using pre-computed durations from a pre-trained Tacotron model, or using an alignment network to learn durations from the dataset. The alignment network is used only during training and is discarded at inference. You select the mode by setting the use_aligner field in the configuration.
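
The two training modes described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the library's actual code: only the `use_aligner` field name comes from this PR, while `ForwardTTSConfigSketch`, `precomputed_durations_path`, and `duration_source` are hypothetical names introduced here for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForwardTTSConfigSketch:
    """Hypothetical stand-in for a ForwardTTS config (not the real class)."""

    # Field named in the PR description: selects the training mode.
    use_aligner: bool = True
    # Hypothetical field: location of durations pre-computed with Tacotron.
    precomputed_durations_path: Optional[str] = None


def duration_source(config: ForwardTTSConfigSketch, training: bool) -> str:
    """Return which component supplies durations in the given phase."""
    if not training:
        # At inference the aligner is discarded; the duration predictor is used.
        return "duration_predictor"
    if config.use_aligner:
        # Durations are learned from the dataset by the alignment network.
        return "alignment_network"
    if config.precomputed_durations_path is None:
        raise ValueError("use_aligner=False requires pre-computed durations")
    # Durations come from a pre-trained Tacotron model.
    return "precomputed_durations"


aligner_cfg = ForwardTTSConfigSketch(use_aligner=True)
tacotron_cfg = ForwardTTSConfigSketch(
    use_aligner=False, precomputed_durations_path="durations.npy"
)
print(duration_source(aligner_cfg, training=True))
print(duration_source(tacotron_cfg, training=True))
print(duration_source(aligner_cfg, training=False))
```

In both modes the inference path is the same, which is why the sketch returns the duration predictor whenever `training` is false, regardless of `use_aligner`.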

This new API will help us design a more efficient inference runtime for all these models using ONNX-like runtime optimizers.

The old FastPitch and SpeedySpeech implementations are deprecated in favor of this new implementation.

Fine-Tuning Documentation

This version introduces documentation for model fine-tuning. You can find it at https://tts.readthedocs.io/ once this is merged.

SpeedySpeech model using `ForwardTTS`
UnivNet model fine-tuned on TacotronDDC_ph spectrograms
@erogol erogol merged commit 0592a58 into main Sep 13, 2021