Conversation
- Generic API for feed-forward TTS models (FastPitch, SpeedySpeech) - Tests for `forward-tts` - Edit FastPitchConfig and SpeedySpeechConfig to use `forward_tts`
Forward TTS implementation
SpeedySpeech model using `ForwardTTS` UnivNet model fine-tuned on TacotronDDC_ph spectrograms
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🐸 v0.3.0
New
ForwardTTSimplementation.This version implements a new
ForwardTTSinterface that can be configured as any feed-forward TTS model that uses a duration predictor at inference time. Currently, we provide 3 pre-configured models and plan to implement one more.Through this API, any model can be trained in two ways. Either using pre-computed durations from a pre-trained Tacotron model or using an alignment network to learn durations from the dataset. The alignment network is only used at training and discarded at inference. You can set which mode you want to use by just setting the
use_alignerfield in the configuration.This new API will help us to design more efficient inference run-time for all these models using ONNX like run-time optimizers.
Old
FastPitchandSpeedySpeechimplementations are deprecated for the sake of this new implementation.Fine-Tuning Documentation
This version introduces documentation for model fine-tunning. You can see it under https://tts.readthedocs.io/ when this is merged.