Install dependencies following readme.md
```bash
export CONFIG_NAME=egs/datasets/audio/lj/ds.yaml
export MY_EXP_NAME=ds_exp
```

Prepare the dataset following prepare_data.md.
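The exported variables above are reused by every later command. As a quick sanity check (a sketch; the existence test is an addition to this README, not part of the pipeline), you can confirm the config path resolves inside your repo checkout before running the longer steps:

```shell
# Config and experiment name used by all later commands (from this README)
export CONFIG_NAME=egs/datasets/audio/lj/ds.yaml
export MY_EXP_NAME=ds_exp

# Sanity check: the DiffSpeech config should exist in the repo checkout
if [ -f "$CONFIG_NAME" ]; then
    echo "config found: $CONFIG_NAME"
else
    echo "config missing: $CONFIG_NAME (are you in the repo root?)" >&2
fi
```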
Prepare the vocoder following prepare_vocoder.md.
First, you need a pre-trained FastSpeech2 checkpoint in checkpoints/aux_exp. You can use the pre-trained model, or train FastSpeech2 from scratch by running:
```bash
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/datasets/audio/lj/fs2_orig.yaml --exp_name aux_exp --reset
```

Then, to train DiffSpeech, run:
```bash
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config $CONFIG_NAME --exp_name $MY_EXP_NAME --reset
```

You can check the training and validation curves by opening TensorBoard:
```bash
tensorboard --logdir checkpoints/$MY_EXP_NAME
```

To run inference with your trained model:

```bash
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config $CONFIG_NAME --exp_name $MY_EXP_NAME --infer
```

Alternatively, download the pre-trained checkpoints from https://github.com/NATSpeech/NATSpeech/releases/download/pretrained_models/ds_exp.zip and unzip them to checkpoints/ds_exp. Then you can directly run the inference command:
```bash
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --exp_name ds_exp --infer
```

If you find this useful for your research, please cite:
```bibtex
@article{liu2021diffsinger,
  title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
  author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2105.02446},
  volume={2},
  year={2021}
}
```