This repository contains the official PyTorch implementations of the following works:
- PortaSpeech: Portable and High-Quality Generative Text-to-Speech (NeurIPS 2021)
  Demo page | HuggingFace🤗 Demo
- DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (DiffSpeech) (AAAI 2022)
  Demo page | Project page | HuggingFace🤗 Demo

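This framework provides the following features:
- A data processing pipeline for non-autoregressive text-to-speech, based on the Montreal Forced Aligner;
- An easy-to-use and extensible framework for training and testing;
- A simple but effective implementation of a random-access dataset class.
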
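As an illustration of the random-access dataset idea in the last bullet, here is a minimal sketch. It is not the class shipped in this repository: the names `IndexedWriter` and `IndexedReader` and the `.data`/`.idx` file suffixes are assumptions made for the example. Items are pickled into one data file, and a separate list of byte offsets lets a reader seek directly to item `i` without loading the rest.

```python
import os
import pickle
import tempfile


class IndexedWriter:
    """Hypothetical writer: appends pickled items and records byte offsets."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.data_file = open(prefix + ".data", "wb")
        self.offsets = [0]  # offsets[i] is where item i starts

    def add(self, item):
        payload = pickle.dumps(item)
        self.data_file.write(payload)
        self.offsets.append(self.offsets[-1] + len(payload))

    def close(self):
        self.data_file.close()
        with open(self.prefix + ".idx", "wb") as f:
            pickle.dump(self.offsets, f)


class IndexedReader:
    """Hypothetical reader: fetches item i with a single seek and read."""

    def __init__(self, prefix):
        with open(prefix + ".idx", "rb") as f:
            self.offsets = pickle.load(f)
        self.data_file = open(prefix + ".data", "rb")

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        start, end = self.offsets[i], self.offsets[i + 1]
        self.data_file.seek(start)
        return pickle.loads(self.data_file.read(end - start))

    def close(self):
        self.data_file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        prefix = os.path.join(tmp_dir, "train")
        writer = IndexedWriter(prefix)
        for idx in range(3):
            writer.add({"id": idx, "text": "utterance %d" % idx})
        writer.close()

        reader = IndexedReader(prefix)
        print(len(reader), reader[1])
        reader.close()
```

Keeping the offsets in a separate small file means only the index has to be held in memory, so a dataset of this shape can be wrapped in a map-style PyTorch `Dataset` and sampled in any order.
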
## Tested on Linux/Ubuntu 18.04
## Install Python 3.6+ first (Anaconda is recommended)
export PYTHONPATH=.
# Create a virtual environment (recommended).
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install -U pip
pip install Cython numpy==1.19.1
pip install torch==1.9.0 # torch >= 1.9.0 is recommended
pip install -r requirements.txt
sudo apt install -y sox libsox-fmt-mp3
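bash mfa_usr/install_mfa.sh # install the forced-alignment tool (Montreal Forced Aligner)

If this repository is useful for your research or work, please cite the following papers: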
- PortaSpeech
@article{ren2021portaspeech,
title={PortaSpeech: Portable and High-Quality Generative Text-to-Speech},
author={Ren, Yi and Liu, Jinglin and Zhao, Zhou},
journal={Advances in Neural Information Processing Systems},
volume={34},
year={2021}
}
- DiffSpeech
@article{liu2021diffsinger,
title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
journal={arXiv preprint arXiv:2105.02446},
volume={2},
year={2021}
}

Our code is inspired by the following code and repositories:
