This is the official GitHub repository for the AAAI 2023 proceedings paper: "Zero-shot Face-based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-world Scenario".
The demo website: https://sites.google.com/view/spfacevc-demo
- Download your data (containing both face images and speech).
- Change the wav input and output paths in `preprocess/config/preprocess.yaml`, then generate the data with command:
python3 preprocess/preprocess.py
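The entries you need to edit in `preprocess.yaml` look roughly like the following; the key names here are illustrative, so match them against the actual keys in the file:

```yaml
# Hypothetical key names -- edit the corresponding keys in preprocess.yaml
input_wav_dir: /path/to/raw/wavs        # your downloaded speech data
output_dir: /path/to/processed/data     # where preprocessed features are written
```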
- Change `rootDir` and `targetDir` in `make_faceemb.py`, then run it to extract the face embeddings. Next, take the arithmetic mean of the embeddings per speaker (change your input and output directory paths as well) with commands:
python3 make_faceemb.py
python3 make_spk_mean.py
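The averaging step in `make_spk_mean.py` amounts to an arithmetic mean over all utterance-level embeddings of a speaker. A minimal sketch, assuming one `.npy` embedding file per utterance grouped in a directory per speaker (the on-disk layout is an assumption, not the repo's exact format):

```python
# Sketch of the per-speaker mean embedding; file layout is hypothetical.
from pathlib import Path

import numpy as np


def speaker_mean(emb_dir: str) -> np.ndarray:
    """Average all utterance-level face embeddings found in one speaker's directory."""
    embs = [np.load(p) for p in sorted(Path(emb_dir).glob("*.npy"))]
    # Stack into (num_utterances, emb_dim) and average over the utterance axis.
    return np.mean(np.stack(embs), axis=0)
```

The resulting vector serves as the speaker's face embedding for training and conversion.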
- Change the directory paths in `data_loader.py` to your own.
- Train the model until the loss converges with command:
python3 main_gan.py --model_id $your_id$
- Change the parameters in `conversion_speechbrain.py` to point to your checkpoint and data, then generate the converted results with command:
python3 conversion_speechbrain.py
- Synthesize waveforms with the pretrained WaveGlow model: replace its `inference.py` with our file, then run:
python3 inference.py -f $your_result_path$ -w $waveglow_checkpoint_path$ -o $output_dir$ --is_fp16 -s 0.6
- If you need a checkpoint, a reference one is provided; please read the Readme.txt in the following link first. Thanks.