xhtian/project-CURRENNT-scripts
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|
Repository files navigation
########################################################################### ## prodject-CURRENNT-scrits ------------------------------------------- # ## --------------------------------------------------------------------- # ## # ## Copyright (c) 2018 National Institute of Informatics # ## # ## THE NATIONAL INSTITUTE OF INFORMATICS AND THE CONTRIBUTORS TO THIS # ## WORK DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING # ## ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT # ## SHALL THE NATIONAL INSTITUTE OF INFORMATICS NOR THE CONTRIBUTORS # ## BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY # ## DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, # ## WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS # ## ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE # ## OF THIS SOFTWARE. # ########################################################################### ## Author: Xin Wang # ## Date: 2016 - 2018 # ## Contact: wangxin at nii.ac.jp # ########################################################################### This repository contains the scripts to use CURRENNT \- waveform-modeling: scripts for waveform models \- acoustic-modeling: scripts for acoustic models Please read the intruction below ----------------- 1. waveform models -------------------- ./waveform-modeling - DATA Directory to store the data for model training (validation will be selected automatically from these data) - TESTDATA Directory to store the data for test - TESTDATA-for-pretrained Directory to store the data for test for the project-WaveNet-pretrained and project-NSF-pretrained - SCRIPTS Scripts of the training/generation processes - project-NSF-v2-pretrained Simplified and harmonic-plu-noise NSFs trained on SLT. This is a demonstration script to generate waveforms from a trained NSF model. These new NSF models are explained here: https://nii-yamagishilab.github.io/samples-nsf/nsf-v2.html - project-NSF-pretrained Neural source-filter model trained on SLT. This is a demonstration script to generate waveforms from a trained NSF model. The samples have been uploaded to https://nii-yamagishilab.github.io/samples-nsf/nsf-v1.html Note: due to historical reason, meanstd.bin for these pre-trained NSF models were not calculated using public-CURRENNT-scripts. Models project-NSF-pretrained can only use project-NSF-pretrained/meanstd.bin - project-NSF-scripts Scripts to train NSF on CMU-arctic SLT voice. - project-WaveNet-pretrained Wavenet trained on SLT. This is a demonstration script to generate waveforms from a trained WaveNet model. The samples have been uploaded to https://nii-yamagishilab.github.io/samples-nsf/nsf-v1.html - project-WaveNet-scripts Scripts to train WaveNet on CMU-arctic SLT voice. Usage: 1. For quick check $: source ./init.sh $: cd waveform-modeling/project-NSF-pretrained/ $: run 01_gen.sh Waveforms should be generated in ./waveform-modeling/project-NSF-pretrained/MODELS/NSF/output 2. For model training using the provided sample data $: source ./init.sh $: cd waveform-modeling/project-NSF/ $: run 00_run.sh After which you can get a trained model in ./waveform-modeling/project-NSF/MODELS/NSF/trained_network.jsn $: run 01_gen.sh After which you can get some waveforms in ./waveform-modeling/project-NSF/MODELS/NSF/output 00_run.sh and 01_gen.sh are only for demonstration. Please don't expect good output since the model is only trained using less than 10 utterances. To train a good model, you may need to use the whole data from one speaker of the CMU-arctic corpus. 3. For training using your own data (or CMU-arctic data): 1. Put waveforms and acoustic features in ./DATA, which stores the training data (validation data will be automatically selected from ./DATA) 2. Read and configure config.py in project-NSF or project-Wavenet 3. Run 00_run.sh 4. Put test data in ./TESTDATA, configure config.py and Run 01_gen.sh ----------------- 2. acoustic models -------------------- ./acoustic-modeling To be updated ----------------------------------------------------------- Note: 1. All the feature files except waveforms should be saved as binary, float32, little-endian. You may check the data included in waveform-modeling/TESTDATA-for-pretrained/mfbsp/*. $: source ./init.sh $: python >>> from ioTools import readwrite >>> mel = readwrite.read_raw_mat('./waveform-modeling/TESTDATA-for-pretrained/mfbsp/arctic_a0001.mfbsp', 80) >>> mel.shape (671, 80) >>> mel[0] array([-0.5804557 , 0.64407444, 0.95468473, 0.9076864 , 0.7409921 , 0.5190042 , 0.2543492 , 0.07272982, -0.02690054, -0.08740515, -0.14761803, -0.24591875, -0.40614623, -0.49193513, -0.69396436, -0.6916913 , -0.67114395, -0.7134891 , -0.8428167 , -0.9381612 , -0.9604182 , -1.020919 , -1.0435845 , -1.077764 , -1.1341914 , -1.2459651 , -1.2637303 , -1.3517176 , -1.3730341 , -1.4908298 , -1.5803422 , -1.6830492 , -1.495914 , -1.3974593 , -1.4675162 , -1.5840678 , -1.6725454 , -1.584573 , -1.7740629 , -2.0816884 , -1.8013108 , -1.7561418 , -2.1837492 , -2.3368633 , -2.0934896 , -2.2621613 , -2.200103 , -2.3064234 , -1.9597349 , -2.2220097 , -2.296505 , -2.0561726 , -2.4601874 , -1.997487 , -2.0367327 , -2.2498186 , -2.8739617 , -2.859662 , -2.680149 , -3.1865113 , -3.172631 , -2.8548572 , -3.0105433 , -2.7395592 , -2.7438028 , -2.625576 , -2.8859007 , -2.8262408 , -2.6442852 , -2.847031 , -2.9952297 , -2.9672284 , -2.6831682 , -3.2138064 , -3.3470006 , -3.4002392 , -2.8704414 , -2.958755 , -3.2552214 , -3.5245233 ], dtype=float32) >>> mel[0,0] -0.5804557 >>> f0 = readwrite.read_raw_mat('./waveform-modeling/TESTDATA-for-pretrained/f0/arctic_a0001.f0', 1) >>> f0.shape (671,) Here, the binary mel-spectrogram is a matrix with 671 frames and 80 dimenions/frame. The F0 is one-dimensional vector with 671 frames. Notice that in physical memory, one datum in a two dimensional data matrix (e.g., mel-spectrom) is accessed through DataArray[D * n + d], where D is the feature dimension, n is the frame index, and d is the dimension index within one frame.