GitHub - xhtian/project-CURRENNT-scripts: This repository contains the scripts to use CURRENNT

###########################################################################
##  project-CURRENNT-scripts -------------------------------------------  #
## ---------------------------------------------------------------------  #
##                                                                        #
##  Copyright (c) 2018  National Institute of Informatics                 #
##                                                                        #
##  THE NATIONAL INSTITUTE OF INFORMATICS AND THE CONTRIBUTORS TO THIS    #
##  WORK DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING  #
##  ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT    #
##  SHALL THE NATIONAL INSTITUTE OF INFORMATICS NOR THE CONTRIBUTORS      #
##  BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY   #
##  DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,       #
##  WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS        #
##  ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE   #
##  OF THIS SOFTWARE.                                                     #
###########################################################################
##                         Author: Xin Wang                               #
##                         Date:   2016 - 2018                            #
##                         Contact: wangxin at nii.ac.jp                  #
###########################################################################

This repository contains the scripts to use CURRENNT
- waveform-modeling: scripts for waveform models
- acoustic-modeling: scripts for acoustic models

Please read the instructions below.

----------------- 1. waveform models --------------------
./waveform-modeling

  - DATA
  Directory to store the data for model training (validation data will be
  selected automatically from this data)

  - TESTDATA
  Directory to store the test data
  
  - TESTDATA-for-pretrained
  Directory to store the test data for project-WaveNet-pretrained
  and project-NSF-pretrained

  - SCRIPTS
  Scripts of the training/generation processes

  - project-NSF-v2-pretrained
  Simplified and harmonic-plus-noise NSFs trained on SLT. This is a demonstration
  script to generate waveforms from a trained NSF model.
  These new NSF models are explained here: https://nii-yamagishilab.github.io/samples-nsf/nsf-v2.html


  - project-NSF-pretrained
  Neural source-filter model trained on SLT. This is a demonstration
  script to generate waveforms from a trained NSF model.
  The samples have been uploaded to https://nii-yamagishilab.github.io/samples-nsf/nsf-v1.html
  
  Note: for historical reasons, meanstd.bin for these pre-trained NSF models was not calculated
  using public-CURRENNT-scripts. The models in project-NSF-pretrained can only be used with
  project-NSF-pretrained/meanstd.bin
  
  - project-NSF-scripts
  Scripts to train NSF on CMU-arctic SLT voice. 

  - project-WaveNet-pretrained
  WaveNet trained on SLT. This is a demonstration
  script to generate waveforms from a trained WaveNet model.
  The samples have been uploaded to https://nii-yamagishilab.github.io/samples-nsf/nsf-v1.html

  - project-WaveNet-scripts
  Scripts to train WaveNet on CMU-arctic SLT voice. 


Usage:
1. For a quick check
   $: source ./init.sh
   $: cd waveform-modeling/project-NSF-pretrained/
   $: run 01_gen.sh

   Waveforms should be generated in ./waveform-modeling/project-NSF-pretrained/MODELS/NSF/output
   
2. For model training using the provided sample data
   
   $: source ./init.sh
   $: cd waveform-modeling/project-NSF/
   $: run 00_run.sh
   
   After this, the trained model is saved as ./waveform-modeling/project-NSF/MODELS/NSF/trained_network.jsn

   $: run 01_gen.sh
   After this, the generated waveforms are saved in ./waveform-modeling/project-NSF/MODELS/NSF/output

   00_run.sh and 01_gen.sh are only for demonstration. Please don't expect good output, since the model
   is trained on fewer than 10 utterances.
   
   To train a good model, you may need to use all the data from one speaker of the CMU-arctic corpus.

3. For training using your own data (or CMU-arctic data):
   1. Put waveforms and acoustic features in ./DATA, which stores the training data (validation data will be
      automatically selected from ./DATA)
   2. Read and configure config.py in project-NSF or project-Wavenet
   3. Run 00_run.sh
   4. Put test data in ./TESTDATA, configure config.py, and run 01_gen.sh
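
   Before step 3, it can help to confirm that every utterance has all of its feature files.
   The sketch below is only an illustration: it assumes one sub-directory per feature type
   (as in TESTDATA-for-pretrained, which has mfbsp/ and f0/), and the helper names
   list_utterances and find_unpaired are hypothetical, not part of this repository.

```python
import os
import tempfile


def list_utterances(directory, extension):
    """Return the set of file basenames in `directory` that end with `extension`."""
    return {
        os.path.splitext(name)[0]
        for name in os.listdir(directory)
        if name.endswith(extension)
    }


def find_unpaired(feature_dirs):
    """feature_dirs maps extension -> directory.

    Returns {extension: [utterance names missing for that extension]},
    listing only the extensions that are missing something.
    """
    sets = {ext: list_utterances(path, ext) for ext, path in feature_dirs.items()}
    everything = set().union(*sets.values())
    return {
        ext: sorted(everything - names)
        for ext, names in sets.items()
        if everything - names
    }


# Demo on empty placeholder files (assumed layout, not real data).
with tempfile.TemporaryDirectory() as root:
    mel_dir = os.path.join(root, "mfbsp")
    f0_dir = os.path.join(root, "f0")
    os.makedirs(mel_dir)
    os.makedirs(f0_dir)
    for name in ("arctic_a0001.mfbsp", "arctic_a0002.mfbsp"):
        open(os.path.join(mel_dir, name), "wb").close()
    open(os.path.join(f0_dir, "arctic_a0001.f0"), "wb").close()
    missing = find_unpaired({".mfbsp": mel_dir, ".f0": f0_dir})

print(missing)  # {'.f0': ['arctic_a0002']}
```

   An empty result means every utterance name appears under every listed extension.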


----------------- 2. acoustic models --------------------
./acoustic-modeling

To be updated



-----------------------------------------------------------


Note:
   1. All feature files except waveforms should be saved as binary data in float32, little-endian format.
      You can inspect the data included in waveform-modeling/TESTDATA-for-pretrained/mfbsp/*, for example:
      $: source ./init.sh
      $: python
      >>> from ioTools import readwrite
      >>> mel = readwrite.read_raw_mat('./waveform-modeling/TESTDATA-for-pretrained/mfbsp/arctic_a0001.mfbsp', 80)
      >>> mel.shape
      (671, 80)
      >>> mel[0]
      array([-0.5804557 ,  0.64407444,  0.95468473,  0.9076864 ,  0.7409921 ,
        0.5190042 ,  0.2543492 ,  0.07272982, -0.02690054, -0.08740515,
       -0.14761803, -0.24591875, -0.40614623, -0.49193513, -0.69396436,
       -0.6916913 , -0.67114395, -0.7134891 , -0.8428167 , -0.9381612 ,
       -0.9604182 , -1.020919  , -1.0435845 , -1.077764  , -1.1341914 ,
       -1.2459651 , -1.2637303 , -1.3517176 , -1.3730341 , -1.4908298 ,
       -1.5803422 , -1.6830492 , -1.495914  , -1.3974593 , -1.4675162 ,
       -1.5840678 , -1.6725454 , -1.584573  , -1.7740629 , -2.0816884 ,
       -1.8013108 , -1.7561418 , -2.1837492 , -2.3368633 , -2.0934896 ,
       -2.2621613 , -2.200103  , -2.3064234 , -1.9597349 , -2.2220097 ,
       -2.296505  , -2.0561726 , -2.4601874 , -1.997487  , -2.0367327 ,
       -2.2498186 , -2.8739617 , -2.859662  , -2.680149  , -3.1865113 ,
       -3.172631  , -2.8548572 , -3.0105433 , -2.7395592 , -2.7438028 ,
       -2.625576  , -2.8859007 , -2.8262408 , -2.6442852 , -2.847031  ,
       -2.9952297 , -2.9672284 , -2.6831682 , -3.2138064 , -3.3470006 ,
       -3.4002392 , -2.8704414 , -2.958755  , -3.2552214 , -3.5245233 ],
      dtype=float32)
      >>> mel[0,0]
      -0.5804557
      >>> f0 = readwrite.read_raw_mat('./waveform-modeling/TESTDATA-for-pretrained/f0/arctic_a0001.f0', 1)
      >>> f0.shape
      (671,)

      Here, the binary mel-spectrogram is a matrix with 671 frames and 80 dimensions per frame.
      The F0 is a one-dimensional vector with 671 frames.
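
      If ioTools is not at hand, files in this format can probably be handled with NumPy alone.
      The following is a minimal sketch, assuming the files are headerless float32 little-endian
      data stored frame by frame; save_float32_le and load_float32_le are hypothetical helper
      names and are not claimed to match what ioTools does internally.

```python
import os
import tempfile

import numpy as np


def save_float32_le(data, path):
    """Save an array as headerless float32, little-endian, frame after frame."""
    # '<f4' = little-endian 32-bit float; tofile() writes in C (row-major) order.
    np.ascontiguousarray(data, dtype="<f4").tofile(path)


def load_float32_le(path, dim):
    """Load a headerless float32 little-endian file with `dim` values per frame."""
    flat = np.fromfile(path, dtype="<f4")
    if flat.size % dim != 0:
        raise ValueError(f"{path}: {flat.size} values is not a multiple of dim={dim}")
    # Same shapes as in the session above: (frames, dim), or (frames,) when dim == 1.
    return flat if dim == 1 else flat.reshape(-1, dim)


# Round trip on synthetic data (random numbers, not real acoustic features).
with tempfile.TemporaryDirectory() as tmp_dir:
    mel_path = os.path.join(tmp_dir, "example.mfbsp")
    fake_mel = np.random.randn(5, 80).astype(np.float32)
    save_float32_le(fake_mel, mel_path)
    file_size = os.path.getsize(mel_path)
    loaded = load_float32_le(mel_path, 80)

print(loaded.shape)  # (5, 80)
print(file_size)     # 5 frames * 80 dims * 4 bytes = 1600
```

      The file size check is a quick way to catch a wrong dimension or data type: the size in
      bytes should equal frames * dimensions * 4.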
      
      Note that in physical memory, a datum in a two-dimensional data matrix (e.g., a mel-spectrogram) is
      accessed through DataArray[D * n + d], where D is the feature dimension, n is the frame index, and d is
      the dimension index within one frame.
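
      The indexing rule DataArray[D * n + d] can be checked with a small NumPy sketch. It uses
      synthetic values, and the names matrix, data_array, and flat_index are illustrative only.
      The byte offset at the end additionally assumes the file stores the values in the same
      order, 4 bytes each.

```python
import numpy as np

D = 80   # feature dimension per frame
N = 671  # number of frames, as in the example above

# Synthetic stand-in for a mel-spectrogram; the values are arbitrary.
matrix = np.arange(N * D, dtype=np.float32).reshape(N, D)

# The same data viewed as one long 1-D array, frame after frame (C order).
data_array = matrix.ravel()


def flat_index(n, d, dim):
    """Offset of dimension d of frame n in a frame-by-frame buffer."""
    return dim * n + d


n, d = 10, 3
print(matrix[n, d] == data_array[flat_index(n, d, D)])  # True
# Byte offset of the same value in a float32 file (4 bytes per value).
print(4 * flat_index(n, d, D))  # 3212
```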