
Qoboty/espnet


ESPnet: end-to-end speech processing toolkit

ESPnet is an end-to-end speech processing toolkit that mainly focuses on end-to-end speech recognition. ESPnet uses Chainer and PyTorch as its main deep learning engines, and also follows Kaldi-style data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments.

Installation

Install Kaldi, Python libraries, and other required tools using the system Python and virtualenv:

$ cd tools
$ make -j

or using a local Miniconda installation:

$ cd tools
$ make -f conda.mk -j

To use CUDA (and cuDNN), set the paths in your .bashrc or .bash_profile appropriately:

CUDAROOT=/path/to/cuda

export PATH=$CUDAROOT/bin:$PATH
export LD_LIBRARY_PATH=$CUDAROOT/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=$CUDAROOT
export CUDA_PATH=$CUDAROOT
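The exports above can be wrapped in a small function so that the paths are only changed when the CUDA directory really contains nvcc. This is a minimal sketch, not part of ESPnet; the function name setup_cuda_env is hypothetical and /path/to/cuda remains a placeholder:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with ESPnet): export the CUDA variables
# listed above, but only if the given root contains an executable nvcc.
setup_cuda_env() {
  local root="${1:-${CUDAROOT:-/path/to/cuda}}"
  if [ ! -x "$root/bin/nvcc" ]; then
    echo "warning: $root/bin/nvcc not found; CUDA paths left unchanged" >&2
    return 1
  fi
  export CUDAROOT="$root"
  export PATH="$CUDAROOT/bin:$PATH"
  # Avoid a trailing ':' when LD_LIBRARY_PATH was previously empty.
  export LD_LIBRARY_PATH="$CUDAROOT/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export CUDA_HOME="$CUDAROOT"
  export CUDA_PATH="$CUDAROOT"
}

# Example .bashrc usage (placeholder path):
# setup_cuda_env /path/to/cuda
```

Failing loudly instead of exporting a wrong path makes a missing CUDA installation show up at login rather than later as an obscure import error during training.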

Execution of example scripts

Move to an example directory under the egs directory. We provide recipes for several major ASR benchmarks, including WSJ, CHiME-4, and TED. The following directory is an example of performing an ASR experiment with the VoxForge Italian corpus.

$ cd egs/voxforge/asr1

Once in the directory, execute the main script with the Chainer backend:

$ ./run.sh

or execute it with the PyTorch backend (currently, the PyTorch backend does not support VGG-like layers):

$ ./run.sh --backend pytorch --etype blstmp
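Because only the PyTorch backend needs extra options, and it does not support VGG-like layers, the choice can be captured in a small helper that prints the arguments for run.sh. This is a hypothetical sketch; in particular, it assumes that VGG-like encoder types are the --etype values starting with vgg (for example vggblstmp), which you should confirm against your version of run.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the run.sh arguments for a backend/encoder pair.
# Assumption: VGG-like encoder types are the --etype values starting with "vgg".
asr_args() {
  local backend="$1" etype="$2"
  case "$backend" in
    chainer)
      # chainer is the default backend of run.sh, so no --backend is needed.
      echo "${etype:+--etype $etype}"
      ;;
    pytorch)
      etype="${etype:-blstmp}"
      case "$etype" in
        vgg*)
          echo "error: the pytorch backend does not support VGG-like layers ($etype)" >&2
          return 1
          ;;
      esac
      echo "--backend pytorch --etype $etype"
      ;;
    *)
      echo "error: unknown backend '$backend'" >&2
      return 1
      ;;
  esac
}

# Usage ($args is left unquoted on purpose so it splits into separate options):
# args=$(asr_args pytorch) && ./run.sh $args
```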

With this main script, you can perform the full procedure of an ASR experiment.

Use of GPU

If you use a GPU in your experiment, set the --gpu option of run.sh appropriately, e.g.,

$ ./run.sh --gpu 0

The default setup uses the CPU (--gpu -1).
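On a shared machine it can help to pass the least-loaded GPU to --gpu. A minimal sketch, assuming the NVIDIA driver's nvidia-smi tool is installed; the helper pick_gpu is hypothetical and falls back to -1 (CPU) when no GPU is listed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "index, memory.used" lines (the CSV format produced
# by nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits)
# and print the index of the GPU with the least memory in use, or -1 if none.
pick_gpu() {
  awk -F', *' '
    NF >= 2 && (best == "" || $2 + 0 < min) { min = $2 + 0; best = $1 }
    END { print (best == "" ? -1 : best) }
  '
}

# Usage (requires nvidia-smi):
# gpu=$(nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits | pick_gpu)
# ./run.sh --gpu "$gpu"
```

Memory in use is only a rough proxy for load; on a cluster managed through cmd.sh, GPU assignment is usually left to the scheduler instead.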

Setup in your cluster

Change cmd.sh according to your cluster setup. If you run experiments on your local machine, use the default cmd.sh. For more information about cmd.sh, see http://kaldi-asr.org/doc/queue.html. This mechanism supports Grid Engine (queue.pl), SLURM (slurm.pl), etc.
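Kaldi-style recipes read their job-submission commands from variables defined in cmd.sh. A minimal sketch of such a file, assuming the variable names train_cmd, cuda_cmd, and decode_cmd (check your recipe's cmd.sh and run.sh for the names it actually uses) and a hypothetical cmd_backend switch:

```shell
#!/usr/bin/env bash
# Sketch of a cmd.sh-style configuration (an assumption, not ESPnet's actual
# file): choose how the recipe submits jobs via a hypothetical cmd_backend value.
set_cmd_backend() {
  case "$1" in
    local)  # run every job on this machine
      train_cmd="run.pl"; cuda_cmd="run.pl"; decode_cmd="run.pl" ;;
    sge)    # Grid Engine; --gpu needs a gpu option in the queue.pl configuration
      train_cmd="queue.pl"; cuda_cmd="queue.pl --gpu 1"; decode_cmd="queue.pl" ;;
    slurm)  # SLURM; --gpu needs a gpu option in the slurm.pl configuration
      train_cmd="slurm.pl"; cuda_cmd="slurm.pl --gpu 1"; decode_cmd="slurm.pl" ;;
    *)
      echo "error: unknown cmd_backend '$1'" >&2
      return 1 ;;
  esac
  export train_cmd cuda_cmd decode_cmd
}

set_cmd_backend "${cmd_backend:-local}"
```

Keeping the local setting as the default matches the advice above: on a single machine, no cluster-specific changes are needed.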

Error due to matplotlib

If you get the following error (or other NumPy-related errors):

RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
Exception in main training loop: numpy.core.multiarray failed to import
Traceback (most recent call last):
:
:
from . import _path, rcParams
ImportError: numpy.core.multiarray failed to import

then reinstall matplotlib with the following commands:

$ cd egs/voxforge/asr1
$ . ./path.sh
$ pip install pip --upgrade; pip uninstall matplotlib; pip --no-cache-dir install matplotlib
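To avoid reinstalling when nothing is wrong, the import can be tested first. A minimal sketch; the function names matplotlib_ok and reinstall_matplotlib are hypothetical, and adding -y (skip the confirmation prompt) is the only change to the commands above:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: test whether numpy and matplotlib import cleanly with
# the given interpreter (default: python), and reinstall matplotlib only if not.
matplotlib_ok() {
  local py="${1:-python}"
  # The Agg backend avoids failures caused by a missing display.
  "$py" -c "import numpy, matplotlib; matplotlib.use('Agg'); import matplotlib.pyplot" \
    >/dev/null 2>&1
}

reinstall_matplotlib() {
  pip install pip --upgrade
  pip uninstall -y matplotlib
  pip --no-cache-dir install matplotlib
}

# Usage, after ". ./path.sh" in the recipe directory:
# matplotlib_ok || reinstall_matplotlib
```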

Installation using Docker

For GPU support, nvidia-docker must be installed.

To run an experiment, use the following commands:

$ cd egs/voxforge/asr1
$ ./run_in_docker.sh --gpu GPUID

If GPUID is set to -1, the program runs on the CPU only.
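Since GPUID must be either -1 or a GPU index, a small check before launching the container catches typos early. This is a hypothetical sketch, not part of run_in_docker.sh:

```shell
#!/usr/bin/env bash
# Hypothetical check: accept -1 (CPU only) or a non-negative integer GPU index.
valid_gpuid() {
  case "$1" in
    -1) return 0 ;;
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage from egs/voxforge/asr1:
# GPUID=0
# valid_gpuid "$GPUID" && ./run_in_docker.sh --gpu "$GPUID"
```

It only checks the format; whether the given index actually exists on the host is left to the container runtime.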

The script builds the Docker container and loads the required information into it. If any additional application is required, modify the Docker devel file located in the tools folder.

To downgrade or to use a private devel file, modify the file name inside run_in_docker.sh.

References (please cite the following articles)

[1] Suyoun Kim, Takaaki Hori, and Shinji Watanabe, "Joint CTC-attention based end-to-end speech recognition using multi-task learning," Proc. ICASSP 2017, pp. 4835-4839, 2017.

[2] Shinji Watanabe, Takaaki Hori, Suyoun Kim, John R. Hershey, and Tomoki Hayashi, "Hybrid CTC/Attention Architecture for End-to-End Speech Recognition," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 8, pp. 1240-1253, Dec. 2017.
