<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Outline — ESPnet 0.4.2 documentation</title>
<script type="text/javascript" src="_static/js/modernizr.min.js"></script>
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT:'./',
VERSION:'0.4.2',
LANGUAGE:'None',
COLLAPSE_INDEX:false,
FILE_SUFFIX:'.html',
HAS_SOURCE: true,
SOURCELINK_SUFFIX: '.txt'
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script type="text/javascript" src="_static/js/theme.js"></script>
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Speech Recognition (Recipe)" href="notebook/asr_cli.html" />
<link rel="prev" title="ESPnet: end-to-end speech processing toolkit" href="index.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="index.html" class="icon icon-home"> ESPnet
</a>
<div class="version">
0.4.2
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<p class="caption"><span class="caption-text">Tutorial:</span></p>
<ul class="current">
<li class="toctree-l1 current"><a class="current reference internal" href="#">Outline</a></li>
<li class="toctree-l1"><a class="reference internal" href="#installation">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="#execution-of-example-scripts">Execution of example scripts</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#use-of-gpu">Use of GPU</a></li>
<li class="toctree-l2"><a class="reference internal" href="#setup-in-your-cluster">Setup in your cluster</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="#demonstration-using-pretrained-models">Demonstration using pretrained models</a></li>
<li class="toctree-l1"><a class="reference internal" href="#installation-using-docker">Installation using Docker</a></li>
<li class="toctree-l1"><a class="reference internal" href="#references">References</a></li>
</ul>
<p class="caption"><span class="caption-text">Notebook:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="notebook/asr_cli.html">Speech Recognition (Recipe)</a></li>
<li class="toctree-l1"><a class="reference internal" href="notebook/asr_library.html">Speech Recognition (Library)</a></li>
<li class="toctree-l1"><a class="reference internal" href="notebook/tts_cli.html">Text-to-Speech (Recipe)</a></li>
<li class="toctree-l1"><a class="reference internal" href="notebook/pretrained.html">Pretrained Model</a></li>
</ul>
<p class="caption"><span class="caption-text">Package Reference:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="_gen/espnet-asr.html">espnet.asr package</a></li>
<li class="toctree-l1"><a class="reference internal" href="_gen/espnet-lm.html">espnet.lm package</a></li>
<li class="toctree-l1"><a class="reference internal" href="_gen/espnet-nets.html">espnet.nets package</a></li>
<li class="toctree-l1"><a class="reference internal" href="_gen/espnet-transform.html">espnet.transform package</a></li>
<li class="toctree-l1"><a class="reference internal" href="_gen/espnet-tts.html">espnet.tts package</a></li>
<li class="toctree-l1"><a class="reference internal" href="_gen/espnet-utils.html">espnet.utils package</a></li>
</ul>
<p class="caption"><span class="caption-text">Tool Reference:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="apis/espnet_bin.html">core tools</a></li>
<li class="toctree-l1"><a class="reference internal" href="apis/utils_py.html">python utility tools</a></li>
<li class="toctree-l1"><a class="reference internal" href="apis/utils_sh.html">bash utility tools</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">ESPnet</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html">Docs</a> »</li>
<li>Outline</li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/tutorial.md.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<style>
/* CSS overrides for sphinx_rtd_theme */
/* 24px margin */
.nbinput.nblast,
.nboutput.nblast {
margin-bottom: 19px; /* padding has already 5px */
}
/* ... except between code cells! */
.nblast + .nbinput {
margin-top: -19px;
}
.admonition > p:before {
margin-right: 4px; /* make room for the exclamation icon */
}
/* Fix math alignment, see https://github.com/rtfd/sphinx_rtd_theme/pull/686 */
.math {
text-align: unset;
}
</style>
<div class="section" id="outline">
<h1>Outline<a class="headerlink" href="#outline" title="Permalink to this headline">¶</a></h1>
<p>ESPnet is an end-to-end speech processing toolkit.
It uses <a class="reference external" href="https://chainer.org/">chainer</a> as its main deep learning engine,
and follows <a class="reference external" href="http://kaldi-asr.org/">Kaldi</a>-style data processing, feature extraction/formatting, and recipes to provide a complete setup for speech recognition and other speech processing experiments.</p>
</div>
<div class="section" id="installation">
<h1>Installation<a class="headerlink" href="#installation" title="Permalink to this headline">¶</a></h1>
<p>Install Kaldi, the Python libraries, and other required tools:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> tools
$ make -j
</pre></div>
</div>
<p>To use CUDA (and cuDNN), make sure to set the paths in your <code class="docutils literal notranslate"><span class="pre">.bashrc</span></code> or <code class="docutils literal notranslate"><span class="pre">.bash_profile</span></code> appropriately:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>CUDAROOT=/path/to/cuda
export PATH=$CUDAROOT/bin:$PATH
export LD_LIBRARY_PATH=$CUDAROOT/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=$CUDAROOT
export CUDA_PATH=$CUDAROOT
</pre></div>
</div>
</div>
<div class="section" id="execution-of-example-scripts">
<h1>Execution of example scripts<a class="headerlink" href="#execution-of-example-scripts" title="Permalink to this headline">¶</a></h1>
<p>Move to an example directory under the <code class="docutils literal notranslate"><span class="pre">egs</span></code> directory.
We provide recipes for several major ASR benchmarks, including WSJ, CHiME-4, and TED.
The following directory contains an example ASR experiment with the VoxForge Italian corpus.</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/voxforge/asr1
</pre></div>
</div>
<p>Once in the directory, execute the main script:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span>$ ./run.sh
</pre></div>
</div>
<p>This main script performs the full ASR experiment pipeline, including:</p>
<ul class="simple">
<li>Data download</li>
<li>Data preparation (Kaldi style; see http://kaldi-asr.org/doc/data_prep.html)</li>
<li>Feature extraction (Kaldi style; see http://kaldi-asr.org/doc/feat.html)</li>
<li>Dictionary and JSON-format data preparation</li>
<li>Training based on <a class="reference external" href="https://chainer.org/">chainer</a></li>
<li>Recognition and scoring</li>
</ul>
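<p>Each of these steps is implemented as a numbered stage in <code class="docutils literal notranslate"><span class="pre">run.sh</span></code>. Assuming your recipe follows the common Kaldi/ESPnet convention of a <code class="docutils literal notranslate"><span class="pre">--stage</span></code> option (the exact stage numbering varies by recipe, so check the header of <code class="docutils literal notranslate"><span class="pre">run.sh</span></code>), a partially completed experiment can be resumed from a later stage, e.g.:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span>$ # skip data download and preparation; restart from feature extraction
$ ./run.sh --stage 2
</pre></div>
</div>
<p>This avoids repeating the expensive download and preparation steps when only the later stages need to be rerun.</p>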
<div class="section" id="use-of-gpu">
<h2>Use of GPU<a class="headerlink" href="#use-of-gpu" title="Permalink to this headline">¶</a></h2>
<p>To use a GPU in your experiment, set the <code class="docutils literal notranslate"><span class="pre">--gpu</span></code> option of <code class="docutils literal notranslate"><span class="pre">run.sh</span></code> appropriately, e.g.,</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span>$ ./run.sh --gpu <span class="m">0</span>
</pre></div>
</div>
<p>The default setup uses the CPU (<code class="docutils literal notranslate"><span class="pre">--gpu</span> <span class="pre">-1</span></code>).</p>
</div>
<div class="section" id="setup-in-your-cluster">
<h2>Setup in your cluster<a class="headerlink" href="#setup-in-your-cluster" title="Permalink to this headline">¶</a></h2>
<p>Change <code class="docutils literal notranslate"><span class="pre">cmd.sh</span></code> according to your cluster setup.
If you run experiments on your local machine, you do not need to change it.
<code class="docutils literal notranslate"><span class="pre">cmd.sh</span></code> supports Grid Engine (<code class="docutils literal notranslate"><span class="pre">queue.pl</span></code>), SLURM (<code class="docutils literal notranslate"><span class="pre">slurm.pl</span></code>), and other schedulers; for more information, see http://kaldi-asr.org/doc/queue.html.</p>
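<p>As a rough sketch of what this configuration involves (the variable names below follow the convention used in Kaldi-style recipes; your recipe's <code class="docutils literal notranslate"><span class="pre">cmd.sh</span></code> may differ), switching from local execution to a SLURM cluster amounts to changing the command wrappers:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span># local execution (default): run all jobs on this machine
export train_cmd="run.pl"
export decode_cmd="run.pl"

# SLURM cluster: submit jobs through slurm.pl instead, e.g.
# export train_cmd="slurm.pl --mem 4G"
# export decode_cmd="slurm.pl --mem 4G"
</pre></div>
</div>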
</div>
</div>
<div class="section" id="demonstration-using-pretrained-models">
<h1>Demonstration using pretrained models<a class="headerlink" href="#demonstration-using-pretrained-models" title="Permalink to this headline">¶</a></h1>
<p>ESPnet provides several pretrained models.
You can easily perform speech recognition with them through a demo script, for example:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/tedlium/asr1
../../../utils/recog_wav.sh --models tedlium.demo foo.wav
</pre></div>
</div>
<p>where <code class="docutils literal notranslate"><span class="pre">foo.wav</span></code> is a WAV file containing the speech to be recognized and <code class="docutils literal notranslate"><span class="pre">tedlium.demo</span></code> is the model name.</p>
<p>You can also perform speech synthesis as follows:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/libritts/tts1
../../../utils/synth_wav.sh --models libritts.v1 --input_wav bar.wav foo.txt
</pre></div>
</div>
<p>where <code class="docutils literal notranslate"><span class="pre">foo.txt</span></code> is a text file containing the text to be synthesized and <code class="docutils literal notranslate"><span class="pre">bar.wav</span></code> is a WAV file used to control meta information, such as the speaker characteristics of the synthesized speech.
A list of the names of the available pretrained models will be provided.</p>
</div>
<div class="section" id="installation-using-docker">
<h1>Installation using Docker<a class="headerlink" href="#installation-using-docker" title="Permalink to this headline">¶</a></h1>
<p>For GPU support, nvidia-docker must be installed.</p>
<p>To run an experiment inside the container, use the following commands:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/voxforge/asr1
$ ./run_in_docker.sh --gpu GPUID
</pre></div>
</div>
<p>If <code class="docutils literal notranslate"><span class="pre">GPUID</span></code> is set to -1, the program runs on the CPU only.</p>
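<p>For example, a CPU-only run inside the container would be:</p>
<div class="highlight-sh notranslate"><div class="highlight"><pre><span></span>$ ./run_in_docker.sh --gpu -1
</pre></div>
</div>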
<p>The script builds the Docker container and loads the required information into it. If any additional application is required, modify the Docker devel file located in the tools folder.</p>
<p>To downgrade or use a private devel file, modify the file name inside <code class="docutils literal notranslate"><span class="pre">run_in_docker.sh</span></code>.</p>
</div>
<div class="section" id="references">
<h1>References<a class="headerlink" href="#references" title="Permalink to this headline">¶</a></h1>
<p>Please cite the following articles.</p>
<ol class="simple">
<li>Suyoun Kim, Takaaki Hori, and Shinji Watanabe, “Joint CTC-attention based end-to-end speech recognition using multi-task learning,” <em>Proc. ICASSP’17</em>, pp. 4835–4839, 2017.</li>
<li>Shinji Watanabe, Takaaki Hori, Suyoun Kim, John R. Hershey, and Tomoki Hayashi, “Hybrid CTC/Attention Architecture for End-to-End Speech Recognition,” <em>IEEE Journal of Selected Topics in Signal Processing</em>, vol. 11, no. 8, pp. 1240–1253, Dec. 2017.</li>
</ol>
</div>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="notebook/asr_cli.html" class="btn btn-neutral float-right" title="Speech Recognition (Recipe)" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
<a href="index.html" class="btn btn-neutral float-left" title="ESPnet: end-to-end speech processing toolkit" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<p>
© Copyright 2017, Shinji Watanabe
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/rtfd/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>