egs2 (Examples of ESPnet2)

How to use?

See: https://espnet.github.io/espnet/espnet2_tutorial.html#recipes-using-espnet2

Overview of example information

| Directory name | Corpus name | Task | Language | URL | Note |
| --- | --- | --- | --- | --- | --- |
| aishell | AISHELL-ASR0009-OS1 Open Source Mandarin Speech Corpus | ASR | ZH | http://www.aishelltech.com/kysjcp | |
| ami | The AMI Meeting Corpus | ASR | EN | http://groups.inf.ed.ac.uk/ami/corpus/ | |
| an4 | CMU AN4 database | ASR/TTS | EN | http://www.speech.cs.cmu.edu/databases/an4/ | |
| babel | IARPA Babel corpus | ASR | ~20 languages | https://www.iarpa.gov/index.php/research-programs/babel | |
| chime4 | The 4th CHiME Speech Separation and Recognition Challenge | ASR/Multichannel ASR | EN | http://spandh.dcs.shef.ac.uk/chime_challenge/chime2016/ | |
| commonvoice | The Mozilla Common Voice | ASR | 13 languages | https://voice.mozilla.org/datasets | |
| csj | Corpus of Spontaneous Japanese | ASR | JP | https://pj.ninjal.ac.jp/corpus_center/csj/en/ | |
| csmsc | Chinese Standard Mandarin Speech Corpus | TTS | ZH | https://www.data-baker.com/open_source.html | |
| dirha_wsj | Distant-speech Interaction for Robust Home Applications | Multi-Array ASR | EN | https://dirha.fbk.eu/, https://github.com/SHINE-FBK/DIRHA_English_wsj | |
| dns_ins20 | Deep Noise Suppression Challenge – INTERSPEECH 2020 | SE | 7 languages + singing | https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-interspeech-2020/ | |
| gigaspeech | GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio | ASR | EN | https://github.com/SpeechColab/GigaSpeech | |
| hkust | HKUST/MTS: A very large scale Mandarin telephone speech corpus | ASR | ZH | https://catalog.ldc.upenn.edu/LDC2005S15 | |
| how2 | How2: A Large-scale Dataset for Multimodal Language Understanding | ASR/Machine Translation/Speech Translation | EN->PT | https://github.com/srvk/how2-dataset | |
| jsss | JSSS: Japanese speech corpus for summarization and simplification | TTS | JP | https://sites.google.com/site/shinnosuketakamichi/research-topics/jsss_corpus | |
| jsut | Japanese speech corpus of Saruwatari-lab., University of Tokyo | ASR/TTS | JP | https://sites.google.com/site/shinnosuketakamichi/publication/jsut | |
| jv_openslr35 | Javanese | ASR | JV | http://www.openslr.org/35 | |
| jvs | JVS (Japanese versatile speech) corpus | TTS | JP | https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus | |
| laborotv | LaboroTVSpeech (A large-scale Japanese speech corpus on TV recordings) | ASR | JP | https://laboro.ai/column/eg-laboro-tv-corpus-jp | |
| librimix | LibriMix: An Open-Source Dataset for Generalizable Speech Separation | SE | EN | https://github.com/JorisCos/LibriMix | |
| librispeech | LibriSpeech ASR corpus | ASR | EN | http://www.openslr.org/12 | |
| libritts | LibriTTS corpus | TTS | EN | http://www.openslr.org/60 | |
| ljspeech | The LJ Speech Dataset | TTS | EN | https://keithito.com/LJ-Speech-Dataset/ | |
| lrs2 | The Oxford-BBC Lip Reading Sentences 2 (LRS2) Dataset | Lipreading/ASR | EN | https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html | |
| mini_an4 | Mini version of CMU AN4 database for the integration test | ASR/TTS/SE | EN | http://www.speech.cs.cmu.edu/databases/an4/ | |
| mini_librispeech | Mini version of LibriSpeech corpus | DIAR | EN | https://openslr.org/31/ | |
| mls | MLS (A large multilingual corpus derived from LibriVox audiobooks) | ASR | 8 languages | http://www.openslr.org/94/ | |
| nsc | National Speech Corpus | ASR | EN-SG | https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus | |
| open_li52 | Corpus combination with 52 languages (Common Voice + VoxForge) | Multilingual ASR | 52 languages | | |
| polyphone_swiss_french | Swiss French Polyphone corpus | ASR | FR | http://catalog.elra.info/en-us/repository/browse/ELRA-S0030_02 | |
| puebla_nahuatl | Highland Puebla Nahuatl corpus | ASR | HPN | https://www.openslr.org/92/ | |
| reverb | REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge | ASR | EN | https://reverb2014.dereverberation.com/ | |
| ru_open_stt | Russian Open Speech To Text (STT/ASR) Dataset | ASR | RU | https://github.com/snakers4/open_stt | |
| sms_wsj | SMS-WSJ: A database for in-depth analysis of multi-channel source separation algorithms | SE | EN | https://github.com/fgnt/sms_wsj | |
| spgispeech | SPGISpeech 5k corpus | ASR | EN | https://datasets.kensho.com/datasets/scribe | |
| su_openslr36 | Sundanese | ASR | SU | http://www.openslr.org/36 | |
| swbd | Switchboard Corpus for 2-channel Conversational Telephone Speech (300h) | ASR | EN | https://catalog.ldc.upenn.edu/LDC97S62 | |
| timit | TIMIT Acoustic-Phonetic Continuous Speech Corpus | ASR | EN | https://catalog.ldc.upenn.edu/LDC93S1 | |
| vctk | English Multi-speaker Corpus for CSTR Voice Cloning Toolkit | TTS | EN | http://www.udialogue.org/download/cstr-vctk-corpus.html | |
| vctk_noisyreverb | Noisy reverberant speech database (48kHz) | SE | EN | https://datashare.ed.ac.uk/handle/10283/2826 | |
| vivos | VIVOS (Vietnamese corpus for ASR) | ASR | VI | https://ailab.hcmus.edu.vn/vivos/ | |
| voxforge | VoxForge | ASR | 7 languages | http://www.voxforge.org/ | |
| wham | The WSJ0 Hipster Ambient Mixtures (WHAM!) dataset | SE | EN | https://wham.whisper.ai/ | |
| whamr | WHAMR!: Noisy and Reverberant Single-Channel Speech Separation | SE | EN | https://wham.whisper.ai/ | |
| wsj | CSR-I (WSJ0) Complete, CSR-II (WSJ1) Complete | ASR | EN | https://catalog.ldc.upenn.edu/LDC93S6A, https://catalog.ldc.upenn.edu/LDC94S13A | |
| wsj0_2mix | MERL WSJ0-mix multi-speaker dataset | ASR/SE | EN | http://www.merl.com/demos/deep-clustering | |
| wsj0_2mix_spatialized | MERL WSJ0-mix multi-speaker dataset (Spatialized version) | ASR/Multichannel ASR/SE | EN | http://www.merl.com/demos/deep-clustering | |
| yesno | The "yesno" corpus | ASR | HE | http://www.openslr.org/1 | |
| zeroth_korean | Zeroth-Korean | ASR | KR | http://www.openslr.org/40 | |