See: https://espnet.github.io/espnet/espnet2_tutorial.html#recipes-using-espnet2

| Directory name | Corpus name | Task | Language | URL | Note |
|---|---|---|---|---|---|
| aishell | AISHELL-ASR0009-OS1 Open Source Mandarin Speech Corpus | ASR | ZH | http://www.aishelltech.com/kysjcp | |
| ami | The AMI Meeting Corpus | ASR | EN | http://groups.inf.ed.ac.uk/ami/corpus/ | |
| an4 | CMU AN4 database | ASR/TTS | EN | http://www.speech.cs.cmu.edu/databases/an4/ | |
| babel | IARPA Babel corpus | ASR | ~20 languages | https://www.iarpa.gov/index.php/research-programs/babel | |
| chime4 | The 4th CHiME Speech Separation and Recognition Challenge | ASR/Multichannel ASR | EN | http://spandh.dcs.shef.ac.uk/chime_challenge/chime2016/ | |
| cmu_indic | CMU INDIC | TTS | 7 languages | http://festvox.org/cmu_indic/ | |
| commonvoice | The Mozilla Common Voice | ASR | 13 languages | https://voice.mozilla.org/datasets | |
| csj | Corpus of Spontaneous Japanese | ASR | JP | https://pj.ninjal.ac.jp/corpus_center/csj/en/ | |
| csmsc | Chinese Standard Mandarin Speech Corpus | TTS | ZH | https://www.data-baker.com/open_source.html | |
| css10 | CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages | TTS | 10 languages | https://github.com/Kyubyong/css10 | |
| dirha_wsj | Distant-speech Interaction for Robust Home Applications | Multichannel ASR | EN | https://dirha.fbk.eu/, https://github.com/SHINE-FBK/DIRHA_English_wsj | |
| dns_ins20 | Deep Noise Suppression Challenge – INTERSPEECH 2020 | SE | 7 languages + singing | https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-interspeech-2020/ | |
| fsc | Fluent Speech Commands Dataset | SLU | EN | https://fluent.ai/fluent-speech-commands-a-dataset-for-spoken-language-understanding-research/ | |
| gigaspeech | GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio | ASR | EN | https://github.com/SpeechColab/GigaSpeech | |
| hkust | HKUST/MTS: A very large scale Mandarin telephone speech corpus | ASR | ZH | https://catalog.ldc.upenn.edu/LDC2005S15 | |
| hui_acg | HUI-audio-corpus-german | TTS | DE | https://opendata.iisys.de/datasets.html#hui-audio-corpus-german | |
| how2 | How2: A Large-scale Dataset for Multimodal Language Understanding | ASR/MT/ST | EN->PT | https://github.com/srvk/how2-dataset | |
| iwslt21_low_resource | ALFFA, IARPA Babel, Gamayun, IWSLT 2021 | ASR | SW | http://www.openslr.org/25/ https://catalog.ldc.upenn.edu/LDC2017S05 https://gamayun.translatorswb.org/data/ https://iwslt.org/2021/low-resource | |
| jkac | J-KAC: Japanese Kamishibai and audiobook corpus | TTS | JP | https://sites.google.com/site/shinnosuketakamichi/research-topics/j-kac_corpus | |
| jmd | JMD: Japanese multi-dialect corpus for speech synthesis | TTS | JP | https://sites.google.com/site/shinnosuketakamichi/research-topics/jmd_corpus | |
| jsss | JSSS: Japanese speech corpus for summarization and simplification | TTS | JP | https://sites.google.com/site/shinnosuketakamichi/research-topics/jsss_corpus | |
| jsut | Japanese speech corpus of Saruwatari-lab., University of Tokyo | ASR/TTS | JP | https://sites.google.com/site/shinnosuketakamichi/publication/jsut | |
| jtubespeech | Japanese YouTube Speech corpus | ASR/TTS | JP | | |
| jv_openslr35 | Javanese | ASR | JV | http://www.openslr.org/35 | |
| jvs | JVS (Japanese versatile speech) corpus | TTS | JP | https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus | |
| ksponspeech | KsponSpeech (Korean spontaneous speech) corpus | ASR | KR | https://aihub.or.kr/aidata/105 | |
| kss | Korean single speaker corpus | TTS | KO | https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset | |
| laborotv | LaboroTVSpeech (A large-scale Japanese speech corpus on TV recordings) | ASR | JP | https://laboro.ai/column/eg-laboro-tv-corpus-jp | |
| librimix | LibriMix: An Open-Source Dataset for Generalizable Speech Separation | SE | EN | https://github.com/JorisCos/LibriMix | |
| librispeech | LibriSpeech ASR corpus | ASR | EN | http://www.openslr.org/12 | |
| libritts | LibriTTS corpus | TTS | EN | http://www.openslr.org/60 | |
| ljspeech | The LJ Speech Dataset | TTS | EN | https://keithito.com/LJ-Speech-Dataset/ | |
| lrs2 | The Oxford-BBC Lip Reading Sentences 2 (LRS2) Dataset | Lipreading/ASR | EN | https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html | |
| mini_an4 | Mini version of CMU AN4 database for the integration test | ASR/TTS/SE | EN | http://www.speech.cs.cmu.edu/databases/an4/ | |
| mini_librispeech | Mini version of LibriSpeech corpus | DIAR | EN | https://openslr.org/31/ | |
| mls | MLS (A large multilingual corpus derived from LibriVox audiobooks) | ASR | 8 languages | http://www.openslr.org/94/ | |
| nsc | National Speech Corpus | ASR | EN-SG | https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus | |
| open_li52 | Corpus combination with 52 languages (Common Voice + VoxForge) | Multilingual ASR | 52 languages | | |
| polyphone_swiss_french | Swiss French Polyphone corpus | ASR | FR | http://catalog.elra.info/en-us/repository/browse/ELRA-S0030_02 | |
| puebla_nahuatl | Highland Puebla Nahuatl corpus | ASR | HPN | https://www.openslr.org/92/ | |
| reverb | REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge | ASR | EN | https://reverb2014.dereverberation.com/ | |
| ru_open_stt | Russian Open Speech To Text (STT/ASR) Dataset | ASR | RU | https://github.com/snakers4/open_stt | |
| ruslan | RUSLAN: Russian Spoken Language Corpus For Speech Synthesis | TTS | RU | https://ruslan-corpus.github.io/ | |
| snips | SNIPS: A dataset for spoken language understanding | SLU | EN | https://github.com/sonos/spoken-language-understanding-research-datasets | |
| siwis | SIWIS: Spoken Interaction with Interpretation in Switzerland | TTS | FR | https://datashare.ed.ac.uk/handle/10283/2353 | |
| sms_wsj | SMS-WSJ: A database for in-depth analysis of multi-channel source separation algorithms | SE | EN | https://github.com/fgnt/sms_wsj | |
| spgispeech | SPGISpeech 5k corpus | ASR | EN | https://datasets.kensho.com/datasets/scribe | |
| su_openslr36 | Sundanese | ASR | SU | http://www.openslr.org/36 | |
| swbd | Switchboard Corpus for 2-channel Conversational Telephone Speech (300h) | ASR | EN | https://catalog.ldc.upenn.edu/LDC97S62 | |
| swbd_da | NXT Switchboard Annotations | SLU | EN | https://catalog.ldc.upenn.edu/LDC2009T26 | |
| timit | TIMIT Acoustic-Phonetic Continuous Speech Corpus | ASR | EN | https://catalog.ldc.upenn.edu/LDC93S1 | |
| tsukuyomi | Tsukuyomi-chan Corpus (つくよみちゃんコーパス) | TTS | JP | https://tyc.rei-yumesaki.net/material/corpus | |
| vctk | English Multi-speaker Corpus for CSTR Voice Cloning Toolkit | TTS | EN | http://www.udialogue.org/download/cstr-vctk-corpus.html | |
| vctk_noisyreverb | Noisy reverberant speech database (48kHz) | SE | EN | https://datashare.ed.ac.uk/handle/10283/2826 | |
| vivos | VIVOS (Vietnamese corpus for ASR) | ASR | VI | https://ailab.hcmus.edu.vn/vivos/ | |
| voxforge | VoxForge | ASR | 7 languages | http://www.voxforge.org/ | |
| wham | The WSJ0 Hipster Ambient Mixtures (WHAM!) dataset | SE | EN | https://wham.whisper.ai/ | |
| whamr | WHAMR!: Noisy and Reverberant Single-Channel Speech Separation | SE | EN | https://wham.whisper.ai/ | |
| wsj | CSR-I (WSJ0) Complete, CSR-II (WSJ1) Complete | ASR | EN | https://catalog.ldc.upenn.edu/LDC93S6A, https://catalog.ldc.upenn.edu/LDC94S13A | |
| wsj0_2mix | MERL WSJ0-mix multi-speaker dataset | ASR/SE | EN | http://www.merl.com/demos/deep-clustering | |
| wsj0_2mix_spatialized | MERL WSJ0-mix multi-speaker dataset (Spatialized version) | ASR/Multichannel ASR/SE | EN | http://www.merl.com/demos/deep-clustering | |
| yesno | The "yesno" corpus | ASR | HE | http://www.openslr.org/1 | |
| zeroth_korean | Zeroth-Korean | ASR | KR | http://www.openslr.org/40 | |