See: https://espnet.github.io/espnet/espnet2_tutorial.html#recipes-using-espnet2
| Directory name | Corpus name | Task | Language | URL | Note |
|---|---|---|---|---|---|
| accentdb | A Database of Non-Native English Accents | Accent Recognition | ENG | https://accentdb.org/ | |
| accented_french_openslr57 | African Accented French Corpus | ASR | FRA | https://www.openslr.org/57/ | |
| acesinger | ACESinger Singing Corpus | SVS | CMN | WIP | |
| aesrc2020 | Accented English Speech Recognition Challenge 2020 | ASR | ENG | https://arxiv.org/abs/2102.10233 | |
| aidatatang_200zh | Aidatatang_200zh: A free Chinese Mandarin speech corpus | ASR | CMN | http://www.openslr.org/resources/62 | |
| aishell | AISHELL-ASR0009-OS1 Open Source Mandarin Speech Corpus | ASR | CMN | http://www.aishelltech.com/kysjcp | |
| aishell2 | AISHELL-2 Open Source Mandarin Speech Corpus | ASR | CMN | https://www.aishelltech.com/aishell_2 | |
| aishell3 | AISHELL3 Mandarin multi-speaker text-to-speech | TTS | CMN | https://www.openslr.org/93/ | |
| aishell4 | AISHELL4 Open Source Mandarin Speech Corpus in Conference Scenario | ASR/SE | CMN | https://www.openslr.org/111/ | |
| ameboshi | Ameboshi Ciphyer's singing voice database | SVS | JPN | https://parapluie2c56m.wixsite.com/mysite | |
| americasnlp22 | The Second AmericasNLP Competition | ASR | BZD, GUG, GVC, QWE, TAV | http://turing.iimas.unam.mx/americasnlp/st.html | |
| ami | The AMI Meeting Corpus | ASR | ENG | http://groups.inf.ed.ac.uk/ami/corpus/ | |
| an4 | CMU AN4 database | ASR/TTS | ENG | http://www.speech.cs.cmu.edu/databases/an4/ | |
| aphasiabank | AphasiaBank database (English) | ASR | ENG | https://aphasia.talkbank.org/ | |
| arabic_sc | Database for Arabic Speech Commands Recognition | SLU | ARA | https://github.com/ltkbenamer/AR_Speech_Database | |
| asvspoof | The 3rd Automatic Speaker Verification Spoofing and Countermeasures Challenge database | Fake Speech Detection | ENG | https://datashare.ed.ac.uk/handle/10283/3336 | |
| babel | IARPA Babel corpus | ASR | ~20 languages | https://www.iarpa.gov/index.php/research-programs/babel | |
| bibletts | Bible TTS corpus | TTS | 6 Sub-Saharan African languages | https://masakhane-io.github.io/bibleTTS/ | |
| bn_openslr53 | Large Bengali ASR training dataset | ASR | BEN | https://openslr.org/53/ | |
| bur_openslr80 | Burmese ASR training dataset | ASR | BUR | https://openslr.org/80/ | |
| catslu | CATSLU-MAPS | SLU | CMN | https://sites.google.com/view/catslu/home | |
| catslu_entity | CATSLU | SLU/Entity Classifi. | CMN | https://sites.google.com/view/catslu/home | |
| chime1 | The 1st CHiME Speech Separation and Recognition Challenge | ASR/Multichannel ASR | ENG | https://spandh.dcs.shef.ac.uk/chime_challenge/chime2011/ | |
| chime2 | The 2nd CHiME Speech Separation and Recognition Challenge | ASR/Multichannel ASR | ENG | https://spandh.dcs.shef.ac.uk/chime_challenge/chime2013/ | |
| chime4 | The 4th CHiME Speech Separation and Recognition Challenge | ASR/Multichannel ASR | ENG | http://spandh.dcs.shef.ac.uk/chime_challenge/chime2016/ | |
| chime6 | The 6th CHiME Speech Separation and Recognition Challenge | ASR | ENG | https://chimechallenge.github.io/chime6/ | |
| clarity21 | The First Clarity Enhancement Challenge CEC1 | SE | ENG | https://claritychallenge.github.io/clarity_CEC1_doc/ | |
| cmu_arctic | CMU ARCTIC | TTS | ENG | http://www.festvox.org/cmu_arctic/ | |
| cmu_indic | CMU INDIC | TTS | 7 languages | http://festvox.org/cmu_indic/ | |
| commonvoice | The Mozilla Common Voice | ASR | 13 languages | https://voice.mozilla.org/datasets | |
| conferencingspeech21 | Far-field Multi-channel Speech Enhancement Challenge for Video Conferencing (ConferencingSpeech 2021) | SE | ENG, CMN | https://tea-lab.qq.com/conferencingspeech-2021 | |
| covost2 | Multilingual speech-to-text translation corpus from Common Voice | ST | 21 languages->ENG, ENG->15 languages | https://github.com/facebookresearch/covost | |
| csj | Corpus of Spontaneous Japanese | ASR | JPN | https://pj.ninjal.ac.jp/corpus_center/csj/en/ | |
| csmsc | Chinese Standard Mandarin Speech Corpus | TTS | CMN | https://www.data-baker.com/open_source.html | |
| css10 | CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages | TTS | 10 languages | https://github.com/Kyubyong/css10 | |
| dcase22_task1 | DCASE Task1 2022 Dataset | SLU | ENG | https://dcase.community/challenge2022/task-low-complexity-acoustic-scene-classification | |
| dirha_wsj | Distant-speech Interaction for Robust Home Applications | Multichannel ASR | ENG | https://dirha.fbk.eu/, https://github.com/SHINE-FBK/DIRHA_English_wsj | |
| dns_ins20 | Deep Noise Suppression Challenge – INTERSPEECH 2020 | SE | 7 languages + singing | https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-interspeech-2020/ | |
| dns_icassp21 | Deep Noise Suppression Challenge – ICASSP 2021 | SE | 11 languages + singing | https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-icassp-2021/ | |
| dns_icassp22 | Deep Noise Suppression Challenge – ICASSP 2022 | SE | 11 languages + singing | https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-icassp-2022/ | |
| dns_ins21 | Deep Noise Suppression Challenge – INTERSPEECH 2021 | SE | 11 languages + singing | https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-interspeech-2021/ | |
| dsing | Automatic Lyric Transcription from Karaoke Vocal Tracks (From DAMP Sing300x30x2) | ASR (ALT) | ENG singing | https://github.com/groadabike/Kaldi-Dsing-task | |
| easycom | An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments | ASR | ENG | https://github.com/facebookresearch/EasyComDataset | |
| esc50 | Dataset for Environmental Sound Classification | Audio Classification | | https://github.com/karolpiczak/ESC-50 | |
| fisher_callhome_spanish | Fisher and CALLHOME Spanish--English Speech Translation | ASR/ST | SPA->ENG | https://catalog.ldc.upenn.edu/LDC2014T23 | |
| fleurs | Few-shot Learning Evaluation of Universal Representations of Speech | ASR/Multilingual | 102 languages | https://huggingface.co/datasets/google/fleurs | |
| freesound | Speech Command & Freesound for VAD | VAD | ENG | https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speech_classification/datasets.html#speech-command-freesound-for-vad | |
| fsc | Fluent Speech Commands Dataset | SLU | ENG | https://fluent.ai/fluent-speech-commands-a-dataset-for-spoken-language-understanding-research/ | |
| fsc_challenge | Fluent Speech Commands Dataset MASE Eval Challenge splits | SLU | ENG | https://github.com/maseEval/mase | |
| fsc_unseen | Fluent Speech Commands Dataset MASE Eval Unseen splits | SLU | ENG | https://github.com/maseEval/mase | |
| gigaspeech | GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio | ASR | ENG | https://github.com/SpeechColab/GigaSpeech | |
| googlei18n_lowresource | Googlei18n crowdsource project | TTS | ENG | https://github.com/mirumee/google-i18n-address (most in openslr as separate entries) | |
| grabo | Grabo dataset | SLU | ENG + NLD | https://www.esat.kuleuven.be/psi/spraak/downloads/ | |
| gramvaani | GramVaani ASR Challenge 2022 | ASR | HIN | https://sites.google.com/view/gramvaaniasrchallenge/dataset | |
| harpervalley | HarperValleyBank: A Domain-Specific Spoken Dialog Corpus | SLU | ENG | https://github.com/cricketclub/gridspace-stanford-harper-valley | |
| hkust | HKUST/MTS: A very large scale Mandarin telephone speech corpus | ASR | CMN | https://catalog.ldc.upenn.edu/LDC2005S15 | |
| how2 | How2: A Large-scale Dataset for Multimodal Language Understanding | ASR/MT/ST | ENG->POR | https://github.com/srvk/how2-dataset | |
| how2_2000h | How2_2000h fbank features | ASR/SUM | ENG->POR | https://arxiv.org/pdf/2110.06263.pdf | |
| hub4_spanish | 1997 Spanish Broadcast News Speech | ASR | SPA | https://catalog.ldc.upenn.edu/LDC98S74 | |
| hui_acg | HUI-audio-corpus-german | TTS | DEU | https://opendata.iisys.de/datasets.html#hui-audio-corpus-german | |
| iam | IAM Handwriting Database 3.0 | OCR | ENG | https://fki.tic.heia-fr.ch/databases/iam-handwriting-database | |
| iemocap | IEMOCAP database: The Interactive Emotional Dyadic Motion Capture database | SLU | ENG | https://sail.usc.edu/iemocap/ | |
| indic_speech | IndicSpeech: Text-to-Speech Corpus for Indian Languages | TTS | 3 Indic languages | http://cvit.iiit.ac.in/research/projects/cvit-projects/text-to-speech-dataset-for-indian-languages | |
| interspeech2024_dsu_challenge | Interspeech2024 speech processing using discrete speech unit challenge (ASR track) | ASR/Multilingual ASR | 145 languages | https://www.wavlab.org/activities/2024/Interspeech2024-Discrete-Speech-Unit-Challenge/ | |
| itako | Itako Singing voice synthesis corpus | SVS | JPN | https://zunko.jp/itadev/login.php | |
| iwslt14 | IWSLT14 MT shared task | MT | DEU->ENG | http://dl.fbaipublicfiles.com/fairseq/data/iwslt14/de-en.tgz | |
| iwslt21_low_resource | ALFFA, IARPA Babel, Gamayun, IWSLT 2021 | ASR | SWA | http://www.openslr.org/25/ https://catalog.ldc.upenn.edu/LDC2017S05 https://gamayun.translatorswb.org/data/ https://iwslt.org/2021/low-resource | |
| iwslt22_dialect | IWSLT2022 dialectal speech translation shared task | ASR/ST | Tunisian ARA->ENG | https://github.com/kevinduh/iwslt22-dialect.git | |
| iwslt22_low_resource | IWSLT2022 Low-resource speech translation track task | ST | Tamasheq->FRA | https://github.com/mzboito/IWSLT2022_Tamasheq_data.git | |
| jdcinal | Japanese Dialogue Corpus of Information Navigation and Attentive Listening Annotated with Extended ISO-24617-2 Dialogue Act Tags | SLU | JPN | http://www.lrec-conf.org/proceedings/lrec2018/pdf/464.pdf http://tts.speech.cs.cmu.edu/awb/infomation_navigation_and_attentive_listening_0.2.zip | |
| jkac | J-KAC: Japanese Kamishibai and audiobook corpus | TTS | JPN | https://sites.google.com/site/shinnosuketakamichi/research-topics/j-kac_corpus | |
| jmd | JMD: Japanese multi-dialect corpus for speech synthesis | TTS | JPN | https://sites.google.com/site/shinnosuketakamichi/research-topics/jmd_corpus | |
| jsss | JSSS: Japanese speech corpus for summarization and simplification | TTS | JPN | https://sites.google.com/site/shinnosuketakamichi/research-topics/jsss_corpus | |
| jsut | Japanese speech corpus of Saruwatari-lab., University of Tokyo | ASR/TTS | JPN | https://sites.google.com/site/shinnosuketakamichi/publication/jsut | |
| jsut_song | JSUT-song corpus | SVS | JPN | https://sites.google.com/site/shinnosuketakamichi/publication/jsut-song | |
| jtubespeech | Japanese YouTube Speech corpus | ASR/TTS | JPN | ||
| jv_openslr35 | Javanese | ASR | JAV | http://www.openslr.org/35 | |
| jvs | JVS (Japanese versatile speech) corpus | TTS | JPN | https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus | |
| kathbath | Kathbath dataset | ASR | 12 Indian languages | https://ai4bharat.iitm.ac.in/indic-superb | |
| kising | KiSing-v2 Corpus (ACESinger augmented) | SVS | CMN | WIP | |
| ksponspeech | KsponSpeech (Korean spontaneous speech) corpus | ASR | KOR | https://aihub.or.kr/aidata/105 | |
| ksc | Kazakh speech corpus | ASR | KAZ | ||
| kss | Korean single speaker corpus | TTS | KOR | https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset | |
| l3das22 | L3DAS22: Machine Learning for 3D Audio Signal Processing - ICASSP 2022 | SE | ENG | https://www.l3das.com/icassp2022/ | |
| laborotv | LaboroTVSpeech (A large-scale Japanese speech corpus on TV recordings) | ASR | JPN | https://laboro.ai/column/eg-laboro-tv-corpus-jp | |
| libriheavy_medium | Libriheavy medium subset | ASR | ENG | https://github.com/k2-fsa/libriheavy | |
| libriheavy_small | Libriheavy small subset | ASR | ENG | https://github.com/k2-fsa/libriheavy | |
| librilight_limited | Librilight-limited subset | ASR | ENG | https://dl.fbaipublicfiles.com/librilight/data/librispeech_finetuning.tgz | |
| librimix | LibriMix: An Open-Source Dataset for Generalizable Speech Separation | SE/DIAR | ENG | https://github.com/JorisCos/LibriMix | |
| librispeech | LibriSpeech ASR corpus | ASR | ENG | http://www.openslr.org/12 | |
| librispeech_100 | LibriSpeech ASR corpus 100h subset | ASR | ENG | http://www.openslr.org/12 | |
| libritts | LibriTTS corpus | TTS | ENG | http://www.openslr.org/60 | |
| libritts_r | LibriTTS-R corpus | TTS | ENG | http://www.openslr.org/141 | |
| ljspeech | The LJ Speech Dataset | TTS | ENG | https://keithito.com/LJ-Speech-Dataset/ | |
| lrs2 | The Oxford-BBC Lip Reading Sentences 2 (LRS2) Dataset | Lipreading/ASR | ENG | https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html | |
| lrs3 | The Oxford-BBC Lip Reading Sentences 3 (LRS3) Dataset | ASR | ENG | https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs3.html | |
| lt_slurp_spatialized | Spatialized Libri-Trans and Spatialized SLURP (LT-S and SLURP-S), Enhancement for Translation and Understanding Dataset | SE/ST/SLU | ENG | ||
| lt_speech_commands | Lithuanian Speech Commands dataset | SLU | LIT | https://github.com/kolesov93/lt_speech_commands | |
| m4singer | Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus | SVS | CMN | https://drive.google.com/file/d/1xC37E59EWRRFFLdG3aJkVqwtLDgtFNqW/view?usp=share_link | |
| magicdata | MAGICDATA Mandarin Chinese Read Speech Corpus | ASR | CMN | https://www.openslr.org/68/ | |
| media | MEDIA speech database for French | SLU/Entity Classifi. | FRA | https://catalogue.elra.info/en-us/repository/browse/ELRA-S0272/ | |
| mediaspeech | MediaSpeech: Multilanguage ASR Benchmark and Dataset | ASR | FRA | https://www.openslr.org/108/ | |
| meld | MELD: Multimodal EmotionLines Dataset | SLU | ENG | https://affective-meld.github.io/ | |
| microsoft_speech | Microsoft Speech Corpus (Indian languages) | ASR | 3 languages | https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e | |
| mini_an4 | Mini version of CMU AN4 database for the integration test | ASR/TTS/SE | ENG | http://www.speech.cs.cmu.edu/databases/an4/ | |
| mini_librispeech | Mini version of Librispeech corpus | DIAR | ENG | https://openslr.org/31/ | |
| misp2021 | Multimodal Information Based Speech Processing (MISP) Challenge 2021 | ASR/AVSR | CMN | https://mispchallenge.github.io/ | |
| ml_openslr63 | Crowdsourced high-quality Malayalam multi-speaker speech data | ASR | MAL | https://openslr.org/63/ | |
| mls | MLS (A large multilingual corpus derived from LibriVox audiobooks) | ASR | 8 languages | http://www.openslr.org/94/ | |
| mr_openslr64 | OpenSLR Marathi Corpus | ASR | MAR | http://www.openslr.org/64/ | |
| ms_indic_is18 | Microsoft Speech Corpus (Indian languages) | ASR | 3 langs: TEL TAM GUJ | https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e | |
| ml_superb | Multilingual SUPERB benchmark | ASR | 145 languages | Not Released | |
| mucs21_subtask1 | MUltilingual and Code-Switching ASR Challenges for Low Resource Indian Languages | ASR | 6 Indian languages | https://navana-tech.github.io/MUCS2021/challenge_details.html | |
| mucs21_subtask2 | MUltilingual and Code-Switching ASR Challenges for Low Resource Indian Languages | ASR | 2 code-switched language pairs | https://navana-tech.github.io/MUCS2021/challenge_details.html | |
| musdb18 | Music source separation corpus | ENH | ENG | https://sigsep.github.io/datasets/musdb.html | |
| must_c | MuST-C: A Multilingual Speech Translation Corpus | ASR/MT/ST | ENG->14 languages | https://ict.fbk.eu/must-c/ | |
| must_c_v2 | MuST-C v2 | ASR/MT/ST | ENG->DEU | https://ict.fbk.eu/must-c/ | |
| mustard | MUStARD: Multimodal Sarcasm Detection Dataset | SLU | ENG | https://github.com/soujanyaporia/MUStARD/ | |
| mustard_plus_plus | A Multimodal Corpus for Emotion Recognition in Sarcasm | SLU | ENG | https://github.com/cfiltnlp/MUStARD_Plus_Plus/ | |
| nit_song070 | The NITech Japanese speech database | SVS | JPN | http://hts.sp.nitech.ac.jp/archives/2.3/HTS-demo_NIT-SONG070-F001.tar.bz2 | |
| nsc | National Speech Corpus | ASR | ENG-SG | https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus | |
| ofuton_p_utagoe_db | Ofuton_p_utagoe Singing voice synthesis corpus | SVS | JPN | https://sites.google.com/view/oftn-utagoedb/%E3%83%9B%E3%83%BC%E3%83%A0 | |
| oniku_kurumi_utagoe_db | Oniku Singing voice synthesis corpus | SVS | JPN | http://onikuru.info/db-download/ | |
| open_li110 | Corpus combination with 110 languages | Multilingual ASR | 100+ languages | ||
| open_li52 | Corpus combination with 52 languages (Common Voice + VoxForge) | Multilingual ASR | 52 languages | | |
| opencpop | Opencpop: Mandarin singing voice synthesis corpus | SVS | CMN | https://wenet.org.cn/opencpop/ | |
| pjs | Phoneme-balanced Japanese Singing-voice corpus | SVS | JPN | https://sites.google.com/site/shinnosuketakamichi/research-topics/pjs_corpus | |
| polyphone_swiss_french | Swiss French Polyphone corpus | ASR | FRA | http://catalog.elra.info/en-us/repository/browse/ELRA-S0030_02 | |
| portmedia_dom | PortMedia French corpus | SLU/Entity Classifi. | FRA | https://catalogue.elra.info/en-us/repository/browse/ELRA-S0371/ | |
| portmedia_lang | PortMedia Italian corpus | SLU/Entity Classifi. | ITA | https://catalogue.elra.info/en-us/repository/browse/ELRA-S0371/ | |
| primewords_chinese | Primewords Chinese Corpus Set 1 | ASR | CMN | https://www.openslr.org/47/ | |
| puebla_nahuatl | Highland Puebla Nahuatl corpus (endangered language in central Mexico) | ASR/ST | HPN | https://www.openslr.org/92/ | |
| qasr_tts | TTS character based system using semi-supervised data selection | TTS | ARA | https://arabicspeech.org/qasr_tts | |
| reazonspeech | ReazonSpeech: Japanese Corpus collected from TV Programs | ASR | JPN | https://research.reazon.jp/projects/ReazonSpeech/ | |
| reverb | REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge | ASR | ENG | https://reverb2014.dereverberation.com/ | |
| ru_open_stt | Russian Open Speech To Text (STT/ASR) Dataset | ASR | RUS | https://github.com/snakers4/open_stt | |
| ruslan | RUSLAN: Russian Spoken Language Corpus For Speech Synthesis | TTS | RUS | https://ruslan-corpus.github.io/ | |
| sdsv21 | SdSV 2021: Short-duration Speaker Verification (SdSV) Challenge 2021 | SPK | 10+ Languages | https://sdsvc.github.io/ | |
| seame | SEAME: a Mandarin-English Code-switching Speech Corpus in South-East Asia | ASR | ENG + CMN | https://catalog.ldc.upenn.edu/LDC2015S04 | |
| sinhala | Sinhala speech recognition corpus | ASR | SIN | https://drive.google.com/file/d/17_e0JhMW4_FPxfh93foplnxb4OQp8zh3/view?usp=sharing | |
| siwis | SIWIS: Spoken Interaction with Interpretation in Switzerland | TTS | FRA | https://datashare.ed.ac.uk/handle/10283/2353 | |
| slue-voxceleb | SLUE: Spoken Language Understanding Evaluation | SLU | ENG | https://github.com/asappresearch/slue-toolkit | |
| slue-voxpopuli | SLUE: Spoken Language Understanding Evaluation | SLU | ENG | https://github.com/asappresearch/slue-toolkit | |
| slurp | SLURP: A Spoken Language Understanding Resource Package | SLU | ENG | https://github.com/pswietojanski/slurp | |
| slurp_entity | SLURP: A Spoken Language Understanding Resource Package | SLU/Entity Classifi. | ENG | https://github.com/pswietojanski/slurp | |
| slurp_spatialized | Spatialized SLURP (SLURP-S), Noisy Reverberant Spoken Language Understanding Dataset | SLU | ENG | | |
| sms_wsj | SMS-WSJ: A database for in-depth analysis of multi-channel source separation algorithms | SE | ENG | https://github.com/fgnt/sms_wsj | |
| snips | SNIPS: A dataset for spoken language understanding | SLU | ENG | https://github.com/sonos/spoken-language-understanding-research-datasets | |
| speechcommands | Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition | SLU | ENG | https://www.tensorflow.org/datasets/catalog/speech_commands | |
| spgispeech | SPGISpeech 5k corpus | ASR | ENG | https://datasets.kensho.com/datasets/scribe | |
| spring_speech | SPRING-INX: Data for Indian Languages | ASR | Indian languages | https://asr.iitm.ac.in/dataset | |
| stop | STOP: Spoken Task Oriented Parsing | SLU | ENG | https://facebookresearch.github.io/spoken_task_oriented_parsing/ | |
| su_openslr36 | Sundanese | ASR | SUN | http://www.openslr.org/36 | |
| swbd | Switchboard Corpus for 2-channel Conversational Telephone Speech (300h) | ASR | ENG | https://catalog.ldc.upenn.edu/LDC97S62 | |
| swbd_da | NXT Switchboard Annotations | SLU | ENG | https://catalog.ldc.upenn.edu/LDC2009T26 | |
| swbd_sentiment | Speech Sentiment Annotations | SLU | ENG | https://catalog.ldc.upenn.edu/LDC2020T14 | |
| talromur | Talromur: A large Icelandic TTS corpus | TTS | ISL | https://repository.clarin.is/repository/xmlui/handle/20.500.12537/104, https://aclanthology.org/2021.nodalida-main.50.pdf | |
| talromur2 | Talromur 2: Icelandic multi-speaker TTS corpus | TTS | ISL | https://repository.clarin.is/repository/xmlui/handle/20.500.12537/167 | |
| tedlium2 | TED-LIUM corpus release 2 | ASR | ENG | https://www.openslr.org/19/, http://www.lrec-conf.org/proceedings/lrec2014/pdf/1104_Paper.pdf | |
| tedlium3 | TED-LIUM corpus release 3 | ASR | ENG | https://www.openslr.org/51/ | |
| tedx_spanish_openslr67 | TEDx Spanish Corpus | ASR | SPA | https://www.openslr.org/67/ | |
| thchs30 | A Free Chinese Speech Corpus Released by CSLT@Tsinghua University | ASR/TTS | CMN | https://www.openslr.org/18/ | |
| timit | TIMIT Acoustic-Phonetic Continuous Speech Corpus | ASR/UASR | ENG | https://catalog.ldc.upenn.edu/LDC93S1 | |
| totonac | Highland Totonac corpus (endangered language in central Mexico) | ASR | TOS | http://www.openslr.org/107/ | |
| tsukuyomi | Tsukuyomi-chan corpus (つくよみちゃんコーパス) | TTS | JPN | https://tyc.rei-yumesaki.net/material/corpus | |
| universal_se_v1 | Combination of Multi-condition English Corpora (vctk_noisy, dns_ins20, chime4, reverb, whamr) | SE | ENG | ||
| vctk | English Multi-speaker Corpus for CSTR Voice Cloning Toolkit | ASR/TTS | ENG | http://www.udialogue.org/download/cstr-vctk-corpus.html | |
| vctk_reverb | Reverberant speech database (48kHz) | SE | ENG | https://datashare.ed.ac.uk/handle/10283/2826 | |
| vctk_noisyreverb | Noisy reverberant speech database (48kHz) | SE | ENG | https://datashare.ed.ac.uk/handle/10283/2826 | |
| vivos | VIVOS (Vietnamese corpus for ASR) | ASR | VIE | https://doi.org/10.5281/zenodo.7068130 | |
| voices | VOiCES | ASR/SPK | ENG | https://iqtlabs.github.io/voices/ | |
| voxceleb | VoxCeleb | SPK | 10+ languages | https://mm.kaist.ac.kr/datasets/voxceleb/ | |
| voxforge | VoxForge | ASR | 7 languages | http://www.voxforge.org/ | |
| wenetspeech | WenetSpeech: A 10000+ Hours Multi-domain Chinese Corpus for Speech Recognition | ASR | CMN | https://wenet-e2e.github.io/WenetSpeech/ | |
| wham | The WSJ0 Hipster Ambient Mixtures (WHAM!) dataset | SE | ENG | https://wham.whisper.ai/ | |
| whamr | WHAMR!: Noisy and Reverberant Single-Channel Speech Separation | SE | ENG | https://wham.whisper.ai/ | |
| wsj | CSR-I (WSJ0) Complete, CSR-II (WSJ1) Complete | ASR | ENG | https://catalog.ldc.upenn.edu/LDC93S6A, https://catalog.ldc.upenn.edu/LDC94S13A | |
| wsj0_2mix | MERL WSJ0-mix multi-speaker dataset | ASR/SE | ENG | http://www.merl.com/demos/deep-clustering | |
| wsj0_2mix_spatialized | MERL WSJ0-mix multi-speaker dataset (Spatialized version) | ASR/Multichannel ASR/SE | ENG | http://www.merl.com/demos/deep-clustering | |
| yesno | The "yesno" corpus | ASR | HEB | http://www.openslr.org/1 | |
| yoloxochitl_mixtec | Yoloxochitl-Mixtec corpus (endangered language in central Mexico) | ASR | XTY | http://www.openslr.org/89 | |
| zeroth_korean | Zeroth-Korean | ASR | KOR | http://www.openslr.org/40 | |
| zh_openslr38 | ST-CMDS-20170001_1, Free ST Chinese Mandarin Corpus | ASR | CMN | http://www.openslr.org/38 | |