dataset

These shell scripts generate the WSJ0 and THCHS-30 datasets used in the following paper:
Jiaming Xu, Jing Shi, Guangcan Liu, Xiuyi Chen, Bo Xu.
"Modeling attention and memory for auditory selection in a cocktail party environment"
AAAI, 2018.

WSJ0: https://catalog.ldc.upenn.edu/LDC93S6A

Garofalo, J.; Graff, D.; Paul, D.; and Pallett, D. 2007. CSR-I (WSJ0) Complete. Linguistic Data Consortium, Philadelphia.

THCHS-30: https://github.com/kaldi-asr/kaldi/tree/master/egs/thchs30

Wang, D.; and Zhang, X. 2015. THCHS-30: A Free Chinese Speech Corpus. arXiv preprint arXiv:1512.01882.

The WSJ0 audio files (.wv1) must first be converted to WAV with the sph2pipe tool:

sph2pipe -f wav filename.wv1 filename.wav
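Running that command by hand for every file in the corpus is tedious. The following is a minimal POSIX sh sketch of a batch conversion, assuming all .wv1 files sit somewhere under one source directory and that file names contain no newlines. The function name convert_wsj0 and the SPH2PIPE variable are hypothetical, not part of the original scripts.

```shell
#!/bin/sh
# Sketch: batch-convert WSJ0 .wv1 files to .wav with sph2pipe, mirroring the
# source directory layout under a destination directory.
# Hypothetical names (not from the original scripts): convert_wsj0 and the
# SPH2PIPE variable, which defaults to "sph2pipe" found on PATH.
# Assumes file names contain no newlines; matches lowercase .wv1 only.
convert_wsj0() (
  src_root=${1%/}   # drop one trailing slash so the prefix strip below works
  dst_root=${2%/}
  sph2pipe=${SPH2PIPE:-sph2pipe}
  find "$src_root" -type f -name '*.wv1' | while IFS= read -r src_file; do
    rel=${src_file#"$src_root"/}      # path relative to the source root
    out="$dst_root/${rel%.wv1}.wav"   # same relative path, .wv1 -> .wav
    mkdir -p "$(dirname "$out")" || exit 1
    "$sph2pipe" -f wav "$src_file" "$out" || exit 1  # stop at first failure
  done
)
```

For example, `SPH2PIPE=$KALDI_ROOT/tools/sph2pipe_v2.5/sph2pipe convert_wsj0 /path/to/wsj0 /path/to/wsj0_wav`, where both directory paths are placeholders. The function body runs in a subshell so its helper variables do not leak into the caller.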

If Kaldi is installed, sph2pipe is already available at $KALDI_ROOT/tools/sph2pipe_v2.5/sph2pipe. Otherwise, download its source from http://sourceforge.net/projects/kaldi/files/sph2pipe_v2.5.tar.gz

To compile sph2pipe with GCC on Linux, run the following in the extracted source directory:

gcc -o sph2pipe *.c -lm
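To pick whichever copy of the tool is available, a small helper could prefer the Kaldi path given above and fall back to a sph2pipe found on PATH. This is a sketch only: find_sph2pipe is a hypothetical name, the lookup order is an assumption, and it only locates the binary; it does not download or build it.

```shell
#!/bin/sh
# Sketch: print the path of a usable sph2pipe binary.
# find_sph2pipe is a hypothetical helper, not part of the original scripts.
# Order of preference (an assumption): Kaldi's bundled copy, then PATH.
find_sph2pipe() (
  # Candidate inside a Kaldi checkout; empty when KALDI_ROOT is unset or empty.
  kaldi_bin=${KALDI_ROOT:+$KALDI_ROOT/tools/sph2pipe_v2.5/sph2pipe}
  if [ -n "$kaldi_bin" ] && [ -x "$kaldi_bin" ]; then
    printf '%s\n' "$kaldi_bin"
  elif command -v sph2pipe >/dev/null 2>&1; then
    command -v sph2pipe
  else
    echo "sph2pipe not found; build it with: gcc -o sph2pipe *.c -lm" >&2
    exit 1  # exits only the subshell that forms the function body
  fi
)
```

A script could then start with `SPH2PIPE=$(find_sph2pipe) || exit 1` and fail early with a clear message instead of partway through the conversion.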
