Forked from [SSL-SI-tool](https://github.com/Yashish92/SSL-SI-tool)
- Original Author: Yashish Maduwantha
Main updates in this repo:
- Added plotting
- Added Gaussian models and simple feedback
- Plotting and the Gaussian models are currently available only for the XRMB model, not the HPRC model.
- These features are still experimental: the Gaussian model feedback is highly sensitive, so adjust the threshold to suit your experiment.
- This update is inspired by the paper "Subtyping Speech Errors in Childhood Speech Sound Disorders with Acoustic-to-Articulatory Speech Inversion".
The SSL-SI-tool implements the pipeline which can be directly used to estimate the articulatory features (6 TVs or 9 TVs + source features) given the speech utterance (.wav files).
This repository holds two Acoustic-to-Articulatory Speech Inversion (SI) systems, trained on the Wisconsin XRMB dataset and the HPRC dataset respectively. The model architecture and training are based on the papers "Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables", "Audio Data Augmentation for Acoustic-to-articulatory Speech Inversion", and "Acoustic-to-articulatory Speech Inversion with Multi-task Learning". The pretrained SI systems in this repository have been trained with self-supervised features (HuBERT and wavLM) as acoustic inputs, rather than the 13 MFCCs used in the papers above. Refer to the papers above for more information on the types of TVs estimated by each model.
- Model trained on XRMB dataset : Estimates 6 TVs
- Model trained on HPRC dataset : Trained with a MTL framework and estimates 9 TVs + Source features (Aperiodicity, Periodicity and Pitch)
Follow the steps in run_instructions.txt to get started quickly!
The SI systems were trained in a conda environment with Python 3.8.13 and tensorflow==2.10.0. The HuBERT pretrained models used to extract acoustic features have been trained in PyTorch.
- Installation method 1:
First install TensorFlow; we recommend doing so in Conda following the steps here.
We also use a number of off-the-shelf libraries, which are listed in requirements.txt. Follow the steps below to install them.
$ pip install speechbrain
$ pip install librosa
$ pip install transformers
- Installation method 2: Installing individual libraries from the requirements.txt file.
$ pip install -r requirements.txt
We recommend following method 1 since it will automatically take care of compatible libraries in case new release versions of the respective libraries have been published.
Note: If you run the SI system on GPUs to extract TVs (recommended for larger datasets), make sure the cuDNN version for PyTorch (installed by speechbrain) and the one installed with TensorFlow are compatible.
Execute the run_SSL_SI_pipeline_custom.py script to run the SI pipeline, which performs the following steps:
- Run feature_extract.py script to do audio segmentation and extract specified SSL features using the speechbrain library
- Load the pre-trained SSL-SI model and evaluate on the extracted SSL feature data generated in step 1
- Save the predicted Tract Variables (TVs)
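The three steps above can be sketched as a small orchestration function. This is only an illustration of the flow; the helper names (`extract_ssl_features`, `predict_tvs`) are hypothetical placeholders, not the actual functions in the repo's scripts.

```python
import numpy as np

def extract_ssl_features(wav_dir):
    # Hypothetical stand-in for feature_extract.py: segment the audio and
    # return one SSL feature matrix (frames x feature dims) per utterance.
    return {"utt1": np.zeros((100, 1024))}

def predict_tvs(feats):
    # Hypothetical stand-in for the pre-trained SSL-SI model:
    # map each SSL feature matrix to 6 TV trajectories.
    return {utt: np.zeros((f.shape[0], 6)) for utt, f in feats.items()}

def run_pipeline(wav_dir, out_format="npy"):
    feats = extract_ssl_features(wav_dir)   # step 1: SSL feature extraction
    tvs = predict_tvs(feats)                # step 2: evaluate the SI model
    for utt, arr in tvs.items():            # step 3: save the predicted TVs
        if out_format == "npy":
            np.save(f"{utt}_tvs.npy", arr)
    return tvs
```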
Execute the 1_folder_restructur.py script to restructure the output files into a single folder.
Execute the 2_data_collection.py script to extract feature data from the .npy files.
Execute the 3_plot_multip.py script to generate plots.
Execute the 4_gaussian.py script to generate Gaussian models and get the Gaussian plot and simple feedback.
Scripts 2-4 each have an additional variant with an _ignore_context suffix, which lets you ignore context information. Use it only when you are sure you want to ignore coarticulation.
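The Gaussian feedback idea can be pictured as fitting a Gaussian to TV feature vectors and flagging samples whose Mahalanobis distance from the model exceeds a threshold. The sketch below only illustrates that idea with random data; it is not the code in 4_gaussian.py, and the threshold value is an arbitrary assumption.

```python
import numpy as np

def fit_gaussian(train_feats):
    """Fit a multivariate Gaussian (mean + covariance) to TV feature rows."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False)
    return mu, cov

def feedback(sample, mu, cov, threshold=3.0):
    """Label a sample by its Mahalanobis distance from the fitted model.

    Lower thresholds make the feedback more sensitive (more flags)."""
    diff = sample - mu
    dist = np.sqrt(diff @ np.linalg.inv(cov) @ diff)
    return ("atypical", dist) if dist > threshold else ("typical", dist)

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 6))        # e.g. 6 TV values per sample
mu, cov = fit_gaussian(train)
label, dist = feedback(np.zeros(6), mu, cov, threshold=3.0)
```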
The tract variables can be saved as either numpy files or mat files for convenience. The TVs and source features are saved in the following order in the output files.
- 6TVs with XRMB : LA, LP, TBCL, TBCD, TTCL, TTCD
- 12 TVs with HPRC : LA, LP, TBCL, TBCD, TTCL, TTCD, JA, TMCL, TMCD, Periodicity, Aperiodicity, Pitch (normalized to 0 to 1 range)
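Because the TVs are saved in a fixed column order, a loaded array can be mapped to named channels. A minimal sketch for the 6-TV XRMB output (the file name in the comment is hypothetical):

```python
import numpy as np

# Column order of the saved XRMB output (frames x 6 TVs)
XRMB_TVS = ["LA", "LP", "TBCL", "TBCD", "TTCL", "TTCD"]

def tvs_to_dict(arr):
    """Map a (frames x 6) TV array to a dict of named 1-D trajectories."""
    assert arr.shape[1] == len(XRMB_TVS), "expected 6 TV columns"
    return {name: arr[:, i] for i, name in enumerate(XRMB_TVS)}

# e.g. arr = np.load("utt1_tvs.npy"); a dummy array stands in here:
tvs = tvs_to_dict(np.zeros((120, 6)))
```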
usage: run_SSL_SI_pipeline.py [-h] [-m MODEL] [-f FEATS] [-i PATH] [-o OUT_FORMAT]

Run the SI pipeline

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        set which SI system to run, xrmb trained (xrmb) or hprc trained (hprc)
  -f FEATS, --feats FEATS
                        set which SSL pretrained model to be used to extract features, hubert to use HuBERT-large and wavlm to use wavLM-large pretrained models
  -i PATH, --path PATH  path to directory with audio files
  -o OUT_FORMAT, --out_format OUT_FORMAT
                        output TV file format (mat or npy)
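The help text above corresponds to a standard argparse setup. The following is a hedged reconstruction for reference only; the defaults shown are assumptions, not necessarily what the script actually uses.

```python
import argparse

def build_parser():
    # Assumed reconstruction of the documented CLI flags
    p = argparse.ArgumentParser(description="Run the SI pipeline")
    p.add_argument("-m", "--model", choices=["xrmb", "hprc"], default="xrmb",
                   help="set which SI system to run")
    p.add_argument("-f", "--feats", choices=["hubert", "wavlm"], default="hubert",
                   help="set which SSL pretrained model extracts features")
    p.add_argument("-i", "--path", help="path to directory with audio files")
    p.add_argument("-o", "--out_format", choices=["mat", "npy"], default="npy",
                   help="output TV file format")
    return p

args = build_parser().parse_args(["-m", "hprc", "-f", "wavlm", "-i", "audio/"])
```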
- Run the pipeline from end to end (executes all 3 steps) to extract features
python 1_run_SSL_SI_pipeline_custom.py -m xrmb -f hubert -c path/to/csv/file -o 'npy'
- Put all subfolders into one folder: update the folder location in the script (line 73), then restructure the folder
python 1_folder_restructur.py
- Collect the feature data
python 2_data_collection.py /path/to/npy/folder /path/to/testclean/folder ɪ_ʃ_t ɪ_s_t i_t͡ʃ_t
- Plot
python 3_plot_multip.py path/to/csv/from/2nd/step <phoneme_combinations>
- Gaussian
python 4a_gaussian.py --train-csv xx.csv --eval-csv xxx.csv --phoneme-combinations xxx --threshold optional --load_model xx
Optionally, you can adjust --threshold to control the sensitivity, and use --load_model to reuse an existing model. If no eval CSV is given, the script will only generate the Gaussian models.
- Ignore-context example: ʃ s t͡ʃ z θ ð ʒ; with-context example: ɪ_ʃ_t ɪ_s_t i_t͡ʃ_t
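The two phoneme-combination formats can be told apart by the underscore separator. A small hypothetical parser (not part of the repo's scripts) illustrating the difference:

```python
def parse_combo(token):
    """Parse a phoneme-combination token.

    With context ('ɪ_ʃ_t'): returns (left, target, right).
    Without context ('ʃ'): returns (None, target, None)."""
    parts = token.split("_")
    if len(parts) == 3:
        left, target, right = parts
        return (left, target, right)
    return (None, token, None)
```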
This project is licensed under CC BY-NC-ND 4.0 - see the LICENSE file for details.