Forked from [SSL-SI-tool](https://github.com/Yashish92/SSL-SI-tool)
- Original Author: Yashish Maduwantha
Main updates in this repo:
- Added plotting
- Added Gaussian models and simple feedback
- Plotting and the Gaussian models are currently available only for the XRMB model, not the HPRC model.
- These features are still experimental: the Gaussian model feedback is highly sensitive, so adjust the threshold to suit your experiment.
- This update is inspired by the paper "Subtyping Speech Errors in Childhood Speech Sound Disorders with Acoustic-to-Articulatory Speech Inversion".
The SSL-SI-tool implements the pipeline which can be directly used to estimate the articulatory features (6 TVs or 9 TVs + source features) given the speech utterance (.wav files).
This repository holds two Acoustic-to-Articulatory Speech Inversion (SI) systems, trained on the Wisconsin XRMB dataset and the HPRC dataset respectively. The model architecture and training are based on the papers "Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables", "Audio Data Augmentation for Acoustic-to-articulatory Speech Inversion", and "Acoustic-to-articulatory Speech Inversion with Multi-task Learning". The pretrained SI systems in this repository have been trained with self-supervised features (HuBERT and wavLM) as acoustic inputs, rather than the 13 MFCCs used in the papers above. Refer to the papers above for more information on the types of TVs estimated by each model.
- Model trained on XRMB dataset : Estimates 6 TVs
- Model trained on HPRC dataset : Trained with a MTL framework and estimates 9 TVs + Source features (Aperiodicity, Periodicity and Pitch)
Follow the steps in run_instructions.txt to get started quickly!
The SI systems were trained in a conda environment with Python 3.8.13 and tensorflow==2.10.0. The HuBERT pretrained models used to extract acoustic features have been trained in PyTorch.
- Installation method 1:
First install TensorFlow; we recommend doing so in Conda following the steps here.
We also use a number of off-the-shelf libraries, which are listed in requirements.txt. Follow the steps below to install them.
$ pip install speechbrain
$ pip install librosa
$ pip install transformers
- Installation method 2: Installing individual libraries from the requirements.txt file.
$ pip install -r requirements.txt
We recommend following method 1 since it will automatically take care of compatible libraries in case new release versions of the respective libraries have been published.
Note: If you run the SI system on GPUs to extract TVs (recommended for larger datasets), make sure the cuDNN version for PyTorch (installed by speechbrain) and the one installed with TensorFlow are compatible.
Execute the run_SSL_SI_pipeline_custom.py script to run the SI pipeline, which performs the following steps:
- Run feature_extract.py script to do audio segmentation and extract specified SSL features using the speechbrain library
- Load the pre-trained SSL-SI model and evaluate on the extracted SSL feature data generated in step 1
- Save the predicted Tract Variables (TVs)
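The three steps above can be sketched as a small orchestration function. This is only an illustration of the flow; the helper names (`extract_ssl_features`, `predict_tvs`) are hypothetical placeholders, not the actual functions in the repo's scripts.

```python
import numpy as np

def extract_ssl_features(wav_dir):
    # Hypothetical stand-in for feature_extract.py: segment the audio and
    # return one SSL feature matrix (frames x feature dims) per utterance.
    return {"utt1": np.zeros((100, 1024))}

def predict_tvs(feats):
    # Hypothetical stand-in for the pre-trained SSL-SI model:
    # map each SSL feature matrix to 6 TV trajectories.
    return {utt: np.zeros((f.shape[0], 6)) for utt, f in feats.items()}

def run_pipeline(wav_dir, out_format="npy"):
    feats = extract_ssl_features(wav_dir)   # step 1: SSL feature extraction
    tvs = predict_tvs(feats)                # step 2: evaluate the SI model
    for utt, arr in tvs.items():            # step 3: save the predicted TVs
        if out_format == "npy":
            np.save(f"{utt}_tvs.npy", arr)
    return tvs
```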
Execute the 1_folder_restructur.py script to restructure the output files into a single folder.
Execute the 2_data_collection.py script to extract feature data from the .npy files.
Execute the 3_plot_multip.py script to generate plots.
Execute the 4_gaussian.py script to generate Gaussian models and get the Gaussian plot and simple feedback.
Scripts 2-4 each have an additional variant with an _ignore_context suffix, which lets you ignore context information. Use it only when you are sure you want to ignore coarticulation.
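The Gaussian feedback idea can be pictured as fitting a Gaussian to TV feature vectors and flagging samples whose Mahalanobis distance from the model exceeds a threshold. The sketch below only illustrates that idea with random data; it is not the code in 4_gaussian.py, and the threshold value is an arbitrary assumption.

```python
import numpy as np

def fit_gaussian(train_feats):
    """Fit a multivariate Gaussian (mean + covariance) to TV feature rows."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False)
    return mu, cov

def feedback(sample, mu, cov, threshold=3.0):
    """Label a sample by its Mahalanobis distance from the fitted model.

    Lower thresholds make the feedback more sensitive (more flags)."""
    diff = sample - mu
    dist = np.sqrt(diff @ np.linalg.inv(cov) @ diff)
    return ("atypical", dist) if dist > threshold else ("typical", dist)

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 6))        # e.g. 6 TV values per sample
mu, cov = fit_gaussian(train)
label, dist = feedback(np.zeros(6), mu, cov, threshold=3.0)
```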
The tract variables can be saved as either numpy files or mat files for convenience. The TVs and source features are saved in the following order in the output files.
- 6TVs with XRMB : LA, LP, TBCL, TBCD, TTCL, TTCD
- 12 TVs with HPRC : LA, LP, TBCL, TBCD, TTCL, TTCD, JA, TMCL, TMCD, Periodicity, Aperiodicity, Pitch (normalized to 0 to 1 range)
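Because the TVs are saved in a fixed column order, a loaded array can be mapped to named channels. A minimal sketch for the 6-TV XRMB output (the file name in the comment is hypothetical):

```python
import numpy as np

# Column order of the saved XRMB output (frames x 6 TVs)
XRMB_TVS = ["LA", "LP", "TBCL", "TBCD", "TTCL", "TTCD"]

def tvs_to_dict(arr):
    """Map a (frames x 6) TV array to a dict of named 1-D trajectories."""
    assert arr.shape[1] == len(XRMB_TVS), "expected 6 TV columns"
    return {name: arr[:, i] for i, name in enumerate(XRMB_TVS)}

# e.g. arr = np.load("utt1_tvs.npy"); a dummy array stands in here:
tvs = tvs_to_dict(np.zeros((120, 6)))
```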
usage: run_SSL_SI_pipeline.py [-h] [-m MODEL] [-f FEATS] [-i PATH] [-o OUT_FORMAT]

Run the SI pipeline

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        set which SI system to run, xrmb trained (xrmb) or hprc trained (hprc)
  -f FEATS, --feats FEATS
                        set which SSL pretrained model to be used to extract features, hubert to use HuBERT-large and wavlm to use wavLM-large pretrained models
  -i PATH, --path PATH  path to directory with audio files
  -o OUT_FORMAT, --out_format OUT_FORMAT
                        output TV file format (mat or npy)
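The help text above corresponds to a standard argparse setup. The following is a hedged reconstruction for reference only; the defaults shown are assumptions, not necessarily what the script actually uses.

```python
import argparse

def build_parser():
    # Assumed reconstruction of the documented CLI flags
    p = argparse.ArgumentParser(description="Run the SI pipeline")
    p.add_argument("-m", "--model", choices=["xrmb", "hprc"], default="xrmb",
                   help="set which SI system to run")
    p.add_argument("-f", "--feats", choices=["hubert", "wavlm"], default="hubert",
                   help="set which SSL pretrained model extracts features")
    p.add_argument("-i", "--path", help="path to directory with audio files")
    p.add_argument("-o", "--out_format", choices=["mat", "npy"], default="npy",
                   help="output TV file format")
    return p

args = build_parser().parse_args(["-m", "hprc", "-f", "wavlm", "-i", "audio/"])
```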
- Run the pipeline from end to end (executes all 3 steps) to extract features
python 1_run_SSL_SI_pipeline_custom.py -m xrmb -f hubert -c path/to/csv/file -o 'npy'
- Put all subfolders into one folder: update the folder location in the script (line 73), then restructure the folder
python 1_folder_restructur.py
- Collect the feature data
python 2_data_collection.py /path/to/npy/folder /path/to/testclean/folder ɪ_ʃ_t ɪ_s_t i_t͡ʃ_t
- Plot
python 3_plot_multip.py path/to/csv/from/2nd/step <phoneme_combinations>
- Gaussian
python 4a_gaussian.py --train-csv xx.csv --eval-csv xxx.csv --phoneme-combinations xxx --threshold optional --load_model xx
Optionally, you can adjust --threshold to control the sensitivity, and use --load_model to reuse an existing model. If no eval CSV is given, the script will only generate the Gaussian models.
- Ignore-context example: ʃ s t͡ʃ z θ ð ʒ; with-context example: ɪ_ʃ_t ɪ_s_t i_t͡ʃ_t
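The two phoneme-combination formats can be told apart by the underscore separator. A small hypothetical parser (not part of the repo's scripts) illustrating the difference:

```python
def parse_combo(token):
    """Parse a phoneme-combination token.

    With context ('ɪ_ʃ_t'): returns (left, target, right).
    Without context ('ʃ'): returns (None, target, None)."""
    parts = token.split("_")
    if len(parts) == 3:
        left, target, right = parts
        return (left, target, right)
    return (None, token, None)
```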
This project is licensed under CC BY-NC-ND 4.0 - see the LICENSE file for details.