This repository accompanies the publication "Mental Wellbeing at Sea: a Prototype to Collect Speech Data in Maritime Settings" at HEALTHINF 2025:
Pascal Hecker, Monica Gonzalez-Machorro, Hesam Sagha, Saumya Dudeja, Matthias Kahlau, Florian Eyben, Björn W. Schuller, Bert Arnrich
In Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2025) - Volume 2: HEALTHINF, pages 29-40
ISBN: 978-989-758-731-3; ISSN: 2184-4305
Further, it contains the material for the publication under review "Mental Wellbeing at Sea: Active and Passive Speech Monitoring in a Maritime Setting".
The associated material and its own README can be found under passive_recordings/.
The figures presented in the publication reside in the figures/ folder; most of them were composed with the Jupyter notebook paper-plots_survey_responses.ipynb.
The central table with the significantly correlating features is paper-significantly_correlating_features.csv.
The central table for the statistical modelling approaches is called compiled-merged_denoised_noisy-paper.ods; it is converted to the LaTeX table in the publication with the Jupyter notebook paper-compose_main_modelling_table.ipynb.
For the extended journal publication, the classification performance is provided as session-level aggregates (in the HEALTHINF publication, the results were given for segment-level classification). The notebook paper-collect_results_for_expanded_main_table.ipynb automates the collection of the best-performing models and stores the result as compiled-merged_denoised_noisy-paper-proper_loso-expanded.csv. The notebook paper-compose_main_modelling_table-proper_loso-expanded-session_level.ipynb then converts that file into the LaTeX source used in the publication. Further, the regression scatter plot of the best-performing model with session-level performance is generated by evaluate_session_level_ccc-scatterplot.py, and the resulting plot is saved as regression_who_5_percentage_score_corrected_noisy_eGeMAPSv02-publication.pdf.
The additional analyses of the active speech data modelling are described in the section Session-level evaluation.
The additional analyses of the passively collected speech data are outlined in the sections 4. Confounder Analysis: Noise and Denoising and 5. Mediation Analysis, as well as in the points 6. Mediation analysis (wind -> emotion -> stress) and 7. Generate mediation path diagram of the Usage section in the passive_recordings/ subdirectory.
Unfortunately, we cannot share the data due to privacy constraints. The source code in this repository was used to run the analyses presented in the publication and allows the applied routines to be inspected and verified.
To utilise the source code provided in this repository, preferably set up a virtual environment with a manager of your choice and run pip install -r requirements-freeze-devaice.txt.
devAIce is a commercial framework that provides the voice activity detection (VAD) and the signal-to-noise ratio (SNR) prediction in this study.
Without a devAIce license, you can run pip install -r requirements-freeze.txt instead, but you will have to implement alternative solutions for its functionalities.
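As a rough stand-in for devAIce's VAD, a simple energy-based detector can be implemented with numpy. This is only a minimal sketch; the frame size, hop size, and threshold below are illustrative assumptions, not values used in the study.

```python
import numpy as np

def energy_vad(signal, sr, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Mark frames as speech when their energy exceeds a dB threshold
    relative to the loudest frame. A crude stand-in for a proper VAD."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = max(0, 1 + (len(signal) - frame) // hop)
    energies = np.array([
        np.mean(signal[i * hop:i * hop + frame] ** 2) for i in range(n_frames)
    ])
    db = 10 * np.log10(energies + 1e-12)  # avoid log(0) on silent frames
    return db > (db.max() + threshold_db)  # boolean speech mask per frame

# Example: 1 s of silence followed by 1 s of noise at 16 kHz
np.random.seed(0)
sr = 16000
sig = np.concatenate([np.zeros(sr), 0.1 * np.random.randn(sr)])
mask = energy_vad(sig, sr)
```

A production replacement (e.g., a trained VAD model) would be preferable; this merely keeps the pipeline runnable without the commercial dependency.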
Python version 3.8.10 was used in this project.
Synthetic data generators are provided so that the full pipeline can be executed without access to the private dataset. See synthetic_data/README.md for details.
# 1. Generate synthetic active and passive data
python synthetic_data/active/generate_synthetic_data.py --n-participants 15 --seed 42
python synthetic_data/passive/generate_synthetic_data.py --n-days 40 --seed 42 --skip-audio
# 2. Run the active speech modelling pipeline
python src/main.py src/experiment_configs/synthetic/synthetic-eGeMAPSv02.yaml
# 3. Collect results into CSV
python src/collect_results.py
# 4. Compute bootstrap confidence intervals and collect best-performing models
# Run the following Jupyter notebooks:
notebooks/mwas/synthetic-bootstrapping-bulk_apply_confidence_intervals_to_results.ipynb
notebooks/mwas/synthetic-paper-collect_results_for_expanded_main_table.ipynb
# 5. Session-level evaluation and retrospective label validation
python notebooks/mwas/evaluate_session_level_ccc.py
python notebooks/mwas/validate_retrospective_labels.py
# Note: evaluate_session_level_ccc-scatterplot.py targets the specific best-performing
# model configuration from the paper and is not run with synthetic data.
# 6. Compose LaTeX table
notebooks/mwas/synthetic-paper-compose_main_modelling_table-session_level.ipynb
# 7. Passive pipeline evaluation (run from repository root)
python passive_recordings/src/evaluate_results/evaluate_time_course_main.py
python passive_recordings/src/evaluate_results/confounder_noise_denoising_main.py
python passive_recordings/src/evaluate_results/mediation_analysis_main.py --quick
python passive_recordings/src/evaluate_results/mediation_diagram.py \
--input synthetic_data/passive/data/evaluated/synthetic-mediation-wind_emotion_stress/mediation_results.yaml \
    --output synthetic_data/passive/data/evaluated/synthetic-mediation-wind_emotion_stress/causal_diagram_pgf.pdf

Several scripts have their hardcoded paths adjusted to point to the synthetic data directories; the original paths are preserved as comments. Synthetic results are written to results/synthetic/ and synthetic_data/passive/data/evaluated/ so that no original results under results/mwas/ or passive_recordings/data/evaluated/ are overwritten.
The --quick flag on mediation_analysis_main.py reduces the bootstrap
iterations to 10 for fast verification. The upstream audio compression and
prediction scripts (compress_extracted_files/, process_and_predict/)
require real VDR audio and additional dependencies (sox) and are therefore
not covered by the synthetic data pipeline.
Note: Results from synthetic data have no scientific meaning. The synthetic labels are random and unrelated to the audio content.
The main modelling pipeline launcher resides in src/main.py. Launch it from within src/ by running python main.py.
In src/experiment_configs/, you can find designated configuration files for particular experiment runs.
A config can be passed to the main script on the command line, such as:
python main.py experiment_configs/mental_wellbeing_at_sea/eGeMAPSv02-target_norm-no_denoising.yaml.
The section "experiment configs used" lists all configuration files that were used to obtain the results presented in the publication.
The results will be saved under results/mwas/modelling in a nested folder structure that encodes the respective experiment settings.
After several experiment runs, you can adjust and execute src/collect_results.py.
It will find all results-compiled.yaml files and add their evaluation metrics to a .csv file saved in results/mwas/composed.
In the script, you have to manually set the target variable whose results you want to collect. You can further filter for any string contained in the results paths that specifies the respective run (e.g., "type-no_feature_selection" for all models for which no feature selection was performed). This is implemented via the search_term = {"term": None, "name": "everything"} dictionary, where "term" would be "type-no_feature_selection" and "name" can be any recognisable identifier of your choosing for the results/mwas/composed folder hierarchy.
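For instance, to collect only the runs without feature selection, the dictionary described above could be set as follows (the "name" value is an arbitrary label of your choosing):

```python
# Collect only runs whose results path contains "type-no_feature_selection";
# "name" becomes the sub-folder label under results/mwas/composed.
search_term = {"term": "type-no_feature_selection", "name": "no_feature_selection"}

# The default collects every run:
# search_term = {"term": None, "name": "everything"}
```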
You can then open that .csv file in, e.g., LibreOffice: select everything (Ctrl + A) → "Data" → "Sort" → set "Sort Key 1" to the column with the metric you find most meaningful (e.g., "Column B" for CCC) and choose "Descending".
That way, you get all your models sorted by their performance!
Then, select your best-performing model, or any other model you want to inspect, and scroll to the "path" column. Use that path to navigate (e.g., using cd from results/mwas/modelling/) to the model directory, where you will find regression plots of the train and test partition predictions.
Alternatively, the notebook paper-collect_results_for_expanded_main_table.ipynb automates the collection of the best-performing models and stores the result as compiled-merged_denoised_noisy-paper-proper_loso-expanded.csv.
Each model result folder contains a data folder, which in turn contains df_results_train.parquet.zstd and df_results_test.parquet.zstd. These Parquet files (chosen for their good compression) can be read with:
import pandas as pd

df = pd.read_parquet('df_results_test.parquet.zstd', engine='pyarrow')
pyarrow is already installed through the requirements.
These results DataFrames contain the predicted and ground truth labels, as well as several other useful columns such as the speaker ID and the index of the outer CV fold.
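As a sketch of working with these DataFrames, predictions can be aggregated per speaker and fold as shown below. Note that the column names ("speaker_id", "fold", "prediction", "label") are illustrative placeholders; inspect df.columns for the actual names.

```python
import pandas as pd

# Stand-in for a loaded results DataFrame; column names are assumptions.
df = pd.DataFrame({
    "speaker_id": ["s1", "s1", "s2", "s2"],
    "fold":       [0, 0, 1, 1],
    "prediction": [0.6, 0.8, 0.3, 0.5],
    "label":      [0.7, 0.7, 0.4, 0.4],
})

# Average segment-level predictions per speaker within each outer CV fold
per_speaker = df.groupby(["speaker_id", "fold"], as_index=False)[
    ["prediction", "label"]
].mean()
```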
If more in-depth debugging is required, the following option in the experiment configuration saves even further data:
ModelTrainer:
  meta:
    save_full_data: True
This will additionally save the filtered feature and label DataFrames, so you can check, e.g., how many feature columns were dropped through feature selection or how the feature values were normalized.
The Jupyter notebook bootstrapping-bulk_apply_confidence_intervals_to_results.ipynb was used to calculate the confidence intervals with the confidence_intervals package.
To compare the performance of the denoising methods, a model estimating the SNR of the individual audio files was employed in audio_quality-snr_filtering.ipynb. In the publication, we discard files with an SNR value < 7. The respective files to filter out were copied into audio_quality-filter_samples.ipynb, and that notebook is processed by the main modelling pipeline in src/main.py#L317.
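The SNR-based filtering step amounts to a simple threshold. The per-file SNR values below are hypothetical placeholders for the devAIce predictions:

```python
# Hypothetical per-file SNR predictions in dB (in the study these came from
# devAIce's SNR model). Files with SNR < 7 are discarded, as in the paper.
snr_per_file = {"a.wav": 12.3, "b.wav": 5.1, "c.wav": 7.0}
SNR_THRESHOLD = 7

kept = [f for f, snr in snr_per_file.items() if snr >= SNR_THRESHOLD]
dropped = [f for f, snr in snr_per_file.items() if snr < SNR_THRESHOLD]
```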
For denoising, the "causal speech enhancement model" (publication, repository) was employed, decoupled from this repository. The resulting file hierarchy was passed back to the pipeline by pointing the configuration file field path_data to the denoised directory tree; as an example, see the configuration file eGeMAPSv02-target_norm-facebook_denoiser-master64-converted_int16_dithering-filter_clipping_snr.yaml#8.
The Jupyter notebook check_clipping.ipynb was employed to check if the denoised files still contain clipping. No denoised files are clipped, but 17 "noisy" files showed clipping.
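A minimal clipping check on int16 audio could flag files containing samples at the int16 rails; the exact criterion used in check_clipping.ipynb may differ.

```python
import numpy as np

def is_clipped(samples: np.ndarray, limit: int = 32767) -> bool:
    """Flag a signal as clipped if any int16 sample sits at a rail
    (>= 32767 or <= -32768). A simple heuristic, not the notebook's code."""
    return bool((samples >= limit).any() or (samples <= -limit - 1).any())

clean = np.array([0, 1000, -2000], dtype=np.int16)
clipped = np.array([0, 32767, -5], dtype=np.int16)
```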
Two scripts in notebooks/mwas/ evaluate model performance at the session level (rather than the segment level used during training):
evaluate_session_level_ccc.py - Aggregates segment-level predictions to session level by averaging predictions within each (participant, session) pair, then computes CCC with 95% bootstrap confidence intervals for all 20 paper configurations. Validates against the paper's segment-level CCC and the existing session-level CCC (concordance_cc-test-agg-average).
validate_retrospective_labels.py - Compares session-level CCC on all sessions against assessment-only sessions (where a questionnaire was actually completed) for the 12 retrospective target configurations (WHO-5, PSS-10, PHQ-8). Verifies that retrospective label assignment does not inflate predictive performance.
In addition, evaluate_session_level_ccc-scatterplot.py generates the regression scatter plot of the best-performing model with session-level performance; the resulting plot is saved as regression_who_5_percentage_score_corrected_noisy_eGeMAPSv02-publication.pdf.
Both scripts write their output to results/mwas/composed/session_level_analysis/.
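The aggregation-then-CCC step can be sketched as follows. Lin's CCC formula is standard; the column names below are illustrative placeholders, not the actual DataFrame schema:

```python
import numpy as np
import pandas as pd

def ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)

# Segment-level predictions; column names are assumptions for illustration.
df = pd.DataFrame({
    "participant": ["p1", "p1", "p1", "p2", "p2", "p2"],
    "session":     [1, 1, 2, 1, 1, 2],
    "prediction":  [0.2, 0.4, 0.9, 0.5, 0.7, 0.1],
    "label":       [0.3, 0.3, 0.8, 0.6, 0.6, 0.2],
})

# Average within each (participant, session) pair, then score at session level
sessions = df.groupby(["participant", "session"], as_index=False).mean()
session_ccc = ccc(sessions["label"], sessions["prediction"])
```

Bootstrap confidence intervals would then resample the session-level rows and recompute the CCC on each resample.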
The experiment configuration files used to run the modelling for the publication are:
eGeMAPS features
- No denoising: eGeMAPSv02-target_norm-no_denoising.yaml
- Denoising and SNR-based filtering: eGeMAPSv02-target_norm-facebook_denoiser-master64-converted_int16_dithering-filter_clipping_snr.yaml
wav2vec2.0 embeddings as features
- No denoising: wav2vec2-large-robust-12-ft-emotion-msp-target_norm-dim-no_denoising.yaml
- Denoising and SNR-based filtering: wav2vec2-large-robust-12-ft-emotion-msp-dim-target_norm-facebook_denoiser-master64-converted_int16_dithering-filter_clipping_snr.yaml
- No denoising: wav2vec2-large-robust-ft-libri-960h-target_norm-no_denoising.yaml
- Denoising and SNR-based filtering: wav2vec2-large-robust-ft-libri-960h-target_norm-facebook_denoiser-master64-converted_int16_dithering-filter_clipping_snr.yaml
- No denoising: wav2vec2-large-xlsr-53-target_norm-no_denoising.yaml
- Denoising and SNR-based filtering: wav2vec2-large-xlsr-53-target_norm-facebook_denoiser-master64-converted_int16_dithering-filter_clipping_snr.yaml
├── README.md <- The top-level README for developers using this project.
│
├── data
│ └── processed <- The final, canonical data sets for modeling.
│
│
├── notebooks <- Jupyter notebooks essential to this repository.
│ │
│ ├── data <- Data assisting the notebooks.
│ │
│ └── figures <- Generated graphics and figures used in the publication.
│
├── requirements-freeze.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
│
└── src <- Source code for use in this project.
│
├── data <- Scripts to download or generate data.
│
├── features <- Scripts to turn raw data into features for modeling.
│
└── models <- Scripts to train models and then use trained models to make
predictions.
Project based on the cookiecutter data science project template.