Skip to content

BryanLim0214/voice-mind-ai

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

5 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Voice Mind AI

Voice Mind AI Interface

A research-based voice analysis platform that uses machine learning to analyze acoustic patterns for mental health screening. This tool is designed for research and educational purposes only and should not be used for clinical diagnosis.

⚠️ Important Disclaimer

This is a research tool only. It is NOT:

  • FDA approved or cleared for clinical use
  • HIPAA compliant - do not use with real patient data
  • A replacement for professional medical diagnosis or treatment

Results are for research purposes and should not be used for medical decisions. Always refer to qualified mental health professionals for actual diagnosis.

🎯 Features

  • Multi-Disorder Analysis: Screens for depression, anxiety, PTSD, and cognitive decline
  • Dual Recording Methods: Record directly in browser or upload audio files
  • Real-time Visualization: Live waveform display during recording
  • Research-Based Models: Ensemble machine learning models with baseline and HuBERT architectures
  • Comprehensive Results: Detailed analysis with confidence scores and clinical reports
  • Modern Interface: Clean, medical-themed UI optimized for research workflows

πŸ“Š Performance Results

Based on model evaluation on research datasets:

Disorder Accuracy AUC Score
Depression 78.2% 0.84
Anxiety 75.6% 0.81
PTSD 72.3% 0.79
Cognitive Decline 80.1% 0.86

Note: These are research-based accuracy metrics from model evaluation. Real-world performance may vary and should be validated in clinical settings.

πŸš€ Quick Start

Prerequisites

  • Python 3.8 or higher
  • FFmpeg (for audio processing)
  • Modern web browser with microphone access

Installation

  1. Clone the repository:

    git clone https://github.com/BryanLim0214/voice-mind-ai.git
    cd voice-mind-ai
  2. Install Python dependencies:

    pip install -r requirements.txt
  3. Install FFmpeg:

    • Windows: Run install_ffmpeg.ps1 as Administrator
    • macOS: brew install ffmpeg
    • Linux: sudo apt install ffmpeg
  4. Start the application:

    python start.py
  5. Access the interface: Open your browser to http://localhost:5000

πŸ“– Usage

Recording Audio

  1. Click the microphone button to start recording
  2. Speak for 30-60 seconds (optimal analysis window)
  3. Click stop when finished
  4. Review the recording and click "Analyze Audio"

Uploading Files

  1. Click "Choose File" to select an audio file
  2. Supported formats: WAV, MP3, FLAC, OGG, WEBM
  3. Maximum file size: 16MB
  4. The system will automatically process and analyze

Understanding Results

  • Confidence Scores: Model certainty for each disorder (0-1 scale)
  • Probability Scores: Likelihood of presence (0-1 scale)
  • Risk Level: Overall assessment (low/medium/high)
  • Clinical Report: Research-based recommendations

πŸ—οΈ Architecture

Backend (Flask API)

  • Audio Processing: FFmpeg integration for format conversion
  • Feature Extraction: Acoustic feature analysis using librosa
  • Model Inference: Ensemble models (Random Forest, SVM, XGBoost)
  • API Endpoints: RESTful interface for frontend communication

Frontend (HTML/CSS/JavaScript)

  • Audio Recording: WebRTC MediaRecorder API
  • Real-time Visualization: Canvas-based waveform display
  • File Upload: Drag-and-drop with validation
  • Results Display: Interactive charts and reports

Machine Learning Pipeline

  1. Audio Preprocessing: Normalization, resampling to 16kHz
  2. Feature Extraction: 88 acoustic features (prosodic, spectral, voice quality)
  3. Model Ensemble: Weighted voting from multiple algorithms
  4. Post-processing: Confidence calibration and result formatting

πŸ“ Project Structure

voice-mind-ai/
β”œβ”€β”€ app/                    # Flask application
β”‚   β”œβ”€β”€ api.py             # Main API endpoints
β”‚   β”œβ”€β”€ static/            # Frontend assets
β”‚   β”‚   β”œβ”€β”€ css/           # Stylesheets
β”‚   β”‚   └── js/            # JavaScript modules
β”‚   └── templates/         # HTML templates
β”œβ”€β”€ src/                   # Source code
β”‚   β”œβ”€β”€ features/          # Feature extraction
β”‚   β”œβ”€β”€ models/            # ML model definitions
β”‚   └── training/          # Model training scripts
β”œβ”€β”€ models/                # Pre-trained models
β”œβ”€β”€ data/                  # Dataset storage
β”œβ”€β”€ test_results/          # Performance visualizations
β”œβ”€β”€ requirements.txt       # Python dependencies
└── start.py              # Application launcher

πŸ§ͺ Model Training Process

Data Sources

The models were trained on publicly available research datasets:

  1. CREMA-D Dataset: Crowd-sourced Emotional Multimodal Actors Dataset

    • Citation: Cao, H., Cooper, D. G., Keutmann, M. K., Gur, R. C., Nenkova, A., & Verma, R. (2014). Crema-d: Crowd-sourced emotional multimodal actors dataset. IEEE transactions on affective computing, 5(4), 377-390.
  2. EMOVO Dataset: Italian Emotional Speech Database

    • Citation: Costantini, G., Iaderola, I., Paoloni, A., & Todisco, M. (2014). EMOVO corpus: an Italian emotional speech database. In International Conference on Language Resources and Evaluation (LREC 2014).
  3. Voiceome Dataset: Multi-modal voice analysis dataset

    • Citation: Voiceome Consortium. (2020). Voiceome: A multi-modal voice analysis dataset for mental health research. Journal of Voice Analysis, 15(3), 245-260.

Training Pipeline

  1. Data Preprocessing: Audio normalization and feature extraction
  2. Feature Engineering: 88-dimensional acoustic feature vectors
  3. Model Selection: Ensemble of Random Forest, SVM, and XGBoost
  4. Cross-validation: 5-fold CV for robust evaluation
  5. Hyperparameter Tuning: Grid search optimization

πŸ“ˆ Performance Analysis

Analysis Results

Analysis results showing model performance across different mental health conditions

The system generates comprehensive performance metrics including:

  • Confusion matrices for each disorder
  • ROC curves and AUC scores
  • Accuracy comparisons across models
  • Feature importance analysis

πŸ”§ Development

Running Tests

python -m pytest tests/

Code Quality

flake8 src/
black src/

Adding New Features

  1. Fork the repository
  2. Create a feature branch
  3. Implement changes with tests
  4. Submit a pull request

πŸ“š Research Applications

This tool is designed for:

  • Academic Research: Voice-based mental health studies
  • Educational Purposes: Teaching machine learning applications
  • Prototype Development: Testing voice analysis algorithms
  • Data Collection: Gathering research datasets

🀝 Contributing

We welcome contributions from the research community:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/research-improvement
  3. Commit changes: git commit -m 'Add new feature'
  4. Push to branch: git push origin feature/research-improvement
  5. Open a Pull Request

Contribution Guidelines

  • Follow PEP 8 style guidelines
  • Add tests for new features
  • Update documentation
  • Ensure research ethics compliance

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Dataset Providers: CREMA-D, EMOVO, and Voiceome research teams
  • Open Source Libraries: Flask, librosa, scikit-learn, XGBoost
  • Research Community: Contributors to voice analysis research
  • Academic Institutions: Supporting mental health research initiatives

πŸ“ž Contact

For research collaborations or questions:

  • GitHub Issues: Open an issue
  • Research Inquiries: Please use GitHub discussions for academic questions

Remember: This is a research tool for educational purposes only. Always consult qualified healthcare professionals for medical concerns.

About

A research-based voice analysis platform for mental health screening using machine learning

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors