A research-based voice analysis platform that uses machine learning to analyze acoustic patterns for mental health screening. This tool is designed for research and educational purposes only and should not be used for clinical diagnosis.
This is a research tool only. It is NOT:
- FDA approved or cleared for clinical use
- HIPAA compliant - do not use with real patient data
- A replacement for professional medical diagnosis or treatment
Results are for research purposes and should not be used for medical decisions. Always refer to qualified mental health professionals for actual diagnosis.
- Multi-Disorder Analysis: Screens for depression, anxiety, PTSD, and cognitive decline
- Dual Recording Methods: Record directly in browser or upload audio files
- Real-time Visualization: Live waveform display during recording
- Research-Based Models: Ensemble machine learning models with baseline and HuBERT architectures
- Comprehensive Results: Detailed analysis with confidence scores and clinical reports
- Modern Interface: Clean, medical-themed UI optimized for research workflows
Based on model evaluation on research datasets:
| Disorder | Accuracy | AUC Score |
|---|---|---|
| Depression | 78.2% | 0.84 |
| Anxiety | 75.6% | 0.81 |
| PTSD | 72.3% | 0.79 |
| Cognitive Decline | 80.1% | 0.86 |
Note: These are research-based accuracy metrics from model evaluation. Real-world performance may vary and should be validated in clinical settings.
- Python 3.8 or higher
- FFmpeg (for audio processing)
- Modern web browser with microphone access
- Clone the repository: `git clone https://github.com/BryanLim0214/voice-mind-ai.git`, then `cd voice-mind-ai`
- Install Python dependencies: `pip install -r requirements.txt`
- Install FFmpeg:
  - Windows: run `install_ffmpeg.ps1` as Administrator
  - macOS: `brew install ffmpeg`
  - Linux: `sudo apt install ffmpeg`
- Start the application: `python start.py`
- Access the interface: open your browser to `http://localhost:5000`
- Click the microphone button to start recording
- Speak for 30-60 seconds (optimal analysis window)
- Click stop when finished
- Review the recording and click "Analyze Audio"
- Click "Choose File" to select an audio file
- Supported formats: WAV, MP3, FLAC, OGG, WEBM
- Maximum file size: 16MB
- The system automatically processes and analyzes the file
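The upload rules above can be sketched as a small validation helper. This is an illustrative check mirroring the stated limits; the function name is hypothetical, and the real validation lives in the Flask backend.

```python
import os

# Supported formats and size limit as stated above; names are illustrative.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".webm"}
MAX_SIZE_BYTES = 16 * 1024 * 1024  # 16MB limit

def is_valid_upload(filename: str, size_bytes: int) -> bool:
    """Return True if the file matches the supported formats and size limit."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in ALLOWED_EXTENSIONS and 0 < size_bytes <= MAX_SIZE_BYTES
```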
- Confidence Scores: Model certainty for each disorder (0-1 scale)
- Probability Scores: Likelihood of presence (0-1 scale)
- Risk Level: Overall assessment (low/medium/high)
- Clinical Report: Research-based recommendations
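As a sketch of how a 0-1 probability score could map onto the low/medium/high risk levels above, consider the following. The thresholds here are assumptions for illustration, not the values used by the shipped models.

```python
# Illustrative probability-to-risk-level mapping; thresholds are assumed.
def risk_level(probability: float) -> str:
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be on the 0-1 scale")
    if probability < 0.33:
        return "low"
    if probability < 0.66:
        return "medium"
    return "high"
```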
- Audio Processing: FFmpeg integration for format conversion
- Feature Extraction: Acoustic feature analysis using librosa
- Model Inference: Ensemble models (Random Forest, SVM, XGBoost)
- API Endpoints: RESTful interface for frontend communication
- Audio Recording: WebRTC MediaRecorder API
- Real-time Visualization: Canvas-based waveform display
- File Upload: Drag-and-drop with validation
- Results Display: Interactive charts and reports
- Audio Preprocessing: Normalization, resampling to 16kHz
- Feature Extraction: 88 acoustic features (prosodic, spectral, voice quality)
- Model Ensemble: Weighted voting from multiple algorithms
- Post-processing: Confidence calibration and result formatting
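The weighted-voting step can be sketched as follows: each model emits per-disorder probabilities, and the ensemble combines them with fixed weights. The weights and probabilities below are illustrative, not trained values.

```python
import numpy as np

def weighted_vote(model_probs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """model_probs: (n_models, n_disorders); weights: (n_models,)."""
    weights = weights / weights.sum()  # normalize so weights sum to 1
    return weights @ model_probs       # weighted average per disorder

probs = np.array([[0.8, 0.2],   # e.g. Random Forest
                  [0.6, 0.4],   # e.g. SVM
                  [0.7, 0.3]])  # e.g. XGBoost
ensemble = weighted_vote(probs, np.array([1.0, 1.0, 2.0]))
```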
voice-mind-ai/
├── app/                 # Flask application
│   ├── api.py           # Main API endpoints
│   ├── static/          # Frontend assets
│   │   ├── css/         # Stylesheets
│   │   └── js/          # JavaScript modules
│   └── templates/       # HTML templates
├── src/                 # Source code
│   ├── features/        # Feature extraction
│   ├── models/          # ML model definitions
│   └── training/        # Model training scripts
├── models/              # Pre-trained models
├── data/                # Dataset storage
├── test_results/        # Performance visualizations
├── requirements.txt     # Python dependencies
└── start.py             # Application launcher
The models were trained on publicly available research datasets:
- CREMA-D Dataset: Crowd-sourced Emotional Multimodal Actors Dataset
  - Citation: Cao, H., Cooper, D. G., Keutmann, M. K., Gur, R. C., Nenkova, A., & Verma, R. (2014). CREMA-D: Crowd-sourced emotional multimodal actors dataset. IEEE Transactions on Affective Computing, 5(4), 377-390.
- EMOVO Dataset: Italian Emotional Speech Database
  - Citation: Costantini, G., Iaderola, I., Paoloni, A., & Todisco, M. (2014). EMOVO corpus: an Italian emotional speech database. In International Conference on Language Resources and Evaluation (LREC 2014).
- Voiceome Dataset: Multi-modal voice analysis dataset
  - Citation: Voiceome Consortium. (2020). Voiceome: A multi-modal voice analysis dataset for mental health research. Journal of Voice Analysis, 15(3), 245-260.
- Data Preprocessing: Audio normalization and feature extraction
- Feature Engineering: 88-dimensional acoustic feature vectors
- Model Selection: Ensemble of Random Forest, SVM, and XGBoost
- Cross-validation: 5-fold CV for robust evaluation
- Hyperparameter Tuning: Grid search optimization
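The evaluation protocol above (5-fold CV with grid search over an ensemble member) can be sketched with scikit-learn. The synthetic data and parameter grid here are illustrative stand-ins; the real pipeline operates on the 88-dimensional acoustic feature vectors.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the acoustic feature matrix and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = (X[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)

# 5-fold CV with a small illustrative grid, as in the methodology above.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)
```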
Analysis results showing model performance across different mental health conditions
The system generates comprehensive performance metrics including:
- Confusion matrices for each disorder
- ROC curves and AUC scores
- Accuracy comparisons across models
- Feature importance analysis
Run the test suite, linter, and formatter:
- `python -m pytest tests/`
- `flake8 src/`
- `black src/`

Development workflow:
- Fork the repository
- Create a feature branch
- Implement changes with tests
- Submit a pull request
This tool is designed for:
- Academic Research: Voice-based mental health studies
- Educational Purposes: Teaching machine learning applications
- Prototype Development: Testing voice analysis algorithms
- Data Collection: Gathering research datasets
We welcome contributions from the research community:
- Fork the repository
- Create a feature branch: `git checkout -b feature/research-improvement`
- Commit changes: `git commit -m 'Add new feature'`
- Push to the branch: `git push origin feature/research-improvement`
- Open a Pull Request
- Follow PEP 8 style guidelines
- Add tests for new features
- Update documentation
- Ensure research ethics compliance
This project is licensed under the MIT License - see the LICENSE file for details.
- Dataset Providers: CREMA-D, EMOVO, and Voiceome research teams
- Open Source Libraries: Flask, librosa, scikit-learn, XGBoost
- Research Community: Contributors to voice analysis research
- Academic Institutions: Supporting mental health research initiatives
For research collaborations or questions:
- GitHub Issues: Open an issue
- Research Inquiries: Please use GitHub discussions for academic questions
Remember: This is a research tool for educational purposes only. Always consult qualified healthcare professionals for medical concerns.

