GitHub - sidhucode/TherAIpy

# TherAIpy

**TherAIpy** is a therapy and counseling web application with real-time voice interaction, designed to provide AI-powered therapy sessions. The platform emphasizes spoken responses over traditional chat, creating a natural and immersive therapy experience.

---

## Table of Contents

1. [Overview](#overview)
2. [Features](#features)
3. [Tech Stack](#tech-stack)
4. [Project Structure](#project-structure)
5. [Setup & Installation](#setup--installation)
6. [API Endpoints](#api-endpoints)
7. [Usage](#usage)
8. [Future Work](#future-work)
9. [Privacy & Security](#privacy--security)
10. [Contributing](#contributing)
11. [License](#license)

---

## Overview

TherAIpy delivers AI-powered therapy with an **audio-first approach**, offering users the ability to converse naturally with a virtual therapist. Unlike typical GPT wrappers, the platform focuses on generating **spoken responses** while maintaining the flexibility to integrate text-based chat, avatars, and session management.

---

## Features

### ✅ Implemented
- **Text-to-Speech (TTS)**
  - Kokoro 82M parameter model
  - Python TTS engine with automatic model loading
  - 24kHz WAV audio output
  - Direct streaming to browser
  - Automatic cleanup of temporary files (5-minute retention)

- **Speech-to-Text (STT)**
  - Audio file uploads supported (webm/opus)
  - Returns transcription with confidence score
  - Ready for integration with Whisper or Google Cloud

- **AI Chat/Therapy**
  - CBT-based system prompt
    (In-Progress)
    - Context-aware responses with message history
    - Session management support

- **Voice Interface & Frontend Components**
  - Browser microphone recording with noise suppression & echo cancellation
  - VoiceInterface, MicButton, Captions, PrivacyBanner, Avatar placeholder
  - Real-time audio loop: Record → STT → Chat → TTS → Playback

- **Project Infrastructure**
  - Organized directory structure (`python/`, `temp/`, `app/`)
  - Python virtual environment (3.12.6)
  - Git ignore for audio files
  - Cleanup scripts for maintenance

### ⚠️ Partially Implemented
- Avatar generation API structure (ready for D-ID/HeyGen integration)

### Future Work
- Facial Recognition / Emotional Sentiment Analysis
- GPT-4 integration for therapy responses
- Real-time avatar video generation
- Advanced streaming, WebSocket communication, session persistence
- User authentication, therapy progress tracking, analytics, multi-language support, emotion detection, voice cloning
- Security: end-to-end encryption, HIPAA compliance, audit logging

---

## Tech Stack

- **Frontend:** Next.js 13.5.3, React 18, TypeScript, TailwindCSS
- **Backend:** Next.js API Routes
- **TTS Engine:** Kokoro (Python 3.12.6)
- **Environment:** macOS, Node.js, Python virtual environment

---

## Project Structure

TherAIpy/ ├── app/ │ ├── api/ │ │ ├── tts/ # Fully functional │ │ ├── stt/ # Mock implementation │ │ ├── chat/ # Mock implementation │ │ └── avatar/ # Placeholder ├── components/ │ ├── VoiceInterface │ ├── MicButton │ ├── Captions │ ├── Avatar │ └── PrivacyBanner ├── python/ │ ├── tts_engine.py │ └── cleanup.py ├── temp/audio/ # Temporary audio storage └── .venv/ # Python virtual environment


---

## Setup & Installation

### 1. Clone and Navigate
```bash
git clone <repository-url>
cd TherAIpy

2. Python Environment

python3.12 -m venv .venv
source .venv/bin/activate
pip install -r python/requirements.txt

3. Node.js Dependencies

npm install

4. Run Development Server

npm run dev

The first TTS request automatically downloads the Kokoro model (~300MB).

API Endpoints

Endpoint	Description	Status
POST `/api/tts`	Generate speech from text	✅ Working
POST `/api/stt`	Transcribe audio to text	⚠️ Mock
POST `/api/chat`	Generate therapy response	⚠️ Mock
POST `/api/avatar`	Generate avatar/video	❌ Placeholder

Usage

Open the web app in a browser
Use the MicButton to record voice
Audio is sent to STT → Chat → TTS → playback automatically
Captions display transcription and response text in real time

Future Work

Integrate real STT models (Whisper/Google)
Integrate GPT-4 or other LLMs for therapy responses
Add avatar/video generation
Real-time streaming via WebSockets
Session persistence, user authentication, progress tracking
Advanced features: multi-language support, emotion detection, voice customization
Security: HIPAA compliance, end-to-end encryption, audit logging

Privacy & Security

Temporary audio files auto-delete (configurable retention)
No persistent storage of user audio in main directories
.gitignore prevents accidental commit of audio files
Privacy banner notifies users of data handling

Contributing

Clone the repository and follow the setup instructions
Use feature branches for new functionality
Submit PRs with clear descriptions of changes
Ensure all audio or sensitive data is excluded from commits

License

This project is MIT licensed. See LICENSE for details.

https://github.com/alphacep/vosk-api

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
UI-PagesTUFF		UI-PagesTUFF
app		app
face emtion detector/app		face emtion detector/app
python		python
.env.example		.env.example
.gitignore		.gitignore
ARCHITECTURE.md		ARCHITECTURE.md
GROQ_SETUP.md		GROQ_SETUP.md
INSTALLATION.md		INSTALLATION.md
README.md		README.md
SETUP.md		SETUP.md
index.html		index.html
main.py		main.py
package-lock.json		package-lock.json
package.json		package.json
run-dev.sh		run-dev.sh
server.py		server.py
tailwind.config.js		tailwind.config.js
test.py		test.py
test.txt		test.txt
test_tts_api.js		test_tts_api.js
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

2. Python Environment

3. Node.js Dependencies

4. Run Development Server

API Endpoints

Usage

Future Work

Privacy & Security

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

2. Python Environment

3. Node.js Dependencies

4. Run Development Server

API Endpoints

Usage

Future Work

Privacy & Security

Contributing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages