An interactive web application featuring a 3D AI avatar with VISEME-based lip synchronization. The application provides an educational AI companion powered by Google Gemini API with natural conversation capabilities and synchronized facial animations.
- Gemini AI Integration - Powered by Google Gemini API for intelligent responses
- Real-time Chat Interface - Interactive conversation with AI companion
- VISEME-based Lip Synchronization - Mouth animation synchronized with audio playback
- 3D Avatar Rendering - ReadyPlayerMe avatar rendered with Three.js
- Dark/Light Theme - Persistent theme switching with localStorage
- Animation System - Greeting and idle animations with smooth transitions
- Audio Processing Tool - Generate VISEME data from audio files
- High Performance Rendering - 60 FPS 3D rendering
Frontend:
- React 18
- Three.js & React Three Fiber
- @react-three/drei
- Vite (Build Tool)
- Leva (Debug Controls)
- Lucide React (Icons)
Backend:
- Google Gemini API (AI Chat)
- Express.js (Server Framework)
- Deployed Backend URL: https://cyphers101.onrender.com/api/chat
Audio Processing:
- wawa-lipsync (VISEME generation)
- Web Audio API
3D Assets:
- ReadyPlayerMe Avatar (GLB format)
- FBX Animations (Idle, Greeting)
- VISEME morph targets
```
Ai-companion/
├── frontend/                    # Main React Application
│   ├── src/
│   │   ├── components/
│   │   │   ├── Avatar.jsx       # 3D avatar with VISEME lip-sync
│   │   │   ├── Experience.jsx   # Three.js scene setup
│   │   │   ├── Sidebar.jsx      # Chat history sidebar
│   │   │   ├── MainContent.jsx  # Main content area
│   │   │   ├── AvatarSection.jsx # Avatar container with Canvas
│   │   │   └── ChatSection.jsx  # Chat input interface
│   │   ├── contexts/
│   │   │   └── ThemeContext.jsx # Theme state management
│   │   ├── config/
│   │   │   └── api.js           # API endpoint configuration
│   │   ├── App.jsx              # Main application component
│   │   ├── App.css
│   │   ├── index.css
│   │   └── main.jsx
│   ├── public/
│   │   ├── models/              # GLB avatar files
│   │   ├── animations/          # FBX animation files
│   │   ├── audios/              # Audio files with VISEME JSON data
│   │   └── textures/            # Background images
│   ├── index.html
│   ├── vite.config.js
│   └── package.json
│
├── tools/                       # Audio Processing Tools
│   ├── main.js                  # VISEME generator script
│   ├── index.html               # Web interface for tool
│   ├── package.json
│   └── sample.mp3               # Sample audio file
│
├── docs/                        # Project Documentation
│   ├── ARCHITECTURE.md          # Architecture overview
│   ├── API.md                   # API documentation
│   └── demo.png                 # Demo screenshot
│
├── chat-demo.html               # Standalone chat demo (no 3D avatar)
├── SETUP.md                     # Detailed setup instructions
├── PROJECT_SUMMARY.md           # Project summary
├── GEMINI_API_INTEGRATION.md    # Gemini API integration guide
├── VISEME_FORMAT.md             # VISEME format specification
└── README.md                    # This file
```
- Node.js 18 or higher
- npm (comes with Node.js)
- Modern web browser (Chrome, Firefox, Safari, or Edge)
- Internet connection (for Gemini API calls)
This is the primary application featuring the 3D avatar with AI chat capabilities.
```bash
# Navigate to project root
cd Ai-companion

# Navigate to frontend directory
cd frontend

# Install dependencies
npm install

# Start development server
npm run dev
```

The application will be available at: http://localhost:5173
The frontend connects to the deployed backend at: https://cyphers101.onrender.com/api/chat
Note: First request to the backend may take 30-60 seconds due to cold start on free hosting.
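One way for the frontend to absorb that cold-start delay is to retry failed requests with exponential backoff. A minimal sketch, assuming nothing about the actual codebase (the function name `withRetry` and the delay values are illustrative):

```javascript
// Retry an async operation with exponential backoff, so a backend
// cold start does not surface to the user as an error.
async function withRetry(fn, { retries = 3, baseDelayMs = 2000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;
      // Wait 2s, 4s, 8s, ... between attempts
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Example (browser): wrap the chat request.
// const reply = await withRetry(() =>
//   fetch("https://cyphers101.onrender.com/api/chat", { method: "POST" })
// );
```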
A simple HTML file to test the Gemini API chat functionality without the 3D avatar.
```bash
# From project root
cd Ai-companion

# Open in browser (macOS)
open chat-demo.html

# Or open manually in any browser
# File location: /Ai-companion/chat-demo.html
```

This demo provides a basic chat interface to verify API connectivity.
Tool for generating VISEME data from audio files for lip synchronization.
```bash
# Navigate to tools directory
cd Ai-companion/tools

# Install dependencies
npm install

# Start the tool
npm run dev
```

The tool will be available at: http://localhost:5173
Usage:
- Open the tool in your browser
- Upload an audio file (MP3, WAV, or OGG)
- The tool generates a JSON file with VISEME cue data
- Place both the audio file and the generated JSON in `frontend/public/audios/`
To create a production build of the frontend:
```bash
cd frontend
npm run build
```

The production-ready files will be in the `frontend/dist/` directory.
The application uses VISEME (Visual Phoneme) codes for lip synchronization:
- Audio Input - Pre-recorded audio files or TTS output
- VISEME Processing - Audio is processed by wawa-lipsync tool
- JSON Generation - Tool generates timestamped VISEME codes
- 3D Animation - Avatar's morph targets are updated in real-time
- Synchronized Playback - Mouth movements match audio playback
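The timestamped cues produced by the pipeline above can be consumed per animation frame with a simple lookup. A sketch under an assumed cue shape of `{ start, end, value }` in seconds; the actual schema is specified in VISEME_FORMAT.md:

```javascript
// Given the parsed VISEME JSON and the audio element's currentTime,
// return the mouth shape that should be active right now.
// The { start, end, value } cue shape is an assumption, not the
// confirmed format of this project.
function activeViseme(cues, timeSeconds) {
  for (const cue of cues) {
    if (timeSeconds >= cue.start && timeSeconds < cue.end) {
      return cue.value;
    }
  }
  return "X"; // neutral / silence when no cue covers this time
}

// Typical use inside a render-loop callback:
// const viseme = activeViseme(cues, audioRef.current.currentTime);
```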
The system uses standard VISEME mouth positions based on phoneme shapes:
- A-H: Various vowel and consonant mouth shapes
- I-U: Extended mouth positions
- X: Silence or neutral position
Each VISEME corresponds to specific phonemes and is rendered using 3D morph targets on the avatar model.
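Snapping morph-target influences directly from one viseme to the next produces visible popping, so a common approach is to ease each influence toward its target every frame. A minimal sketch (names are illustrative; the project's real logic lives in `Avatar.jsx`):

```javascript
// Ease every morph-target influence toward its target value.
// `influences` maps viseme name -> current influence (0..1);
// `active` is the viseme that should be fully expressed.
function stepInfluences(influences, active, lerpFactor = 0.3) {
  const next = {};
  for (const [name, value] of Object.entries(influences)) {
    const target = name === active ? 1 : 0;
    // Linear interpolation toward the target each frame
    next[name] = value + (target - value) * lerpFactor;
  }
  return next;
}
```

Calling this once per frame with the currently active viseme smoothly blends the mouth shapes instead of switching them instantly.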
- User enters message in chat interface
- Frontend sends POST request to backend API
- Backend forwards request to Google Gemini API
- Gemini processes the message and generates response
- Response is returned to frontend and displayed
- Avatar can play pre-recorded audio responses with lip-sync
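The frontend-to-backend request in the flow above can be sketched as follows. The `{ message }` body shape is an assumption for illustration; the actual contract is documented in docs/API.md:

```javascript
const API_URL = "https://cyphers101.onrender.com/api/chat";

// Build the POST request for the backend. The { message } body shape
// is an assumed contract; see docs/API.md for the real one.
function buildChatRequest(message) {
  return {
    url: API_URL,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message }),
    },
  };
}

// In the browser:
// const { url, options } = buildChatRequest("What is artificial intelligence?");
// const data = await fetch(url, options).then((r) => r.json());
```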
- Enable/disable toggle for avatar rendering
- Visual status indicators
- Greeting animation on activation
- Idle animation during inactivity
- Smooth state transitions
- Real-time message display
- Typing indicators during AI processing
- Error handling with user-friendly messages
- Message history display
- Responsive layout
- Dark and light modes
- Persistent theme selection via localStorage
- Smooth theme transitions
- Consistent styling across components
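Persisting the theme comes down to reading localStorage on startup and writing on every toggle. The startup decision can be isolated as a pure function; this is a sketch under assumed names, not the actual ThemeContext.jsx implementation:

```javascript
// Decide the initial theme: an explicit stored choice wins,
// otherwise fall back to the OS-level color-scheme preference.
function resolveInitialTheme(storedValue, prefersDark) {
  if (storedValue === "dark" || storedValue === "light") return storedValue;
  return prefersDark ? "dark" : "light";
}

// In the browser:
// const theme = resolveInitialTheme(
//   localStorage.getItem("theme"),
//   window.matchMedia("(prefers-color-scheme: dark)").matches
// );
// On toggle: localStorage.setItem("theme", nextTheme);
```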
- Greeting animation on load
- Idle animation loop
- VISEME-based lip-sync animation
- FBX animation file support
- GLB model support with morph targets
Start a conversation with the AI companion. Example queries:
- "What is artificial intelligence?"
- "Explain quantum physics"
- "How do I learn programming?"
- "What are good study techniques?"
- Toggle avatar on/off using the sidebar control
- Observe greeting animation when avatar is enabled
- Watch idle animation during inactivity
- Use Leva debug panel (development mode) to control animations
The application includes sample audio files with VISEME data:
- `welcome.mp3` / `welcome.json`
Use the Leva debug controls in development mode to trigger audio playback and observe lip synchronization.
To add new audio files with lip synchronization:
1. Generate VISEME data using the tools:

   ```bash
   cd tools
   npm install
   npm run dev
   ```

2. Upload your audio file through the web interface
3. Download the generated JSON file
4. Place both files in `frontend/public/audios/`:

   ```
   frontend/public/audios/
   ├── your-audio.mp3
   └── your-audio.json
   ```

5. Update the Avatar component to reference the new audio file
To use a different 3D avatar:
- Create or obtain a GLB format avatar from ReadyPlayerMe
- Ensure the model has VISEME morph targets (A, B, C, D, E, F, G, H, X, etc.)
- Place the GLB file in `frontend/public/models/`
- Update the model path in `Avatar.jsx`
- Test all VISEME animations
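The morph-target check in the list above can be done programmatically: a Three.js mesh exposes a `morphTargetDictionary` mapping target names to indices, which can be compared against the expected viseme keys. A sketch, assuming the A-H/X naming this README uses (a given GLB may use different key names):

```javascript
// Viseme keys this project expects, following the README's A-H/X naming.
const REQUIRED_VISEMES = ["A", "B", "C", "D", "E", "F", "G", "H", "X"];

// Given a mesh's morphTargetDictionary (name -> index map),
// return the list of required viseme keys that are absent.
function missingVisemes(morphTargetDictionary) {
  return REQUIRED_VISEMES.filter((v) => !(v in morphTargetDictionary));
}

// Example: after loading the GLB, warn if the model is incomplete.
// const missing = missingVisemes(mesh.morphTargetDictionary);
// if (missing.length > 0) console.warn("Missing visemes:", missing);
```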
Frontend Architecture:
- React for UI components
- Three.js for 3D rendering via React Three Fiber
- Context API for theme management
- Vite for fast development and building
Backend Architecture:
- Express.js server acts as proxy to Gemini API
- Deployed on Render (free tier with cold starts)
- No database or session management
- Stateless HTTP REST API
Communication Flow:
```
User Input → Frontend → Backend (Express) → Gemini API
Response   ← Frontend ← Backend (Express) ← Gemini API
```
- Target 60 FPS for 3D rendering
- Lip-sync latency under 50ms
- Initial backend response may take 30-60 seconds (cold start)
- Subsequent responses typically 1-3 seconds
- 3D model loading time approximately 2 seconds
- Backend hosted on free tier (cold start delays)
- No WebSocket support (HTTP polling only)
- No real-time TTS integration
- Pre-generated VISEME data required for lip-sync
- No authentication or user management
- Single avatar model per session
Potential improvements for future development:
Phase 1: Real-time Features
- WebSocket integration for live updates
- Real-time VISEME generation
- ElevenLabs TTS integration for dynamic audio
Phase 2: Advanced AI
- Voice input (Speech-to-Text)
- Context-aware conversation memory
- Emotion detection and responses
- Multiple AI personality options
Phase 3: User Experience
- Multiple avatar model selection
- Custom avatar upload
- Mobile responsive design improvements
- Session recording and playback
Additional documentation available:
- SETUP.md - Detailed setup instructions
- docs/API.md - API endpoint documentation
- docs/ARCHITECTURE.md - System architecture details
- GEMINI_API_INTEGRATION.md - Gemini API integration guide
- VISEME_FORMAT.md - VISEME format specification
MIT License
Open Source Libraries:
- Three.js - 3D rendering engine
- React Three Fiber - React renderer for Three.js
- @react-three/drei - Three.js helpers
- wawa-lipsync - VISEME generation
- Google Generative AI - Gemini API client
Assets:
- ReadyPlayerMe - Avatar creation platform
- Mixamo - Animation library (FBX files)
References:
- Oculus VISEME specification
- Three.js documentation
- React Three Fiber documentation
