# PyHi

A Python voice assistant with a pure MCP (Model Context Protocol) architecture for maximum extensibility.

PyHi is a voice-controlled AI assistant that combines:
- Wake word detection ("Hey Chat")
- Speech-to-text recognition
- AI conversation with OpenAI/Anthropic
- Tool calling through MCP servers only
- Text-to-speech responses
The system is designed for simplicity and extensibility - add new capabilities by creating simple MCP servers.
git clone [email protected]:m0nkmaster/pyhi.git
cd pyhi
pip install -r requirements.txtCreate .env file:
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key # Optional
PICOVOICE_API_KEY=your_picovoice_key
TOMORROW_IO_API_KEY=your_weather_key # Optional for weather server
WATCHMODE_API_KEY=your_watchmode_key # Optional for streaming serverCopy and customize the configuration:
cp config.yaml my_config.yaml
# Edit my_config.yaml as neededpython -m src.appSay "Hey Chat" followed by your command!
- "What's the weather in London?"
- "How's the weather today?"
- "Tell me about the weather in Tokyo"
- "Set a timer for 5 minutes"
- "Set an alarm for 2:30 PM"
- "List my alarms"
- "What trains leave from London Paddington?"
- "Show departures from Manchester"
- "Find trains to Birmingham"
- "Add meeting tomorrow at 3 PM"
- "What's on my calendar today?"
- "Schedule lunch with John for Friday"
- "Where can I watch Inception?"
- "Find streaming options for The Office"
- "What's available on Netflix?"
- "What time is it?"
- "Tell me a joke"
- "How are you today?"
## Architecture

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Wake Word    │────▶│  Audio Handler  │────▶│ Speech-to-Text  │
│    Detection    │     │    (Unified)    │     │   Recognition   │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Porcupine    │     │    PyAudio +    │     │  Google Speech  │
│    Detection    │     │   Speech Rec    │     │   Recognition   │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                                         │
                                                         ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│      Audio      │◀────│ Text-to-Speech  │◀────│  AI Processing  │
│     Playback    │     │   Generation    │     │   & MCP Tools   │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│     Unified     │     │   OpenAI TTS    │     │OpenAI/Anthropic │
│  Audio Handler  │     │       API       │     │  + MCP Servers  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
## Project Structure

```
src/
├── app.py             # Main VoiceAssistant class
├── config.py          # Unified configuration system
├── audio.py           # Unified audio handler
├── wake_word.py       # Wake word detection
├── mcp_manager.py     # MCP server management
├── conversation/      # AI conversation management
│   ├── ai_client.py   # OpenAI/Anthropic API client
│   └── manager.py     # Conversation state management
├── mcp_servers/       # MCP servers (extensible)
│   ├── weather/       # Weather information
│   ├── alarms/        # Timers and alarms
│   ├── train_times/   # UK train departures
│   ├── calendar/      # Google Calendar integration
│   └── streaming/     # Streaming service search
├── utils/             # Utility functions
└── assets/            # Audio files and models
```
## Key Components

### Configuration
- Single `Config` class with YAML support
- Environment variable expansion
- Platform-specific auto-detection
- Clean, organized settings structure

### Audio
- `AudioHandler`: unified recording, playback, and speech recognition
- `WakeWordDetector`: Porcupine integration with an async interface
- Async-first design for responsive interaction

### MCP Integration
All functionality is provided through MCP servers:
- Clean, single-system approach
- Five complete servers ready to use
- Easy server addition via configuration
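The async-first design can be pictured as a single wake-word → command → response cycle. The sketch below is purely illustrative: `StubWakeWordDetector`, `StubAudioHandler`, and `interaction_cycle` are hypothetical stand-ins, not PyHi's actual class or method names.

```python
import asyncio

class StubWakeWordDetector:
    """Stand-in for the Porcupine-backed detector (hypothetical interface)."""
    async def wait_for_wake_word(self) -> None:
        await asyncio.sleep(0)  # the real detector blocks on audio frames

class StubAudioHandler:
    """Stand-in for the unified audio handler (hypothetical interface)."""
    async def listen(self) -> str:
        await asyncio.sleep(0)  # would record until silence is detected
        return "what's the weather in London?"

    async def speak(self, text: str) -> None:
        print(f"[TTS] {text}")  # real code would synthesize and play audio

async def interaction_cycle(detector, audio) -> str:
    """One wake-word -> command -> response cycle."""
    await detector.wait_for_wake_word()
    command = await audio.listen()
    response = f"You said: {command}"  # real code calls the AI client + MCP tools
    await audio.speak(response)
    return response

result = asyncio.run(interaction_cycle(StubWakeWordDetector(), StubAudioHandler()))
```

Because every stage is a coroutine, the real loop can stay responsive (e.g. cancel listening on timeout) without threads.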
## Built-in MCP Servers

**Weather**
- Current weather conditions
- Weather forecasts
- Location-based weather data
- Tomorrow.io API integration

**Alarms & Timers**
- Set timers and alarms
- List active alarms
- Cancel alarms
- Audio notifications

**Train Times**
- UK train departure information
- Station code search
- Live departure boards
- LDBWS API integration

**Calendar**
- Google Calendar integration
- Add/delete events
- View upcoming events
- Service account authentication

**Streaming**
- Movie/TV show search
- Streaming availability
- UK-focused results
- Watchmode API integration
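On the wire, MCP's stdio transport carries newline-delimited JSON-RPC 2.0 messages. The `mcp` library builds these for you, but as a rough illustration (with `get_weather` as a hypothetical tool name), a `tools/call` request looks like this:

```python
import json

def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 tools/call message as sent over the stdio transport."""
    msg = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # stdio transport: one JSON message per line
    return json.dumps(msg) + "\n"

wire = tools_call_request(1, "get_weather", {"location": "London"})
print(wire, end="")
```

The server replies with a matching-`id` response containing the tool's result, which PyHi feeds back into the AI conversation.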
## Creating a New MCP Server

Create the server package:

```bash
mkdir src/mcp_servers/my_server
touch src/mcp_servers/my_server/__init__.py
```

Implement the server in `__main__.py`:

```python
#!/usr/bin/env python3
import json

from mcp.server import FastMCP

mcp = FastMCP("my-server")

@mcp.tool()
async def my_function(param: str) -> str:
    """My custom function description."""
    return f"Result: {param}"

@mcp.resource("my-data://items")
async def get_items() -> str:
    """Get available items."""
    return json.dumps({"items": ["a", "b", "c"]})

if __name__ == "__main__":
    mcp.run("stdio")
```

Register it in the configuration:

```yaml
mcp:
  servers:
    - name: "my-server"
      command: ["python", "-m", "src.mcp_servers.my_server"]
      enabled: true
```

External MCP servers can be added the same way:

```yaml
mcp:
  servers:
    - name: "github"
      command: ["npx", "@modelcontextprotocol/server-github"]
    - name: "filesystem"
      command: ["python", "-m", "mcp_server_filesystem"]
```

## Configuration Reference

```yaml
# Audio system settings
audio:
  input_device: "default"
  output_device: "default"
  sample_rate: 16000
  channels: 1
  chunk_size: 1024
  speech_threshold: 200.0
  silence_duration: 2.0
  activation_sound: "bing.mp3"
  confirmation_sound: "elevator.mp3"
  ready_sound: "beep.mp3"
  sleep_sound: "bing-bong.mp3"

# AI provider configuration
ai:
  provider: "openai"  # "openai" or "anthropic"
  model: "gpt-4o-mini"
  api_key: "${OPENAI_API_KEY}"
  voice: "nova"
  voice_model: "tts-1"
  max_tokens: 250
  temperature: 0.7

# MCP server configuration
mcp:
  enabled: true
  transport: "stdio"
  timeout: 30
  servers:
    - name: "weather"
      command: ["python", "-m", "src.mcp_servers.weather"]
      enabled: true
    # ... other servers

# Wake word detection
wake_word:
  phrase: "Hey Chat"
  model_path: ""  # Auto-detected

# Application settings
timeout_seconds: 10.0
debug: false
```

## Environment Variables

```bash
# Required
OPENAI_API_KEY=your_openai_key
PICOVOICE_API_KEY=your_picovoice_key

# Optional
ANTHROPIC_API_KEY=your_anthropic_key
TOMORROW_IO_API_KEY=your_weather_key
WATCHMODE_API_KEY=your_streaming_key
```

## Design Goals

- ✅ Single extension system (MCP only)
- ✅ Unified configuration (YAML with env vars)
- ✅ Efficient audio system (streamlined implementation)
- ✅ Clean file structure (organized by purpose)
- ✅ Pure MCP architecture (standardized protocols)
- ✅ Easy to understand - clear component responsibilities
- ✅ Simple to extend - standard MCP server creation
- ✅ Well documented - comprehensive examples
- ✅ Clean APIs - async-first design
- ✅ Type safe - Pydantic models throughout
- ✅ Reduced complexity - single system to maintain
- ✅ Clear error handling - structured exceptions
- ✅ Easy testing - modular components
- ✅ Standard protocols - MCP for all extensions
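The environment-variable expansion used in the configuration (e.g. `api_key: "${OPENAI_API_KEY}"`) can be sketched with a small recursive helper. This is an illustrative sketch of the idea, not PyHi's actual implementation:

```python
import os
import re

# Matches ${VAR_NAME} references inside config values.
_VAR = re.compile(r"\$\{([A-Z0-9_]+)\}")

def expand_env(value):
    """Recursively replace ${VAR} references with environment values.

    Unknown variables are left as-is so a missing key is visible in errors.
    """
    if isinstance(value, dict):
        return {k: expand_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v) for v in value]
    if isinstance(value, str):
        return _VAR.sub(lambda m: os.environ.get(m.group(1), m.group(0)), value)
    return value

os.environ["OPENAI_API_KEY"] = "sk-test"  # stand-in value for the demo
config = {"ai": {"provider": "openai", "api_key": "${OPENAI_API_KEY}"}}
expanded = expand_env(config)
print(expanded["ai"]["api_key"])  # -> sk-test
```

Running the helper over the dict produced by parsing `config.yaml` gives the same effect the README describes: secrets live in `.env`, not in the config file.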
## Requirements

```
# Core functionality
openai>=1.5.0
mcp>=1.0.0
pydantic>=2.0.0
pyyaml>=6.0.0

# AI providers
anthropic>=0.25.0

# Audio processing
pyaudio>=0.2.14
SpeechRecognition>=3.10.0
pydub>=0.25.0
pvporcupine>=3.0.0

# HTTP client
httpx>=0.25.0

# Utilities
python-dotenv>=1.0.0
numpy>=1.21.0
```

## Platform Support

- macOS: Full support (development platform)
- Linux: General support
- Raspberry Pi: Optimized for edge deployment
- Windows: Basic support
Hardware requirements:

- Microphone: Any USB/built-in microphone
- Speakers: Audio output device
- CPU: Modern processor for speech processing
- RAM: 1GB+ for AI model processing
- Network: Internet connection for AI APIs
## Development

```bash
git clone <repository>
cd pyhi
pip install -r requirements.txt
pip install -r requirements-test.txt  # For development
```

Run the tests:

```bash
pytest                    # All tests
pytest --cov=src          # With coverage
pytest tests/test_mcp.py  # Specific tests
```

Lint and type-check:

```bash
ruff check src/   # Linting
ruff format src/  # Formatting
mypy src/         # Type checking
```

Run locally:

```bash
# Run the assistant
python -m src.app

# Test individual MCP servers
python -m src.mcp_servers.weather
python -m src.mcp_servers.alarms

# Load custom config
python -m src.app --config my_config.yaml
```

## Quick Start

- Start PyHi: `python -m src.app`
- Say "Hey Chat" (wait for the confirmation sound)
- Ask: "What's the weather like today?"
- Listen to the response
- Continue the conversation or wait for the timeout
### Custom Configuration

```bash
# Copy default config
cp config.yaml my_setup.yaml

# Edit settings
vim my_setup.yaml

# Run with custom config
python -m src.app --config my_setup.yaml
```

### Adding a New Server

```bash
# Create new server
mkdir src/mcp_servers/my_service
cd src/mcp_servers/my_service

# Implement server (see examples above)
vim __main__.py

# Test server independently
python __main__.py

# Add to config and restart PyHi
```

## Documentation

- CLAUDE.md: Development instructions and architecture notes
- ARCHITECTURE.md: Detailed architecture analysis
- config.yaml: Complete configuration reference with comments
Each MCP server includes:
- Tool definitions with parameter validation
- Resource endpoints for data access
- Prompt templates for AI interaction
- Error handling with structured responses
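One common way to implement "structured responses" is for every tool to return a JSON envelope with an explicit ok/error shape instead of raising. The sketch below assumes that convention; `tool_result` and `set_timer` are hypothetical helpers, not PyHi's actual code:

```python
import json

def tool_result(data=None, error=None) -> str:
    """Wrap a tool's output in a uniform JSON envelope."""
    if error is not None:
        return json.dumps({"ok": False, "error": str(error)})
    return json.dumps({"ok": True, "data": data})

def set_timer(minutes) -> str:
    """Example tool: validate parameters and return a structured error on failure."""
    if not isinstance(minutes, (int, float)) or minutes <= 0:
        # A structured error instead of a traceback keeps the AI conversation going.
        return tool_result(error="minutes must be a positive number")
    return tool_result(data={"timer_set_for_minutes": minutes})

print(set_timer(5))   # success envelope with the timer data
print(set_timer(-1))  # error envelope, no exception raised
```

Returning errors as data lets the AI model read the failure and recover in conversation ("Sorry, I need a positive number of minutes") rather than crashing the tool call.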
## Contributing

Contributions are welcome in these areas:

- New MCP Servers - Add functionality through the standard MCP protocol
- Platform Support - Improve Windows/Linux compatibility
- Audio Enhancements - Better speech recognition, noise cancellation
- Documentation - Usage examples, tutorials, troubleshooting
- Testing - Expand test coverage, integration tests

To contribute:

1. Fork the repository
2. Create a feature branch
3. Implement changes with tests
4. Ensure code quality (ruff, mypy)
5. Submit a pull request with a description
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Support

- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: This README and inline documentation
## Acknowledgments

- OpenAI for ChatGPT and TTS APIs
- Anthropic for Claude chat completions
- Picovoice for the Porcupine wake word engine
- The MCP community for the Model Context Protocol standard
- PyAudio contributors and all other open-source dependencies
PyHi - A voice assistant that's simple to understand, easy to extend, and powerful through standardized MCP servers.
Built with ❤️ for developers who value simplicity and extensibility.
