Project Replicant v2: Local AI with Voice, XR, and Text Interaction
🌐 Inspiration
Project Replicant v2 is a complete rebuild of our original AI companion system. Inspired by sci-fi like SAO: Ordinal Scale, we wanted an AI agent that could seamlessly interact with users via voice, text, and XR environments. Starting from scratch allowed us to rethink the architecture, optimize performance, and fully embrace local processing for privacy and speed. ShellHacks provided the perfect environment to push these ideas into a working prototype.
💻 What It Does
Project Replicant v2 delivers a rich, interactive AI experience:
- Voice and Text Interaction: Speak naturally or type messages and receive context-aware responses.
- Emotion-Aware Responses: The system detects and displays emotional context for more human-like interactions.
- Multiple Agents: Users can converse with several intelligent Agents simultaneously, routed via Google Agent-to-Agent (A2A) architecture.
- XR and Standard App Support: Works both in immersive AR/VR setups and standard desktop apps.
- Fully Local Processing: High-performance LLM runs entirely on your device, ensuring offline capability and fast responses.
- Real-Time STT/TTS Pipeline: Captures microphone input, transcribes it using Whisper, and speaks responses via Piper TTS.
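One way the emotion-aware responses above could work is for the LLM to prefix each reply with an emotion tag that the frontend strips and displays. The tag format below is our illustration, not necessarily the project's actual protocol:

```python
import re

def parse_emotion(reply: str) -> tuple[str, str]:
    """Split a hypothetical '[emotion] text' prefix off an LLM reply.

    Returns (emotion, text); defaults to 'neutral' when no tag is present.
    """
    m = re.match(r"\[(\w+)\]\s*(.*)", reply, re.S)
    return (m.group(1), m.group(2)) if m else ("neutral", reply)
```

The frontend can then pick an avatar expression from the emotion while showing only the text in the chat bubble.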
🔧 How We Built It
- Frontend: C# + OpenSilver chat system with dynamic chat bubbles, STT/TTS toggle, and microphone integration via NAudio.
- Backend: FastAPI + WebSocket server streams audio, processes PCM input, and routes transcriptions to the local LLM or multiple Agents. Local LLM inference runs through LLamaSharp (https://github.com/SciSharp/LLamaSharp)!
- STT: Faster Whisper model handles CPU-based transcription, including stereo-to-mono conversion, resampling, and VAD.
- TTS: PiperVoice synthesizes natural-sounding responses in real time.
- Multi-Agent Handling: Conversations are routed through a Google A2A setup, enabling the LLM to talk to multiple Agents and provide context-aware responses.
- Edge LLM: We used LFM2-2.6B-Q4_K_M.gguf from https://huggingface.co/LiquidAI/LFM2-2.6B-GGUF, an excellent choice for a powerful local LLM.
- XR Ready: Fully compatible with XR frameworks, allowing future expansion into immersive environments.
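The stereo-to-mono conversion and resampling mentioned for the STT stage can be sketched with the standard library alone. This is a simplified illustration, not the project's code; the decimation is a naive 3:1 drop (48 kHz → 16 kHz), where a production pipeline would low-pass filter first to avoid aliasing:

```python
import struct

def stereo_to_mono(pcm: bytes) -> bytes:
    """Average the two channels of interleaved 16-bit little-endian stereo PCM."""
    samples = struct.unpack("<%dh" % (len(pcm) // 2), pcm)
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)]
    return struct.pack("<%dh" % len(mono), *mono)

def decimate_48k_to_16k(pcm: bytes) -> bytes:
    """Keep every third sample of 16-bit mono PCM (naive 3:1 decimation)."""
    samples = struct.unpack("<%dh" % (len(pcm) // 2), pcm)
    kept = samples[::3]
    return struct.pack("<%dh" % len(kept), *kept)
```

Whisper expects 16 kHz mono input, which is why both steps sit in front of transcription.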
🚧 Challenges We Ran Into
- Rewriting everything from scratch while maintaining feature parity and improving performance.
- Optimizing real-time audio processing pipelines for STT and TTS.
- Implementing multi-Agent routing with Google A2A and ensuring conversation context remained consistent.
- Managing emotion parsing and chat context across multiple message exchanges.
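Part of optimizing the real-time audio pipeline is deciding which frames are worth transcribing at all. As a stand-in for the actual VAD used with faster-whisper, a crude energy gate illustrates the idea (the threshold here is arbitrary):

```python
import struct

def is_speech(frame: bytes, threshold: int = 500) -> bool:
    """Crude energy gate: mean absolute amplitude of a 16-bit PCM frame."""
    samples = struct.unpack("<%dh" % (len(frame) // 2), frame)
    energy = sum(abs(s) for s in samples) / max(len(samples), 1)
    return energy > threshold
```

Skipping silent frames keeps the CPU transcription budget for frames that actually contain speech.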
🏆 Accomplishments We're Proud Of
- Delivered a fully functional AI system rebuilt from the ground up in just 36 hours.
- Achieved smooth STT/TTS integration with local, high-performance LLMs.
- Implemented multi-Agent communication for richer AI interactions.
- Created a system bridging XR and desktop applications while keeping all processing local and offline.
📖 Lessons Learned
- Starting fresh allowed us to rethink architecture for modularity and scalability.
- Prioritizing core functionality first prevented bottlenecks in development.
- Efficient handling of audio streams and WebSocket communication is crucial for real-time interaction.
- Multi-Agent routing requires careful state and context management to avoid confusing responses.
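The state-management lesson above can be illustrated with a hypothetical router that keeps an isolated conversation history per Agent; the names and structure here are ours, not the Google A2A API:

```python
from collections import defaultdict
from typing import Callable

class AgentRouter:
    """Keep a separate conversation history per Agent so replies stay in context."""

    def __init__(self) -> None:
        self.histories: dict[str, list[dict]] = defaultdict(list)

    def route(self, agent: str, user_msg: str,
              reply_fn: Callable[[list[dict]], str]) -> str:
        history = self.histories[agent]
        history.append({"role": "user", "content": user_msg})
        reply = reply_fn(history)  # e.g. call the local LLM on this Agent's history
        history.append({"role": "assistant", "content": reply})
        return reply
```

Isolating histories this way is one straightforward guard against the "confusing responses" problem: no Agent ever sees another Agent's turns unless context is shared deliberately.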
🚀 What's Next for Project Replicant
- Expand Acilia’s emotional intelligence and multi-turn contextual understanding.
- Integrate real-time XR environmental awareness for immersive interaction.
- Open-source the system to allow developers to plug in custom TTS, STT, and LLM modules.
- Extend multi-Agent capabilities, enabling richer collaborative AI workflows.
🙏 Acknowledgments
A huge thank you to the team who made this possible:
- Jason – for building the scraping tools that gather data for training and testing.
- Makenna – for creating all the art assets that bring Acilia and the interface to life.
- John – for implementing the Python server that powers real-time STT/TTS and multi-Agent routing.
***Sidenote:*** We plan to release this sooner rather than later! We would have published it already, but a few critical keys were still in the code. We also didn't realize we needed to set up a GitHub repository, so pushing took a while given the size of a few assets!

