VanguardAI — Voice-Controlled Security Patrol Robot

Built at Robotic Agents Hackathon | March 13, 2026 | Frontier Tower, SF

VanguardAI transforms the Unitree Go2 quadruped into an autonomous security patrol system controlled by voice commands. Speak naturally — the robot responds in real-time.

End-to-end pipeline: voice → STT → LLM → structured command → robot movement. Full loop in roughly 500 ms.



What It Does

"Go forward"  → Robot walks forward
"Turn left"   → Robot turns 90° left
"Patrol"      → Robot starts patrol pattern
"Stop"        → Robot halts immediately

The LLM handles natural language variation — "walk ahead", "move forward", "go straight" all resolve to the same action. Structured JSON output keeps robot command execution deterministic.
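A minimal sketch of that deterministic contract on the consuming side, assuming a fixed action whitelist (the function and action names here are illustrative, not the repo's actual code):

```python
import json

# Actions the robot layer is willing to execute; anything else is rejected.
# (Illustrative whitelist -- the repo's actual schema may differ.)
ALLOWED_ACTIONS = {"move_forward", "move_backward", "turn_left",
                   "turn_right", "stop", "patrol"}

def parse_command(llm_output: str):
    """Validate the LLM's JSON so robot execution stays deterministic.

    Returns an (action, value) tuple, or None if the output is malformed
    or names an action outside the whitelist.
    """
    try:
        cmd = json.loads(llm_output)
    except json.JSONDecodeError:
        return None
    action = cmd.get("action")
    if action not in ALLOWED_ACTIONS:
        return None
    try:
        value = float(cmd.get("value", 1.0))
    except (TypeError, ValueError):
        return None
    return action, value

# "walk ahead", "move forward", "go straight" all produce the same JSON upstream:
print(parse_command('{"action": "move_forward", "value": 1.0}'))  # ('move_forward', 1.0)
print(parse_command('{"action": "self_destruct"}'))               # None
```

Whatever phrasing the user speaks, only JSON that passes this gate ever reaches the robot.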


System Architecture

Voice Input
    ↓
Smallest.ai Pulse STT (WebSocket stream, 64ms latency)
    ↓
Together AI — Llama-3.3-70B
    ↓ {"action": "move_forward", "value": 1.0}
Cyberwave SDK
    ↓
Unitree Go2 (4D LIDAR + RGB camera)

Latency Breakdown

| Stage | Latency |
| --- | --- |
| STT (Smallest.ai Pulse) | ~64 ms |
| LLM inference (Together AI) | ~200–400 ms |
| Robot command execution | ~50 ms |
| End-to-end voice → movement | ~350–550 ms |
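As a quick sanity check, the itemized stages can be summed directly; the extra margin in the quoted end-to-end figure presumably covers audio capture and orchestration overhead not listed above (that attribution is an assumption, not stated in the repo):

```python
# Per-stage latency estimates in milliseconds (low, high), from the table above.
stages = {
    "stt": (64, 64),
    "llm": (200, 400),
    "robot": (50, 50),
}

low = sum(lo for lo, _ in stages.values())
high = sum(hi for _, hi in stages.values())
print(f"itemized total: {low}-{high} ms")  # itemized total: 314-514 ms
```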

LLM latency is the dominant term. This is a real-time constraint problem — at 500ms end-to-end, the system is usable for security patrol. Sub-200ms would require either a smaller model or self-hosted inference with speculative decoding. The Together AI serving stack handles the inference infrastructure; understanding what's underneath it (continuous batching, KV cache management, tensor parallelism) informs how to optimize this pipeline further.


Tech Stack

| Component | Technology | Detail |
| --- | --- | --- |
| Speech-to-Text | Smallest.ai Pulse | WebSocket stream, 64 ms |
| LLM | Together AI — Llama-3.3-70B | Natural language → JSON |
| Robot Control | Cyberwave SDK | Digital twin + locomotion |
| Hardware | Unitree Go2 | Quadruped, 4D LIDAR, RGB camera |

Project Structure

VanguardAI/
├── src/
│   ├── voice.py      # Smallest.ai STT — audio → text
│   ├── brain.py      # Together AI LLM — text → action JSON
│   ├── robot.py      # Cyberwave SDK — action → robot movement
│   └── vision.py     # Together AI VLM — camera → threat detection
├── main.py           # Orchestration loop
├── requirements.txt
└── .env.example

Quick Start

1. Clone & Setup

git clone https://github.com/IneshReddy249/VanguardAI.git
cd VanguardAI
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

2. Configure API Keys

cp .env.example .env
# SMALLEST_API_KEY=your_key
# TOGETHER_API_KEY=your_key
# CYBERWAVE_API_KEY=your_key
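A fail-fast startup check is worth the few lines: better to abort on a missing key than to discover it mid-patrol. A stdlib-only sketch (the helper name is illustrative, not part of the repo):

```python
import os

REQUIRED_KEYS = ("SMALLEST_API_KEY", "TOGETHER_API_KEY", "CYBERWAVE_API_KEY")

def missing_keys(env=os.environ):
    """Return the required API keys that are absent or empty in `env`."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

# Demonstrated against a sample environment dict:
absent = missing_keys({"TOGETHER_API_KEY": "tok_123"})
print(absent)  # ['SMALLEST_API_KEY', 'CYBERWAVE_API_KEY']
```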

3. Run

python main.py
# Speak when you see: Listening...

Voice Commands

| Command | Action |
| --- | --- |
| "go forward" / "move ahead" / "walk straight" | Forward |
| "go back" / "move backward" | Backward |
| "turn left" | Rotate left 90° |
| "turn right" | Rotate right 90° |
| "stop" / "halt" | Stop |
| "patrol" | Start patrol sequence |
| "quit" / "exit" | End program |

Module Details

voice.py — Records 4s audio, streams to Smallest.ai Pulse via WebSocket, returns transcribed text.

brain.py — Sends transcript to Llama-3.3-70B with structured output prompt. Returns deterministic JSON:

{"action": "move_forward", "value": 1.0}

robot.py — Maps action JSON to Cyberwave SDK motion bindings. Triggers robot locomotion.
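The mapping itself is naturally a dispatch table. A hedged sketch with a duck-typed robot object standing in for the Cyberwave SDK bindings (the `move`/`turn`/`stop`/`patrol` method names are assumptions for illustration, not the SDK's actual API):

```python
def execute(robot, cmd: dict):
    """Map a validated {"action": ..., "value": ...} dict onto robot methods."""
    value = cmd.get("value", 1.0)
    dispatch = {
        "move_forward":  lambda: robot.move(value),
        "move_backward": lambda: robot.move(-value),
        "turn_left":     lambda: robot.turn(90),
        "turn_right":    lambda: robot.turn(-90),
        "stop":          robot.stop,
        "patrol":        robot.patrol,
    }
    dispatch[cmd["action"]]()

class FakeRobot:
    """Records calls instead of moving hardware -- handy for testing the mapping."""
    def __init__(self):
        self.calls = []
    def move(self, distance):
        self.calls.append(("move", distance))
    def turn(self, degrees):
        self.calls.append(("turn", degrees))
    def stop(self):
        self.calls.append(("stop",))
    def patrol(self):
        self.calls.append(("patrol",))

bot = FakeRobot()
execute(bot, {"action": "move_forward", "value": 1.0})
execute(bot, {"action": "turn_left"})
print(bot.calls)  # [('move', 1.0), ('turn', 90)]
```

Keeping the dispatch table flat also makes the set of executable actions auditable at a glance.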

vision.py — Streams camera feed to Together AI VLM for real-time threat detection and scene understanding.

main.py — Infinite loop: listen → parse → execute. Graceful shutdown on quit or Ctrl+C.
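The loop shape can be sketched with injected stage functions, so it runs against stubs just as well as against the real voice/brain/robot modules (this is a sketch of the pattern, not the repo's actual main.py):

```python
def run_loop(listen, parse, execute):
    """listen -> parse -> execute until a "quit" command or Ctrl+C.

    Stage functions are injected so the loop is testable; the real
    orchestrator would wire in voice.py, brain.py, and robot.py here.
    """
    executed = []
    try:
        while True:
            text = listen()
            cmd = parse(text)
            if cmd is None:
                continue  # malformed LLM output: skip and keep listening
            if cmd["action"] == "quit":
                break
            execute(cmd)
            executed.append(cmd)
    except KeyboardInterrupt:
        pass  # graceful shutdown on Ctrl+C
    return executed

# Stubbed run: one valid command, one unparseable utterance, then quit.
script = iter(["go forward", "gibberish", "quit"])
fake_parse = {"go forward": {"action": "move_forward", "value": 1.0},
              "quit": {"action": "quit"}}.get
done = run_loop(lambda: next(script), fake_parse, lambda cmd: None)
print(done)  # [{'action': 'move_forward', 'value': 1.0}]
```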


Why This Is an Inference Problem

The 200–400ms LLM step is the system bottleneck. Optimizing it requires:

- Smaller model — a 7B instruction-tuned model can likely handle structured command parsing in under 50 ms with acceptable quality
- Speculative decoding — a draft model generates the short JSON output and the target model verifies it; at typically fewer than 20 output tokens, this workload is ideal for speculative decoding
- Self-hosted inference — removing Together AI API overhead and running TensorRT-LLM locally on edge hardware could cut latency by ~150 ms

This is the next engineering step for a production deployment.
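The speculative decoding case can be made concrete with the standard expected-acceptance analysis (Leviathan et al., 2023); the acceptance rate of 0.8 below is an assumed figure for illustration:

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification step under
    speculative decoding, assuming an i.i.d. per-token acceptance rate
    `alpha` and draft length `gamma`:
        E = (1 - alpha**(gamma + 1)) / (1 - alpha)
    """
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

# A ~20-token JSON command with an assumed alpha = 0.8 and a 4-token draft:
e = expected_tokens_per_step(0.8, 4)
print(round(e, 2))       # 3.36 tokens per verification step
print(round(20 / e, 1))  # 5.9 target steps to emit ~20 tokens, vs. 20 plain decode steps
```

Since the command JSON is short and highly templated, acceptance rates should be high, which is exactly where speculative decoding pays off.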


Related Projects

The inference stack that powers the LLM layer in this system:


Author

Inesh Reddy Chappidi — LLM Inference & Systems Engineer

LinkedIn GitHub
