Built at Robotic Agents Hackathon | March 13, 2026 | Frontier Tower, SF
VanguardAI transforms the Unitree Go2 quadruped into an autonomous security patrol system controlled by voice commands. Speak naturally — the robot responds in real-time.
End-to-end pipeline: voice → STT → LLM → structured command → robot movement. Full loop in roughly 500ms.
"Go forward" → Robot walks forward
"Turn left" → Robot turns 90° left
"Patrol" → Robot starts patrol pattern
"Stop" → Robot halts immediately
The LLM handles natural language variation — "walk ahead", "move forward", "go straight" all resolve to the same action. Structured JSON output keeps robot command execution deterministic.
Voice Input
↓
Smallest.ai Pulse STT (WebSocket stream, 64ms latency)
↓
Together AI — Llama-3.3-70B
↓ {"action": "move_forward", "value": 1.0}
Cyberwave SDK
↓
Unitree Go2 (4D LIDAR + RGB camera)
| Stage | Latency |
|---|---|
| STT (Smallest.ai Pulse) | ~64ms |
| LLM inference (Together AI) | ~200–400ms |
| Robot command execution | ~50ms |
| End-to-end voice → movement | ~350–550ms |
LLM latency is the dominant term. This is a real-time constraint problem — at 500ms end-to-end, the system is usable for security patrol. Sub-200ms would require either a smaller model or self-hosted inference with speculative decoding. The Together AI serving stack handles the inference infrastructure; understanding what's underneath it (continuous batching, KV cache management, tensor parallelism) informs how to optimize this pipeline further.
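The budget arithmetic behind that claim, using only the approximate figures from the table above (no new measurements):

```python
# Per-stage latencies as quoted in the table above (approximate).
stages = {
    "stt_ms": 64,          # Smallest.ai Pulse
    "llm_ms": (200, 400),  # Together AI, Llama-3.3-70B (range)
    "robot_ms": 50,        # command execution
}

low = stages["stt_ms"] + stages["llm_ms"][0] + stages["robot_ms"]
high = stages["stt_ms"] + stages["llm_ms"][1] + stages["robot_ms"]
print(f"end-to-end: ~{low}-{high} ms")  # ~314-514 ms

# The LLM dominates the worst case:
llm_share = stages["llm_ms"][1] / high
print(f"LLM share of worst case: {llm_share:.0%}")  # 78%
```

At roughly three-quarters of the worst-case budget, the LLM stage is the only lever that meaningfully moves the end-to-end number.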
| Component | Technology | Detail |
|---|---|---|
| Speech-to-Text | Smallest.ai Pulse | WebSocket stream, 64ms |
| LLM | Together AI — Llama-3.3-70B | Natural language → JSON |
| Robot Control | Cyberwave SDK | Digital twin + locomotion |
| Hardware | Unitree Go2 | Quadruped, 4D LIDAR, RGB camera |
VanguardAI/
├── src/
│ ├── voice.py # Smallest.ai STT — audio → text
│ ├── brain.py # Together AI LLM — text → action JSON
│ ├── robot.py # Cyberwave SDK — action → robot movement
│ └── vision.py # Together AI VLM — camera → threat detection
├── main.py # Orchestration loop
├── requirements.txt
└── .env.example
git clone https://github.com/IneshReddy249/VanguardAI.git
cd VanguardAI
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# SMALLEST_API_KEY=your_key
# TOGETHER_API_KEY=your_key
# CYBERWAVE_API_KEY=your_key
python main.py
# Speak when you see: Listening...
| Command | Action |
|---|---|
| "go forward" / "move ahead" / "walk straight" | Forward |
| "go back" / "move backward" | Backward |
| "turn left" | Rotate left 90° |
| "turn right" | Rotate right 90° |
| "stop" / "halt" | Stop |
| "patrol" | Start patrol sequence |
| "quit" / "exit" | End program |
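The table above could map to robot calls with a simple dispatch, sketched here against a stub. The `move` / `rotate` / `stop` / `patrol` method names are stand-ins, not the real Cyberwave SDK API:

```python
# Stub robot that records calls instead of moving hardware.
class StubRobot:
    def __init__(self):
        self.log = []
    def move(self, meters): self.log.append(("move", meters))
    def rotate(self, deg):  self.log.append(("rotate", deg))
    def stop(self):         self.log.append(("stop",))
    def patrol(self):       self.log.append(("patrol",))

def execute(robot, cmd: dict) -> None:
    """Map a validated action JSON to a motion call."""
    value = cmd.get("value", 1.0)
    dispatch = {
        "move_forward":  lambda: robot.move(value),
        "move_backward": lambda: robot.move(-value),
        "turn_left":     lambda: robot.rotate(90),
        "turn_right":    lambda: robot.rotate(-90),
        "stop":          robot.stop,
        "patrol":        robot.patrol,
    }
    dispatch[cmd["action"]]()

r = StubRobot()
execute(r, {"action": "move_forward", "value": 1.0})
execute(r, {"action": "turn_left"})
print(r.log)  # [('move', 1.0), ('rotate', 90)]
```

Keeping the dispatch table closed (a fixed dict, no dynamic lookup) is what makes execution deterministic regardless of what the LLM produces.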
voice.py — Records 4s audio, streams to Smallest.ai Pulse via WebSocket, returns transcribed text.
brain.py — Sends transcript to Llama-3.3-70B with structured output prompt. Returns deterministic JSON:
{"action": "move_forward", "value": 1.0}
robot.py — Maps action JSON to Cyberwave SDK motion bindings. Triggers robot locomotion.
vision.py — Streams camera feed to Together AI VLM for real-time threat detection and scene understanding.
main.py — Infinite loop: listen → parse → execute.
Graceful shutdown on quit or Ctrl+C.
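The loop itself can be sketched in a few lines. `listen`, `parse`, and `execute` below are placeholders standing in for voice.py, brain.py, and robot.py:

```python
# Minimal sketch of the main.py orchestration loop:
# listen → parse → execute, with graceful shutdown.
def run(listen, parse, execute):
    try:
        while True:
            text = listen()
            cmd = parse(text)
            if cmd is None:       # unrecognized command: ignore, keep listening
                continue
            if cmd["action"] in ("quit", "exit"):
                break             # graceful shutdown on quit
            execute(cmd)
    except KeyboardInterrupt:
        pass                      # graceful shutdown on Ctrl+C

# Demo with canned inputs in place of the microphone and LLM:
inputs = iter(["go forward", "quit"])
seen = []
run(lambda: next(inputs),
    lambda t: {"action": "move_forward"} if "forward" in t else {"action": "quit"},
    seen.append)
print(seen)  # [{'action': 'move_forward'}]
```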
The 200–400ms LLM step is the system bottleneck. Optimizing it requires:
- Smaller model — a 7B instruction-tuned model can likely handle structured command parsing in <50ms with acceptable quality
- Speculative decoding — draft model generates the short JSON output, target verifies; output is typically <20 tokens, ideal for spec decoding
- Self-hosted inference — removing Together AI API overhead and running TensorRT-LLM locally on edge hardware would cut latency by ~150ms
This is the next engineering step for a production deployment.
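Why speculative decoding suits this workload: a back-of-envelope estimate using the standard expected-tokens-per-step formula. The draft length `k` and acceptance rate `a` below are assumed values for illustration, not measurements from this system:

```python
# Expected tokens accepted per target-model verification step when a
# draft model proposes k tokens with per-token acceptance rate a
# (geometric-series formula from the speculative decoding literature).
k, a = 4, 0.7  # assumed, not measured
expected = (1 - a ** (k + 1)) / (1 - a)
print(f"~{expected:.2f} tokens per target step")  # ~2.77

# A <20-token JSON command then needs ~ceil(20 / expected) target steps
# instead of 20 — i.e. nearly a 2.8x cut in target-model forward passes.
```

Short, highly predictable outputs like `{"action": "move_forward", "value": 1.0}` push the acceptance rate up, which is why the JSON-only output format is such a good fit for this technique.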
The inference stack that powers the LLM layer in this system:
- Llama-3.1-8B on H100 — 1,700+ tok/s, 11ms TTFT
- Speculative Decoding — 2.26× latency reduction on Qwen 2.5
- Qwen2.5-32B on H200 — 3,981 tok/s at 64 concurrent
Inesh Reddy Chappidi — LLM Inference & Systems Engineer