Inspiration
Security patrol is dangerous, repetitive, and expensive. Warehouses lose billions to theft annually. Military personnel risk their lives in reconnaissance. Farmers can't monitor vast fields 24/7.
We asked: What if you could just talk to a robot and have it patrol for you?
No complex interfaces. No joysticks. No training. Just speak naturally: "Go forward. Turn left. Start patrol."
VanguardAI makes advanced robotics accessible to anyone with a voice.
What it does
VanguardAI transforms the Unitree Go2 quadruped robot into a voice-controlled security system.
The flow is simple:
- 🎤 You speak → "Go forward and patrol the area"
- 🧠 AI understands → converts speech to structured commands
- 🤖 Robot acts → executes movement in real time
Supported commands:
- Movement: forward, backward, left, right
- Actions: patrol, stop, look around
- Natural variations work: "walk ahead", "move up", "go straight"
How we built it
Voice → Smallest.ai STT → Together AI LLM → Cyberwave SDK → Robot
Speech-to-Text (Smallest.ai Pulse)
- WebSocket streaming for 64ms latency
- Records 4-second audio chunks from microphone
- Returns transcribed text in real-time
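The streaming loop can be sketched as below. Note this is a minimal illustration: the WebSocket URL and JSON reply shape are placeholders, not the documented Smallest.ai Pulse API; only the 4-second chunking matches the write-up.

```python
import json

# Hypothetical endpoint -- the real Pulse URL and auth are not in this write-up.
PULSE_WS_URL = "wss://example.smallest.ai/pulse/stream"

CHUNK_SECONDS = 4        # 4-second audio chunks, as described above
SAMPLE_RATE = 16_000     # assumed 16 kHz mono
BYTES_PER_SAMPLE = 2     # assumed 16-bit PCM

def chunk_audio(pcm: bytes) -> list[bytes]:
    """Split raw PCM audio into fixed-duration chunks for streaming."""
    size = CHUNK_SECONDS * SAMPLE_RATE * BYTES_PER_SAMPLE
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]

async def stream_transcripts(pcm: bytes):
    """Send chunks over a WebSocket and yield transcripts as they arrive."""
    import websockets  # pip install websockets
    async with websockets.connect(PULSE_WS_URL) as ws:
        for chunk in chunk_audio(pcm):
            await ws.send(chunk)
            reply = json.loads(await ws.recv())  # assumed {"text": "..."} shape
            yield reply.get("text", "")
```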
Command Parsing (Together AI)
- Llama-3.3-70B-Instruct-Turbo model
- Structured prompt converts natural language to JSON
- Example: "go forward" →
{"action": "move_forward", "value": 1.0}
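The parsing step can be sketched as follows. The prompt wording is our own illustration of the "structured prompt" idea, and the LLM call is injected as a callable so the Together AI backend (or a fake, for tests) can be plugged in:

```python
import json

# Illustrative prompt -- the team's exact wording isn't in the write-up.
SYSTEM_PROMPT = (
    "You control a patrol robot. Reply with ONLY a JSON object, e.g. "
    '{"action": "move_forward", "value": 1.0}. Valid actions: '
    "move_forward, move_backward, turn_left, turn_right, patrol, stop, look_around."
)

def parse_command(utterance: str, complete) -> dict:
    """complete(system_prompt, user_text) -> raw model text.

    In production, `complete` would wrap Together AI's chat-completions
    endpoint with Llama-3.3-70B-Instruct-Turbo; injecting it keeps this
    function testable offline."""
    return json.loads(complete(SYSTEM_PROMPT, utterance))
```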
Robot Control (Cyberwave SDK)
- Digital twin of Unitree Go2
- Motion bindings: Forward, Backward, Turn Left, Turn Right, Idle
- Commands execute on physical robot via edge connection
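A dispatch layer maps parsed actions onto those motion bindings. The `robot.motion.asset.animation(...)` call is the one we found in the SDK docs (see Challenges below); the mapping table itself is a sketch:

```python
# Map parsed LLM actions to the Cyberwave motion binding names listed above.
ACTION_TO_ANIMATION = {
    "move_forward": "Forward",
    "move_backward": "Backward",
    "turn_left": "Turn Left",
    "turn_right": "Turn Right",
    "stop": "Idle",
}

def execute(robot, command: dict) -> str:
    """Dispatch a parsed command to the robot; unknown actions fall back to Idle."""
    animation = ACTION_TO_ANIMATION.get(command.get("action"), "Idle")
    robot.motion.asset.animation(animation)
    return animation
```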
Stack:
- Python 3.12
- WebSockets for real-time audio streaming
- Async/await for non-blocking I/O
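Tying the three components together, the non-blocking control loop amounts to roughly this (stage functions injected, so the microphone, LLM, and robot pieces stay independent, as in our modular architecture):

```python
import asyncio

async def control_loop(listen, parse, act):
    """Glue the pipeline: listen() yields transcripts asynchronously,
    parse() turns text into a command dict, act() drives the robot.
    The loop exits once a stop command is executed."""
    async for text in listen():
        command = parse(text)
        act(command)
        if command.get("action") == "stop":
            break
```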
Challenges we ran into
- **SDK API Discovery** - The Cyberwave SDK's methods weren't what we expected: `robot.move(x=1)` didn't work. We had to dig through the docs to find `robot.motion.asset.animation("Forward")`.
- **Audio Latency** - Our initial implementation had a 2+ second delay. Switching from the REST API to WebSocket streaming got us real-time response.
- **LLM Output Parsing** - The LLM sometimes returned extra text around the JSON. We added robust parsing with a fallback to `{"action": "stop"}`.
- **Hackathon Time Pressure** - We had 6.5 hours to go from zero to a working demo, so we prioritized the core voice→robot loop over nice-to-haves like vision.
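The robust parsing with a stop fallback can be sketched like this (the regex-extraction approach is one reasonable implementation, not necessarily the exact one we shipped):

```python
import json
import re

def safe_parse(raw: str) -> dict:
    """Extract the first JSON object from LLM output.

    LLMs sometimes wrap the JSON in extra prose; if no valid object can
    be recovered, fail safe by telling the robot to stop."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    return {"action": "stop"}
```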
Accomplishments that we're proud of
✅ End-to-end voice control working - Speak and the robot moves. No lag.
✅ Clean modular architecture - Each component (voice, brain, robot) is independent and testable.
✅ Natural language understanding - Say "walk forward", "go ahead", or "move up" - all work.
✅ Built in one day - From empty folder to working demo in under 7 hours.
What we learned
- Smallest.ai Pulse is incredibly fast - 64ms TTFT makes voice control feel instant
- Cyberwave's digital twin approach abstracts away robot complexity
- LLMs as command parsers are powerful - no need for rigid grammar rules
- WebSockets >> REST for real-time applications
What's next for VanguardAI
🔮 Vision Module - Use Go2's RGB camera + VLM to detect threats (intruders, anomalies)
🔮 Alert System - Email/SMS notifications when threats detected
🔮 Autonomous Patrol - Waypoint-based navigation without voice commands
🔮 Multi-Robot Fleet - Coordinate multiple Go2 robots for large areas
🔮 Edge Inference - Run models on robot's onboard compute for offline operation
Built With
smallest-ai
together-ai
cyberwave
python
websockets
unitree-go2
**GitHub:** https://github.com/IneshReddy249/VanguardAI