Inspiration

Security patrols are dangerous, repetitive, and expensive. Warehouses lose billions of dollars to theft annually. Military personnel risk their lives on reconnaissance missions. Farmers can't monitor vast fields 24/7.

We asked: What if you could just talk to a robot and have it patrol for you?

No complex interfaces. No joysticks. No training. Just speak naturally: "Go forward. Turn left. Start patrol."

VanguardAI makes advanced robotics accessible to anyone with a voice.


What it does

VanguardAI transforms the Unitree Go2 quadruped robot into a voice-controlled security system.

The flow is simple:

  1. 🎤 You speak → "Go forward and patrol the area"
  2. 🧠 AI understands → Converts speech to structured commands
  3. 🤖 Robot acts → Executes movement in real-time

Supported commands:

  • Movement: forward, backward, left, right
  • Actions: patrol, stop, look around
  • Natural variations work: "walk ahead", "move up", "go straight"
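All of those variations collapse onto a small set of canonical actions. The project maps phrases with an LLM (see "How we built it"), but the target vocabulary can be illustrated with a deterministic lookup; the phrase lists below are examples for the sketch, not the project's actual tables:

```python
# Canonical action vocabulary with example spoken variations.
# NOTE: illustrative only; the real project maps free-form speech with an LLM.
SYNONYMS = {
    "move_forward": {"forward", "go forward", "walk ahead", "move up", "go straight"},
    "move_backward": {"backward", "go back", "reverse"},
    "turn_left": {"left", "turn left"},
    "turn_right": {"right", "turn right"},
    "patrol": {"patrol", "start patrol"},
    "stop": {"stop", "halt"},
}

def normalize(utterance: str) -> str:
    """Map a spoken phrase to a canonical action; unknown input stops the robot."""
    text = utterance.lower().strip().rstrip(".!?")
    return next((action for action, phrases in SYNONYMS.items() if text in phrases), "stop")
```

Defaulting unknown input to "stop" is the safe choice for a robot that walks around on its own.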

How we built it

Voice → Smallest.ai STT → Together AI LLM → Cyberwave SDK → Robot

Speech-to-Text (Smallest.ai Pulse)

  • WebSocket streaming for 64ms latency
  • Records 4-second audio chunks from microphone
  • Returns transcribed text in real-time
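To stream, the raw microphone audio has to be cut into frames small enough to send continuously over the socket. A minimal framing sketch, assuming 16 kHz 16-bit mono PCM and 100 ms frames (the 4-second window is from this write-up; the sample format and frame size are illustration assumptions, not Smallest.ai requirements):

```python
# Frame raw PCM audio for WebSocket streaming.
SAMPLE_RATE = 16_000      # samples per second (assumed)
BYTES_PER_SAMPLE = 2      # 16-bit PCM (assumed)
CHUNK_SECONDS = 4         # recording window from the write-up
FRAME_MS = 100            # per-message frame length (assumed)

def frames(pcm: bytes, frame_ms: int = FRAME_MS):
    """Yield fixed-size PCM frames, each sent as one WebSocket message."""
    frame_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * frame_ms // 1000
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]
```

Each 4-second chunk yields 40 frames of 3,200 bytes under these assumptions, so transcription can begin long before the chunk ends.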

Command Parsing (Together AI)

  • Llama-3.3-70B-Instruct-Turbo model
  • Structured prompt converts natural language to JSON
  • Example: "go forward" → {"action": "move_forward", "value": 1.0}
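A minimal sketch of that step using Together's Python client (the model name matches the one above; the system prompt wording here is our own illustration, not the project's actual prompt):

```python
import json

# Illustrative system prompt; the real prompt likely differs.
SYSTEM_PROMPT = (
    "You convert voice commands for a quadruped robot into JSON. "
    'Respond with only {"action": <one of move_forward, move_backward, '
    'turn_left, turn_right, patrol, stop>, "value": <float>}.'
)

def build_messages(transcript: str) -> list[dict]:
    """Wrap a transcript in a chat-completion message list."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": transcript},
    ]

def parse_command(transcript: str) -> dict:
    """Send the transcript to the LLM and decode its JSON reply."""
    from together import Together  # pip install together
    client = Together()  # reads TOGETHER_API_KEY from the environment
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=build_messages(transcript),
    )
    return json.loads(resp.choices[0].message.content)
```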

Robot Control (Cyberwave SDK)

  • Digital twin of Unitree Go2
  • Motion bindings: Forward, Backward, Turn Left, Turn Right, Idle
  • Commands execute on physical robot via edge connection
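The parsed JSON then dispatches to those motion bindings. A sketch assuming a connected `robot` handle and the `robot.motion.asset.animation(...)` call described under Challenges below; unknown actions fall back to Idle:

```python
# Map canonical actions to the Cyberwave motion bindings named above.
ACTION_TO_ANIMATION = {
    "move_forward": "Forward",
    "move_backward": "Backward",
    "turn_left": "Turn Left",
    "turn_right": "Turn Right",
    "stop": "Idle",
}

def execute(robot, command: dict) -> str:
    """Trigger the animation bound to the parsed action; Idle if unrecognized."""
    animation = ACTION_TO_ANIMATION.get(command.get("action"), "Idle")
    robot.motion.asset.animation(animation)
    return animation
```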

Stack:

  • Python 3.12
  • WebSockets for real-time audio streaming
  • Async/await for non-blocking I/O

Challenges we ran into

  1. SDK API Discovery - Cyberwave SDK methods weren't what we expected. robot.move(x=1) didn't work. Had to dig through docs to find robot.motion.asset.animation("Forward").

  2. Audio Latency - Initial implementation had 2+ second delay. Switched from REST API to WebSocket streaming to get real-time response.

  3. LLM Output Parsing - Sometimes the LLM returned extra text with JSON. Added robust parsing with fallback to {"action": "stop"}.
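One way to make that parsing robust is to grab the first JSON object in the reply and fall back to a safe stop on anything malformed; a sketch of the approach (not the project's exact code):

```python
import json
import re

FALLBACK = {"action": "stop"}

def extract_command(llm_output: str) -> dict:
    """Pull the first JSON object out of an LLM reply; stop on failure.

    The non-greedy regex is enough here because the command objects are
    flat (no nested braces).
    """
    match = re.search(r"\{.*?\}", llm_output, re.DOTALL)
    if not match:
        return FALLBACK
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return FALLBACK
```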

  4. Hackathon Time Pressure - 6.5 hours to go from zero to working demo. Prioritized core voiceโ†’robot loop over nice-to-haves like vision.


Accomplishments that we're proud of

✅ End-to-end voice control working - Speak and the robot moves. No lag.

✅ Clean modular architecture - Each component (voice, brain, robot) is independent and testable.

✅ Natural language understanding - Say "walk forward", "go ahead", or "move up" - all work.

✅ Built in one day - From empty folder to working demo in under 7 hours.


What we learned

  • Smallest.ai Pulse is incredibly fast - 64ms TTFT makes voice control feel instant
  • Cyberwave's digital twin approach abstracts away robot complexity
  • LLMs as command parsers are powerful - no need for rigid grammar rules
  • WebSockets >> REST for real-time applications

What's next for VanguardAI

🔮 Vision Module - Use Go2's RGB camera + VLM to detect threats (intruders, anomalies)

🔮 Alert System - Email/SMS notifications when threats detected

🔮 Autonomous Patrol - Waypoint-based navigation without voice commands

🔮 Multi-Robot Fleet - Coordinate multiple Go2 robots for large areas

🔮 Edge Inference - Run models on robot's onboard compute for offline operation


Built With

smallest-ai
together-ai
cyberwave
python
websockets
unitree-go2



**GitHub:** https://github.com/IneshReddy249/VanguardAI

