AgentScript

Inspiration

We were frustrated by writing hundreds of lines of boilerplate code just to chain a few AI APIs together. Every project needed the same pattern: search for something, summarize it with AI, generate an image, send an email. Why couldn't this be as simple as Unix pipes? That's when AgentScript was born: a DSL where search "topic" -> summarize -> email "[email protected]" just works.

What it does

AgentScript is a domain-specific language that chains Gemini AI with Google APIs in simple, readable pipelines. With 34 commands, you can:

Research topics and generate reports
Create images (Imagen 4) and videos (Veo 3.1)
Convert text to speech with natural voices
Send emails, create calendar events, schedule Google Meets
Build Google Docs, Sheets, and Forms automatically
Upload to YouTube including Shorts
Plan trips with places search and Google Maps integration
Translate content to any language
Run tasks in parallel for maximum speed

One line replaces hundreds of lines of code.

How we built it

Go 1.22 for the runtime engine
Participle v2 for parsing the DSL grammar
Gemini 2.5 Flash for text, summarization, and translation
Imagen 4 for image generation
Veo 3.1 for video generation
Gemini TTS for text-to-speech
Google Workspace APIs (Gmail, Calendar, Drive, Docs, Sheets, Forms, YouTube, Tasks, People)
ffmpeg for audio/video processing
OAuth 2.0 for secure Google API authentication

The architecture separates parsing (grammar.go), execution (runtime.go), and API clients (client.go, google.go) for clean modularity.
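The split between parsing and execution can be sketched in a few lines of Go. This is a minimal illustration only, not the project's code: it uses a hand-rolled string splitter in place of the Participle grammar, and the names Command, ParsePipeline, Handler, and Run are hypothetical. The handlers are stubs standing in for the real API clients.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Command is one pipeline stage: a name plus an optional quoted argument.
type Command struct {
	Name string
	Arg  string
}

// ParsePipeline splits a script on "->" and reads each stage as
// `name` or `name "argument"`. It is a simplified stand-in for a real
// grammar and does not handle "->" inside a quoted argument.
func ParsePipeline(src string) ([]Command, error) {
	var cmds []Command
	for _, stage := range strings.Split(src, "->") {
		stage = strings.TrimSpace(stage)
		if stage == "" {
			return nil, fmt.Errorf("empty pipeline stage in %q", src)
		}
		name, rest, _ := strings.Cut(stage, " ")
		cmd := Command{Name: name}
		if rest = strings.TrimSpace(rest); rest != "" {
			arg, err := strconv.Unquote(rest)
			if err != nil {
				return nil, fmt.Errorf("stage %q: argument must be a quoted string", stage)
			}
			cmd.Arg = arg
		}
		cmds = append(cmds, cmd)
	}
	return cmds, nil
}

// Handler receives the stage argument and the previous stage's output.
type Handler func(arg, input string) (string, error)

// Run feeds each stage's output into the next one, like a Unix pipe.
func Run(cmds []Command, handlers map[string]Handler) (string, error) {
	out := ""
	for _, c := range cmds {
		h, ok := handlers[c.Name]
		if !ok {
			return "", fmt.Errorf("unknown command %q", c.Name)
		}
		var err error
		if out, err = h(c.Arg, out); err != nil {
			return "", fmt.Errorf("%s: %w", c.Name, err)
		}
	}
	return out, nil
}

func main() {
	// Stub handlers; a real runtime would call the API clients here.
	handlers := map[string]Handler{
		"search":    func(arg, _ string) (string, error) { return "results for " + arg, nil },
		"summarize": func(_, in string) (string, error) { return "summary of " + in, nil },
	}
	cmds, err := ParsePipeline(`search "golang" -> summarize`)
	if err != nil {
		panic(err)
	}
	out, err := Run(cmds, handlers)
	fmt.Println(out, err)
}
```

Keeping the parser output as a plain slice of commands means the executor never needs to know how the script was written, which is the kind of modularity the file layout above is aiming for.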

Challenges we ran into

Veo API quotas - We were limited to 10 videos/day, so we implemented image_audio_merge as an ffmpeg-based fallback
Google Forms API quirks - An index of 0 is a zero value and gets dropped from the request, which required a ForceSendFields workaround
Gemini TTS format - The API returns raw PCM, not WAV, so we had to use ffmpeg to convert the audio
OAuth scope management - Adding new Google APIs required re-authentication with expanded scopes
Parallel execution - Coordinating concurrent API calls while maintaining pipeline flow

Accomplishments that we are proud of

34 working commands spanning research, multimedia, and Google Workspace
Natural language mode - Describe what you want in plain English, AgentScript translates to DSL
Parallel execution - parallel { } runs multiple branches concurrently
Fuzzy date parsing - Calendar understands "tomorrow around 2ish" or "after lunch"
End-to-end video pipeline - Search to Summarize to TTS to Image to Video to YouTube upload in one script
Zero-config email - Just pipe to email "address" and it sends beautifully formatted HTML

What we learned

Gemini's multimodal capabilities are incredibly powerful when chained together
DSLs dramatically lower the barrier to AI automation
Google's API ecosystem is vast but OAuth complexity is real
Sometimes ffmpeg is the answer (audio conversion, video merging)
Good error messages matter more than perfect code

What's next for AgentScript

Google Slides - Auto-generate presentations from content
Real-time data - Reddit API, X/Twitter API for live social feeds
Amadeus integration - Flight and hotel search
Multi-agent mode - Multiple AgentScript instances collaborating
Web UI - Visual pipeline builder with drag-and-drop
MCP server - Run AgentScript as a tool from Claude or other AI assistants
Community commands - Plugin system for user-contributed integrations

Updates

Vinod Halaharvi started this project — Feb 09, 2026 06:29 PM EST
