Inspiration
We were frustrated by writing hundreds of lines of boilerplate code just to chain a few AI APIs together. Every project needed the same pattern: search for something, summarize it with AI, generate an image, send an email. Why couldn't this be as simple as Unix pipes? That's when AgentScript was born - a DSL where search "topic" -> summarize -> email "[email protected]" just works.
What it does
AgentScript is a domain-specific language that chains Gemini AI with Google APIs in simple, readable pipelines. With 34 commands, you can:
- Research topics and generate reports
- Create images (Imagen 4) and videos (Veo 3.1)
- Convert text to speech with natural voices
- Send emails, create calendar events, schedule Google Meets
- Build Google Docs, Sheets, and Forms automatically
- Upload to YouTube including Shorts
- Plan trips with places search and Google Maps integration
- Translate content to any language
- Run tasks in parallel for maximum speed
One line replaces hundreds of lines of code.
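To make the pipe idea concrete, here is a minimal sketch of how chaining could work in Go. It is an illustration only, not AgentScript's actual runtime: the Stage type, the Run function, and the stand-in stages (search, summarize, upper) are all hypothetical names, and the real stages would call the search, Gemini, and Google APIs instead of returning canned strings.

```go
package main

import (
	"fmt"
	"strings"
)

// Stage is one pipeline step: it receives the previous step's output
// and returns its own. (Hypothetical type, not AgentScript's real API.)
type Stage func(input string) (string, error)

// Run feeds the output of each stage into the next one and stops
// at the first stage that returns an error.
func Run(input string, stages ...Stage) (string, error) {
	out := input
	for i, stage := range stages {
		var err error
		out, err = stage(out)
		if err != nil {
			return "", fmt.Errorf("stage %d: %w", i, err)
		}
	}
	return out, nil
}

// Stand-in stages for illustration; they only transform strings.
func search(q string) (string, error)    { return "results for " + q, nil }
func summarize(s string) (string, error) { return "summary of " + s, nil }
func upper(s string) (string, error)     { return strings.ToUpper(s), nil }

func main() {
	out, err := Run("golang", search, summarize, upper)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // prints "SUMMARY OF RESULTS FOR GOLANG"
}
```

Because every stage has the same signature, adding a new command means writing one more function, which is roughly the property that makes Unix pipes pleasant.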
How we built it
- Go 1.22 for the runtime engine
- Participle v2 for parsing the DSL grammar
- Gemini 2.5 Flash for text, summarization, and translation
- Imagen 4 for image generation
- Veo 3.1 for video generation
- Gemini TTS for text-to-speech
- Google Workspace APIs (Gmail, Calendar, Drive, Docs, Sheets, Forms, YouTube, Tasks, People)
- ffmpeg for audio/video processing
- OAuth 2.0 for secure Google API authentication
The architecture separates parsing (grammar.go), execution (runtime.go), and API clients (client.go, google.go) for clean modularity.
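As a rough picture of what the parsing layer hands to the runtime, the sketch below turns a one-line pipeline into a list of commands. Note the swap: AgentScript parses with Participle v2, while this example uses a hand-rolled splitter from the standard library so it stays self-contained. The Command struct, the Parse function, and the address user@example.com are assumptions made for the example, and the sketch does not handle parallel { } blocks or "->" inside quoted arguments.

```go
package main

import (
	"fmt"
	"strings"
)

// Command is one parsed pipeline step (hypothetical shape).
type Command struct {
	Name string // command name, e.g. "search"
	Arg  string // optional argument with the quotes removed
}

// Parse splits a one-line pipeline on "->" and reads each step as a
// command name followed by an optional double-quoted argument.
func Parse(src string) ([]Command, error) {
	var cmds []Command
	for _, part := range strings.Split(src, "->") {
		part = strings.TrimSpace(part)
		if part == "" {
			return nil, fmt.Errorf("empty pipeline step")
		}
		name, rest, _ := strings.Cut(part, " ")
		rest = strings.TrimSpace(rest)
		arg := ""
		if rest != "" {
			quoted := len(rest) >= 2 &&
				strings.HasPrefix(rest, `"`) && strings.HasSuffix(rest, `"`)
			if !quoted {
				return nil, fmt.Errorf("step %q: argument must be double-quoted", name)
			}
			arg = rest[1 : len(rest)-1]
		}
		cmds = append(cmds, Command{Name: name, Arg: arg})
	}
	return cmds, nil
}

func main() {
	cmds, err := Parse(`search "topic" -> summarize -> email "user@example.com"`)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, c := range cmds {
		fmt.Printf("%s(%q)\n", c.Name, c.Arg)
	}
}
```

Keeping the parser's output as plain data like this is what lets the execution layer and the API clients evolve independently of the grammar.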
Challenges we ran into
- Veo API quotas - Limited to 10 videos/day, so we implemented image_audio_merge as an ffmpeg-based fallback
- Google Forms API quirks - An index of 0 is a zero value, so it gets dropped when the request is serialized; we needed the ForceSendFields workaround to send it
- Gemini TTS format - Returns raw PCM, not WAV, so we used ffmpeg to convert the audio
- OAuth scope management - Adding new Google APIs required re-authentication with expanded scopes
- Parallel execution - Coordinating concurrent API calls while maintaining pipeline flow
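For the raw-PCM issue, we went with ffmpeg, but the gap between PCM and WAV is only a 44-byte header, so it can also be closed in pure Go. The sketch below shows that alternative, not what AgentScript does. WrapPCM is a hypothetical helper, and the 24 kHz, mono, 16-bit format used in main is an assumption about the TTS output that should be checked against the actual response.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// WrapPCM prepends a 44-byte RIFF/WAVE header to raw little-endian
// PCM samples and returns the resulting WAV file contents.
func WrapPCM(pcm []byte, sampleRate, channels, bitsPerSample int) []byte {
	blockAlign := channels * bitsPerSample / 8 // bytes per sample frame
	byteRate := sampleRate * blockAlign        // bytes per second of audio

	var buf bytes.Buffer
	le := binary.LittleEndian

	// RIFF chunk descriptor.
	buf.WriteString("RIFF")
	binary.Write(&buf, le, uint32(36+len(pcm))) // file size minus the first 8 bytes
	buf.WriteString("WAVE")

	// "fmt " sub-chunk describing the sample format.
	buf.WriteString("fmt ")
	binary.Write(&buf, le, uint32(16)) // sub-chunk size for plain PCM
	binary.Write(&buf, le, uint16(1))  // audio format 1 = uncompressed PCM
	binary.Write(&buf, le, uint16(channels))
	binary.Write(&buf, le, uint32(sampleRate))
	binary.Write(&buf, le, uint32(byteRate))
	binary.Write(&buf, le, uint16(blockAlign))
	binary.Write(&buf, le, uint16(bitsPerSample))

	// "data" sub-chunk holding the samples themselves.
	buf.WriteString("data")
	binary.Write(&buf, le, uint32(len(pcm)))
	buf.Write(pcm)
	return buf.Bytes()
}

func main() {
	// One second of silence, assuming 24 kHz mono 16-bit samples.
	pcm := make([]byte, 24000*2)
	wav := WrapPCM(pcm, 24000, 1, 16)
	fmt.Println(len(wav)) // prints 48044 (48000 bytes of samples + 44-byte header)
}
```

ffmpeg remains the simpler choice when the audio also needs resampling or merging with video, which is the case in the fallback pipeline.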
Accomplishments that we are proud of
- 34 working commands spanning research, multimedia, and Google Workspace
- Natural language mode - Describe what you want in plain English, AgentScript translates to DSL
- Parallel execution - parallel { } runs multiple branches concurrently
- Fuzzy date parsing - Calendar understands "tomorrow around 2ish" or "after lunch"
- End-to-end video pipeline - Search to Summarize to TTS to Image to Video to YouTube upload in one script
- Zero-config email - Just pipe to email "address" and it sends beautifully formatted HTML
What we learned
- Gemini's multimodal capabilities are incredibly powerful when chained together
- DSLs dramatically lower the barrier to AI automation
- Google's API ecosystem is vast but OAuth complexity is real
- Sometimes ffmpeg is the answer (audio conversion, video merging)
- Good error messages matter more than perfect code
What's next for AgentScript
- Google Slides - Auto-generate presentations from content
- Real-time data - Reddit API, X/Twitter API for live social feeds
- Amadeus integration - Flight and hotel search
- Multi-agent mode - Multiple AgentScript instances collaborating
- Web UI - Visual pipeline builder with drag-and-drop
- MCP server - Run AgentScript as a tool from Claude or other AI assistants
- Community commands - Plugin system for user-contributed integrations