Inspiration

We were frustrated by writing hundreds of lines of boilerplate code just to chain a few AI APIs together. Every project needed the same pattern: search for something, summarize it with AI, generate an image, send an email. Why couldn't this be as simple as Unix pipes? That's when AgentScript was born: a DSL where search "topic" -> summarize -> email "[email protected]" just works.


What it does

AgentScript is a domain-specific language that chains Gemini AI with Google APIs in simple, readable pipelines. With 34 commands, you can:

  • Research topics and generate reports
  • Create images (Imagen 4) and videos (Veo 3.1)
  • Convert text to speech with natural voices
  • Send emails, create calendar events, schedule Google Meets
  • Build Google Docs, Sheets, and Forms automatically
  • Upload to YouTube including Shorts
  • Plan trips with places search and Google Maps integration
  • Translate content to any language
  • Run tasks in parallel for maximum speed

One line replaces hundreds of lines of code.


How we built it

  • Go 1.22 for the runtime engine
  • Participle v2 for parsing the DSL grammar
  • Gemini 2.5 Flash for text, summarization, and translation
  • Imagen 4 for image generation
  • Veo 3.1 for video generation
  • Gemini TTS for text-to-speech
  • Google Workspace APIs (Gmail, Calendar, Drive, Docs, Sheets, Forms, YouTube, Tasks, People)
  • ffmpeg for audio/video processing
  • OAuth 2.0 for secure Google API authentication

The architecture separates parsing (grammar.go), execution (runtime.go), and API clients (client.go, google.go) for clean modularity.
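
To make the parse/execute split concrete, here is a minimal sketch of how a pipeline such as search "topic" -> summarize could be run stage by stage. It is not the project's code: it uses plain string splitting in place of the Participle grammar, and the Command type, parseStage, runPipeline, and the stub commands in demoRegistry are all hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Command is a hypothetical stage signature: it receives the previous
// stage's output plus its own quoted argument and returns new output.
type Command func(input, arg string) (string, error)

// parseStage splits `name "arg"` into the command name and its argument.
// The argument is empty when the stage has none.
func parseStage(s string) (string, string) {
	name, rest, _ := strings.Cut(strings.TrimSpace(s), " ")
	return name, strings.Trim(strings.TrimSpace(rest), `"`)
}

// runPipeline executes the stages separated by "->" from left to right,
// feeding each stage's output into the next one.
// Simplification: an argument that itself contains "->" would be split.
func runPipeline(script string, registry map[string]Command) (string, error) {
	var out string
	for _, stage := range strings.Split(script, "->") {
		name, arg := parseStage(stage)
		cmd, ok := registry[name]
		if !ok {
			return "", fmt.Errorf("unknown command %q", name)
		}
		var err error
		if out, err = cmd(out, arg); err != nil {
			return "", fmt.Errorf("%s: %w", name, err)
		}
	}
	return out, nil
}

// demoRegistry holds stub commands; real ones would call the API clients.
func demoRegistry() map[string]Command {
	return map[string]Command{
		"search":    func(_, arg string) (string, error) { return "results for " + arg, nil },
		"summarize": func(in, _ string) (string, error) { return "summary of " + in, nil },
	}
}

func main() {
	out, err := runPipeline(`search "go generics" -> summarize`, demoRegistry())
	fmt.Println(out, err)
}
```

Keeping commands behind a single function type means the executor never needs to know which API a stage talks to, which is one plausible way to get the modularity described above.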


Challenges we ran into

  • Veo API quotas - The quota limited us to 10 videos/day, so we implemented image_audio_merge as an ffmpeg-based fallback
  • Google Forms API quirks - An index of 0 is a zero value and gets dropped from the serialized request, which required the ForceSendFields workaround
  • Gemini TTS format - The API returns raw PCM, not WAV, so we used ffmpeg to convert the audio
  • OAuth scope management - Adding each new Google API required re-authenticating with expanded scopes
  • Parallel execution - Coordinating concurrent API calls while maintaining pipeline flow
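
As an illustration of the raw PCM conversion mentioned above, the following sketch shells out to ffmpeg from Go. The function names pcmToWAVArgs and convertPCMToWAV are hypothetical, and the audio format (signed 16-bit little-endian, 24 kHz, mono) is an assumption that should be checked against what the TTS API actually returns. Because raw PCM has no header, ffmpeg has to be told the sample format, rate, and channel count explicitly.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// pcmToWAVArgs builds ffmpeg arguments that wrap headerless signed 16-bit
// little-endian PCM in a WAV container.
func pcmToWAVArgs(in, out string, sampleRate, channels int) []string {
	return []string{
		"-y",          // overwrite the output file if it already exists
		"-f", "s16le", // raw input has no header, so state the sample format
		"-ar", strconv.Itoa(sampleRate),
		"-ac", strconv.Itoa(channels),
		"-i", in,
		out, // ffmpeg infers the WAV container from the .wav extension
	}
}

// convertPCMToWAV runs ffmpeg and returns its combined output on failure.
// Assumption: 24 kHz mono input; adjust to match the real audio.
func convertPCMToWAV(in, out string) error {
	cmd := exec.Command("ffmpeg", pcmToWAVArgs(in, out, 24000, 1)...)
	if b, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("ffmpeg failed: %w: %s", err, b)
	}
	return nil
}

func main() {
	// Print the argument list only, so the example runs without ffmpeg installed.
	fmt.Println(pcmToWAVArgs("speech.pcm", "speech.wav", 24000, 1))
}
```

Separating argument construction from execution keeps the ffmpeg invocation easy to inspect and test without having the binary on the machine.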

Accomplishments that we are proud of

  • 34 working commands spanning research, multimedia, and Google Workspace
  • Natural language mode - Describe what you want in plain English, AgentScript translates to DSL
  • Parallel execution - parallel { } runs multiple branches concurrently
  • Fuzzy date parsing - Calendar understands "tomorrow around 2ish" or "after lunch"
  • End-to-end video pipeline - Search to Summarize to TTS to Image to Video to YouTube upload in one script
  • Zero-config email - Just pipe to email "address" and it sends beautifully formatted HTML
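
The parallel { } behavior described above can be sketched with goroutines and a sync.WaitGroup. This is an assumed design, not the project's implementation: the Branch type and runParallel function are hypothetical, and the sketch assumes every branch receives the same input and that results should come back in the order the branches were declared.

```go
package main

import (
	"fmt"
	"sync"
)

// Branch is a hypothetical unit of work inside a parallel { } block.
type Branch func(input string) (string, error)

// runParallel starts every branch in its own goroutine with the same input,
// waits for all of them, and returns the outputs in declaration order.
func runParallel(input string, branches []Branch) ([]string, error) {
	outs := make([]string, len(branches))
	errs := make([]error, len(branches))
	var wg sync.WaitGroup
	for i, b := range branches {
		wg.Add(1)
		go func(i int, b Branch) {
			defer wg.Done()
			// Each goroutine writes only to its own index, so no mutex is needed.
			outs[i], errs[i] = b(input)
		}(i, b)
	}
	wg.Wait()
	for i, err := range errs {
		if err != nil {
			return nil, fmt.Errorf("branch %d: %w", i, err)
		}
	}
	return outs, nil
}

func main() {
	outs, err := runParallel("topic", []Branch{
		func(in string) (string, error) { return "image:" + in, nil },
		func(in string) (string, error) { return "audio:" + in, nil },
	})
	fmt.Println(outs, err)
}
```

Collecting results by index rather than through a channel keeps the output deterministic, so whatever stage follows the block sees branches in a stable order regardless of which API call finishes first.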

What we learned

  • Gemini's multimodal capabilities are incredibly powerful when chained together
  • DSLs dramatically lower the barrier to AI automation
  • Google's API ecosystem is vast but OAuth complexity is real
  • Sometimes ffmpeg is the answer (audio conversion, video merging)
  • Good error messages matter more than perfect code

What's next for AgentScript

  • Google Slides - Auto-generate presentations from content
  • Real-time data - Reddit API, X/Twitter API for live social feeds
  • Amadeus integration - Flight and hotel search
  • Multi-agent mode - Multiple AgentScript instances collaborating
  • Web UI - Visual pipeline builder with drag-and-drop
  • MCP server - Run AgentScript as a tool from Claude or other AI assistants
  • Community commands - Plugin system for user-contributed integrations
