Integrated automated code execution and agent observability

Inspiration

Your work tools are constantly generating signals. Emails arrive with deployment failures. Slack threads discuss bugs. Linear tickets pile up. But turning those signals into action still requires you to manually read, context switch, decide, and coordinate across apps.

We wanted AI agents that work for you in the background. They read your Gmail, watch your Slack, and understand what needs to happen without you typing a single prompt. When something needs attention, they just handle it.

What it does

Sloth is a productivity platform where agents continuously monitor your tools and act on your behalf. No prompts, no workflow builders. The agents infer context from Gmail, Slack, Linear, and other sources, then autonomously investigate, decide, and take action across your entire stack.

Our demo shows Sloth responding to a deployment failure. When a failure notification lands in Gmail, Sloth automatically investigates the Vercel logs and related Slack discussion, traces the breaking commit, creates a Linear ticket with full details, writes a code fix in a sandbox, commits it to GitHub, and notifies the team. You didn't ask it to. It saw the signal and handled it.
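The response chain above can be pictured as an ordered pipeline where each step reads and enriches a shared incident record. This is a minimal sketch under stated assumptions: the `Incident` shape, the step names, and the stubbed values (`dpl_123`, `abc1234`, `ENG-1`) are hypothetical stand-ins for real agent calls. In Sloth the orchestrator decides these steps at runtime rather than following a fixed list.

```typescript
// Hypothetical incident record shared by every step of the response chain.
interface Incident {
  source: string;          // where the signal came from, e.g. "gmail"
  deploymentId: string;
  logs?: string;
  breakingCommit?: string;
  ticketId?: string;
  fixBranch?: string;
  actions: string[];       // audit trail of what was done, in order
}

// A step takes the current incident and returns an enriched copy.
type Step = (incident: Incident) => Incident;

// Stubbed steps: real versions would call the Vercel, GitHub, Linear,
// Code, and Slack agents. All values below are placeholders.
const investigateLogs: Step = (i) => ({
  ...i,
  logs: `build failed for ${i.deploymentId}`,
  actions: [...i.actions, "investigate_logs"],
});
const traceCommit: Step = (i) => ({
  ...i,
  breakingCommit: "abc1234",
  actions: [...i.actions, "trace_commit"],
});
const createTicket: Step = (i) => ({
  ...i,
  ticketId: "ENG-1",
  actions: [...i.actions, "create_ticket"],
});
const writeFix: Step = (i) => ({
  ...i,
  fixBranch: `fix/${i.breakingCommit}`,
  actions: [...i.actions, "write_fix_in_sandbox"],
});
const commitFix: Step = (i) => ({ ...i, actions: [...i.actions, "commit_to_github"] });
const notifyTeam: Step = (i) => ({ ...i, actions: [...i.actions, "notify_slack"] });

// Thread the incident through each step in order.
function respond(initial: Incident, steps: Step[]): Incident {
  return steps.reduce<Incident>((incident, step) => step(incident), initial);
}

const result = respond(
  { source: "gmail", deploymentId: "dpl_123", actions: [] },
  [investigateLogs, traceCommit, createTicket, writeFix, commitFix, notifyTeam],
);
console.log(result.actions.join(" -> "));
```

Passing each step an immutable copy keeps the audit trail in `actions` complete, which is the kind of record an observability dashboard can replay.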

The same architecture applies to any situation where your tools contain the context needed to act.

How it's different from workflow builders

Our view is that, with agents this capable, you shouldn't have to prompt anything. Traditional workflow builders require you to define triggers, map out steps, and specify exactly what happens when. You're still doing the thinking, just in a different interface. Sloth flips this: the agents observe your tools, understand context, and decide what to do. You don't build workflows. You connect your accounts and let the agents figure out when and how to act based on the signals they see.

Beyond autonomous actions, Sloth also includes a coding agent that you can prompt directly for code generation, refactoring, or debugging. This gives you flexibility when you want to guide the agent on a specific problem while still benefiting from the sandboxed execution environment.

How we built it

We used Mastra to orchestrate nine specialized agents: Gmail, Slack, Linear, GitHub, Vercel, Code, Web Browsing, Memory, and Product Management. The orchestrator runs on Claude Opus 4.5 with extended thinking for complex reasoning.
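One way to keep nine agents from stepping on each other is to give each an explicit tool scope and have the orchestrator refuse any delegation outside it. The sketch below is not Mastra's API. The `AgentSpec` type, the tool names, and `delegate` are hypothetical, and the orchestrator's choice of agent (made by Claude Opus 4.5 in Sloth) is replaced here by an explicit argument.

```typescript
// Hypothetical identifiers for the nine specialized agents.
type AgentName =
  | "gmail" | "slack" | "linear" | "github" | "vercel"
  | "code" | "webBrowsing" | "memory" | "productManagement";

interface AgentSpec {
  name: AgentName;
  tools: ReadonlySet<string>; // the only tools this agent may call
}

function spec(name: AgentName, tools: string[]): AgentSpec {
  return { name, tools: new Set(tools) };
}

// Illustrative tool scopes; the real tool sets are not described in this writeup.
const registry: Record<AgentName, AgentSpec> = {
  gmail: spec("gmail", ["read_inbox", "send_email"]),
  slack: spec("slack", ["read_channel", "post_message"]),
  linear: spec("linear", ["create_issue", "update_issue"]),
  github: spec("github", ["read_commits", "create_commit"]),
  vercel: spec("vercel", ["get_deployment_logs"]),
  code: spec("code", ["run_in_sandbox", "write_file"]),
  webBrowsing: spec("webBrowsing", ["open_page"]),
  memory: spec("memory", ["recall", "remember"]),
  productManagement: spec("productManagement", ["summarize_priorities"]),
};

// A delegation the orchestrator would emit after reasoning about a signal.
interface Delegation {
  agent: AgentName;
  tool: string;
  args: Record<string, unknown>;
}

// Reject any delegation that asks an agent to use a tool outside its scope.
function delegate(d: Delegation): string {
  const agent = registry[d.agent];
  if (!agent.tools.has(d.tool)) {
    throw new Error(`agent "${d.agent}" is not allowed to call "${d.tool}"`);
  }
  return `${d.agent}.${d.tool}(${JSON.stringify(d.args)})`;
}

console.log(delegate({ agent: "linear", tool: "create_issue", args: { title: "Fix build" } }));
// linear.create_issue({"title":"Fix build"})
```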

E2B sandboxes enable safe code execution. Running code in a remote sandbox instead of locally means agents can install dependencies, run builds, and execute commands without risking your machine or exposing credentials. The sandbox is isolated, reproducible, and disposable.
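The "disposable" property is easiest to guarantee if every sandbox session is wrapped so it is torn down even when a command fails. The sketch below assumes a hypothetical `Sandbox` interface with `run` and `kill` methods and uses an in-memory fake instead of a remote sandbox. It does not reproduce the E2B SDK, which is asynchronous and whose actual method names should come from its documentation. The sketch is kept synchronous for brevity.

```typescript
// Hypothetical minimal sandbox interface (not the E2B SDK).
interface Sandbox {
  run(command: string): { exitCode: number; stdout: string };
  kill(): void;
}

// In-memory fake used only to make the sketch runnable.
class FakeSandbox implements Sandbox {
  alive = true;
  readonly history: string[] = [];
  run(command: string) {
    if (!this.alive) throw new Error("sandbox already disposed");
    this.history.push(command);
    // Pretend that any command containing "fail" exits non-zero.
    const exitCode = command.includes("fail") ? 1 : 0;
    return { exitCode, stdout: `ran: ${command}` };
  }
  kill() {
    this.alive = false;
  }
}

// Create a sandbox, run the work, and always dispose of it afterwards,
// so a failed build never leaves a sandbox running.
function withSandbox<T>(create: () => Sandbox, work: (sb: Sandbox) => T): T {
  const sandbox = create();
  try {
    return work(sandbox);
  } finally {
    sandbox.kill();
  }
}

const sb = new FakeSandbox();
const ok = withSandbox(() => sb, (s) => {
  for (const cmd of ["npm install", "npm run build"]) {
    const { exitCode } = s.run(cmd);
    if (exitCode !== 0) return false;
  }
  return true;
});
console.log(ok, sb.alive); // true false
```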

The Next.js frontend streams every agent action in real time through an IDE-style dashboard.
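A common way to stream agent actions to a browser is Server-Sent Events (SSE), where each message is an `event:` line and a `data:` line followed by a blank line. The sketch below assumes that transport and a hypothetical `AgentEvent` union covering thinking, tool calls, tool results, and text. The writeup does not say which transport or event schema Sloth's dashboard actually uses.

```typescript
// Hypothetical event schema for what the dashboard renders.
type AgentEvent =
  | { type: "thinking"; agent: string; text: string }
  | { type: "tool_call"; agent: string; tool: string; args: Record<string, unknown> }
  | { type: "tool_result"; agent: string; tool: string; ok: boolean }
  | { type: "text"; agent: string; text: string };

// Encode one event as an SSE message. JSON.stringify escapes newlines
// inside strings, so the payload always fits on a single data line.
function toSSE(event: AgentEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

// Client side: split a buffer into complete events plus any unfinished remainder.
function parseSSE(buffer: string): { events: AgentEvent[]; rest: string } {
  const chunks = buffer.split("\n\n");
  // The last piece is either "" or a message that is still arriving.
  const rest = chunks.pop() ?? "";
  const events: AgentEvent[] = [];
  for (const chunk of chunks) {
    const dataLine = chunk.split("\n").find((line) => line.startsWith("data: "));
    if (dataLine) events.push(JSON.parse(dataLine.slice("data: ".length)) as AgentEvent);
  }
  return { events, rest };
}

const stream =
  toSSE({ type: "thinking", agent: "orchestrator", text: "Deploy failed;\ncheck Vercel logs" }) +
  toSSE({ type: "tool_call", agent: "vercel", tool: "get_deployment_logs", args: { deploymentId: "dpl_123" } });
// Simulate a network read that ended partway through a third message.
const { events, rest } = parseSSE(stream + "event: text\nda");
console.log(events.map((e) => e.type));
```

Keeping the unfinished remainder in `rest` lets the client prepend it to the next network chunk, so interleaved thinking and tool-call events are never cut in half.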

Composio handles tool integrations so we didn't need custom OAuth for each service.

Arize provides observability into our agent traces, helping us debug and monitor agent performance across runs.
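Agent traces like these are usually modeled as a tree of timed spans, one per agent step or tool call, linked by parent IDs. This is the span model popularized by OpenTelemetry, which Arize's tracing tools also accept. The recorder below is a hypothetical stand-in written only to show that structure. It is not Arize's or OpenTelemetry's API, and the span names are illustrative.

```typescript
// Hypothetical span record: one per agent step or tool call.
interface Span {
  id: number;
  parentId: number | null;
  name: string;
  attributes: Record<string, string>;
  start: number;
  end?: number;
}

class TraceRecorder {
  private nextId = 1;
  private stack: number[] = []; // ids of currently open spans
  readonly spans: Span[] = [];

  // Run fn inside a span whose parent is the innermost open span.
  withSpan<T>(name: string, attributes: Record<string, string>, fn: () => T): T {
    const span: Span = {
      id: this.nextId++,
      parentId: this.stack.length > 0 ? this.stack[this.stack.length - 1] : null,
      name,
      attributes,
      start: Date.now(),
    };
    this.spans.push(span);
    this.stack.push(span.id);
    try {
      return fn();
    } finally {
      this.stack.pop();
      span.end = Date.now();
    }
  }

  // Render the span tree with indentation, as an expandable dashboard might.
  render(parentId: number | null = null, depth = 0): string[] {
    const lines: string[] = [];
    for (const s of this.spans) {
      if (s.parentId !== parentId) continue;
      lines.push(`${"  ".repeat(depth)}${s.name}`);
      lines.push(...this.render(s.id, depth + 1));
    }
    return lines;
  }
}

const tracer = new TraceRecorder();
tracer.withSpan("orchestrator", { agent: "orchestrator" }, () => {
  tracer.withSpan("vercel.get_deployment_logs", { deploymentId: "dpl_123" }, () => "logs");
  tracer.withSpan("linear.create_issue", { title: "Fix build" }, () => "ENG-1");
});
console.log(tracer.render().join("\n"));
```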

Challenges we ran into

Agent orchestration needed careful design to avoid race conditions and duplicate actions. We had to balance sequential handoffs with parallel investigation.
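A standard guard against duplicate actions is an idempotency key. You derive a key from what an action would do (for example, which ticket it would create for which deployment), and only the first caller with that key runs it, while concurrent callers share its result. The sketch below does this in-process, assuming a single orchestrator process issues all actions. The key format and the `ActionGuard` class are hypothetical, and a multi-process setup would need the keys in shared storage instead.

```typescript
// Coalesces concurrent and repeated actions that share an idempotency key.
class ActionGuard {
  private readonly results = new Map<string, Promise<unknown>>();

  // Run `action` at most once per key; later callers get the same promise.
  run<T>(key: string, action: () => Promise<T>): Promise<T> {
    const existing = this.results.get(key);
    if (existing) return existing as Promise<T>;
    const pending = action();
    this.results.set(key, pending);
    // Forget failed attempts so a later retry can run the action again.
    pending.catch(() => {
      if (this.results.get(key) === pending) this.results.delete(key);
    });
    return pending;
  }
}

const guard = new ActionGuard();
let created = 0;
const createTicket = () => {
  created += 1;
  return Promise.resolve(`ENG-${created}`);
};

// Two agents notice the same failure at the same time.
const key = "linear:create_issue:dpl_123";
const a = guard.run(key, createTicket);
const b = guard.run(key, createTicket);
console.log(a === b, created); // true 1
```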

Tracking sandbox file modifications for GitHub commits required a custom registry. Streaming extended thinking alongside tool calls took iteration to get right.
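A registry like the one described can record each file's content the first time an agent touches it, then compare it against the latest content at commit time, so only net changes reach GitHub. The sketch below is a hypothetical illustration: the `FileChangeRegistry` name, its methods, and the `Change` format are assumptions, and the actual GitHub commit call is left out.

```typescript
type Change =
  | { path: string; kind: "added"; content: string }
  | { path: string; kind: "modified"; content: string }
  | { path: string; kind: "deleted" };

class FileChangeRegistry {
  // Content before the agent's first edit (null = file did not exist).
  private readonly originals = new Map<string, string | null>();
  // Latest content (null = file has been deleted).
  private readonly current = new Map<string, string | null>();

  // Call on every sandbox write or delete; `before` is the content prior to this operation.
  record(path: string, before: string | null, after: string | null): void {
    if (!this.originals.has(path)) this.originals.set(path, before);
    this.current.set(path, after);
  }

  // Net changes since the first edit; files restored to their original state are skipped.
  changes(): Change[] {
    const result: Change[] = [];
    this.current.forEach((after, path) => {
      const before = this.originals.get(path) ?? null;
      if (before === after) return;
      if (after === null) result.push({ path, kind: "deleted" });
      else if (before === null) result.push({ path, kind: "added", content: after });
      else result.push({ path, kind: "modified", content: after });
    });
    return result;
  }
}

const tracked = new FileChangeRegistry();
tracked.record("src/api.ts", "old", "fixed");      // modified
tracked.record("src/new.ts", null, "export {}");   // added
tracked.record("tmp/debug.log", null, "trace");    // created...
tracked.record("tmp/debug.log", "trace", null);    // ...then deleted: no net change
tracked.record("README.md", "docs", null);         // deleted
console.log(tracked.changes().map((c) => `${c.kind} ${c.path}`).join("; "));
// modified src/api.ts; added src/new.ts; deleted README.md
```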

Accomplishments that we're proud of

The system works end to end. We can trigger an incident, watch Sloth investigate across tools, and see real Linear tickets created, code fixes written, and commits landing on GitHub automatically. The observability is what makes the AI trustworthy: you can expand any step to see the agent's reasoning and tool arguments.

What we learned

Multi-agent systems need clear boundaries. Focused agents coordinated by an orchestrator work better than one massive agent. Extended thinking is worth it for complex reasoning. Real-time observability is essential for trusting AI systems.

What's next for Sloth

We want to make Sloth more personalized with custom memory spaces so it remembers your style. Beyond debugging production issues, Sloth can be extended to help manage coordination across Slack, Gmail, and any other messaging service a developer uses daily.

Built With

arize
browserbase
composio
e2b
mastra.ai
next.js
typescript

Updates

Dhruv Bansal started this project — Jan 17, 2026 03:16 PM EST
