AgentFlux: Hackathon Writeup
Inspiration
Tireless Testing and Agent Stabilization:
Building robust AI agents often involves laborious cycles of testing and tweaking. Developers and teams spend countless hours running and stabilizing a single agent, which leads to significant underutilization of the agent's true capabilities. This bottleneck inspired us to design a platform that transforms this repetitive process into a streamlined, automated, and intelligent workflow.
Custom Pipelines for Real Needs:
Many amateur and even experienced clients end up assembling agent pipelines that are generic and not tailored to their specific job requirements. These cookie-cutter setups result in subpar performance and wasted resources. AgentFlux aims to bridge this gap by enabling both novices and experts to design, visualize, and refine agent graphs purpose-built for their exact tasks.
Rapidly Evolving Field & Ground Truth Deficit:
The field of AI agents is evolving at breakneck speed, but most large language models (LLMs) lack sufficient ground truth or up-to-date information to generate reliable, production-ready code for agents. Our inspiration was to create a system that not only keeps pace with the latest advancements but also empowers LLMs and users with real-world, validated workflows.
Additional Motivation:
- The need for dynamic, modular, and adaptable agent-based solutions.
- Frustration with the lack of robust monitoring and debugging tools in agent orchestration.
- The vision of democratizing AI agent engineering, making it accessible and production-ready for everyone.
What it does
AgentFlux is an advanced agent engineering platform that empowers users to build, test, visualize, and refine sophisticated AI agent systems with unprecedented control and insight. Here’s what sets it apart:
- Multi-Modal Agent Refinement
AgentFlux offers three core refinement options, each designed to maximize the value and reliability of backend agents:
Refine Prompts:
Uses a fine-tuned model to improve user-given prompts, automatically setting guardrails and system instructions for safer, higher-quality agent behavior.
Re-architect Graphs:
Analyzes agent execution graphs to identify "God Tasks" (complex, monolithic nodes), decomposes them into smaller, manageable modules, and dynamically shifts to optimal models (using Hugging Face and Google endpoints) for specific tasks. This results in better modularity, maintainability, and performance.
Refinement Loop:
Automates the cycle of refinement → execution → evaluation, iterating until user-defined criteria are met. This loop enables continuous improvement and validation of agent pipelines.
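The refine → execute → evaluate cycle above can be sketched in a few lines. This is an illustrative skeleton, not the actual AgentFlux API: the `refine`, `execute`, and `evaluate` callables stand in for the backend agents and evaluation metrics.

```python
def refinement_loop(prompt, refine, execute, evaluate, target=0.9, max_iters=5):
    """Iteratively refine a prompt until the evaluation score meets `target`.

    `refine`   — e.g. a Prompt Refiner agent call
    `execute`  — runs the agent pipeline on the candidate prompt
    `evaluate` — scores the run output (e.g. a DeepEval metric), in [0, 1]
    """
    best_prompt, best_score = prompt, 0.0
    for _ in range(max_iters):
        candidate = refine(best_prompt)
        result = execute(candidate)
        score = evaluate(result)
        # Keep the best candidate seen so far so a bad iteration never regresses us.
        if score > best_score:
            best_prompt, best_score = candidate, score
        if best_score >= target:
            break
    return best_prompt, best_score
```

The key design choice is that the loop is monotone: a refinement step that scores worse is discarded, so the pipeline only ever moves toward the user-defined criteria.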
Backend Agent Focus:
The core innovation lies in backend agent orchestration. Dedicated backend agents (e.g., Manager Agent, Graph Redesigner, Prompt Refiner) work collaboratively, leveraging advanced models like Gemini, DeepSeek, and Mistral. Each agent specializes in a particular task (e.g., context transfer, code refactoring, or prompt optimization), and the backend dynamically selects the ideal agent for each step.
Visualization and Monitoring:
- Visualize agent graphs and their interconnections.
- Real-time performance metrics and diff reports for every refinement.
- Integrated logging and profiling to track every run and facilitate fine-grained state reversion.
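The fine-grained state reversion mentioned above could be backed by a simple snapshot stack. This is a hypothetical sketch of the idea, not the production implementation:

```python
import copy

class StateHistory:
    """Snapshot stack for reverting agent state across refinement runs (illustrative)."""

    def __init__(self, initial):
        # Deep-copy so later mutations of live state don't corrupt snapshots.
        self._snapshots = [copy.deepcopy(initial)]

    def checkpoint(self, state):
        """Record the state after a run so it can be reverted to later."""
        self._snapshots.append(copy.deepcopy(state))

    def revert(self, steps=1):
        """Drop the last `steps` snapshots and return the restored state."""
        for _ in range(min(steps, len(self._snapshots) - 1)):
            self._snapshots.pop()
        return copy.deepcopy(self._snapshots[-1])
```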
No-Code Model Integration:
Associate custom models from Hugging Face in a no-code environment, making advanced agent engineering accessible to all.
Collaborative Playground:
Multi-user, session-aware playgrounds for collaborative agent development, with live code editing and execution.
How we built it
Overall Architecture:
AgentFlux's architecture is built for modularity, scalability, and extensibility, combining modern web technologies with cutting-edge AI infrastructure.
- Frontend:
- Next.js-based UI enabling real-time code editing, graph visualization, and live feedback.
- Interactive playgrounds for multi-user collaboration.
- Backend:
- FastAPI orchestrates agent workflows and handles business logic.
- Node.js server manages session identification, queuing, and socket-based real-time communication.
- All agent code executes in isolated containers on Amazon EC2 for security and scalability.
- Agent Workflows:
- The backend dynamically routes requests to specialized agent pipelines (Prompt Refinement or Agent-Graph Redesign).
- Each workflow is composed of modular agents, including Managers, Refiners, Validators, and Output Parsers.
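The routing step above can be sketched as a simple dispatch table. The workflow names and agent labels here are illustrative, not the exact identifiers used in the backend:

```python
# Map each workflow to its ordered pipeline of specialized agents.
PIPELINES = {
    "refine_prompts": ["Manager", "PromptRefiner", "Validator", "OutputParser"],
    "redesign_graph": ["Manager", "GraphRedesigner", "Validator", "OutputParser"],
}

def route(workflow):
    """Return the ordered list of agents for the requested workflow."""
    try:
        return PIPELINES[workflow]
    except KeyError:
        raise ValueError(f"unknown workflow: {workflow}")
```

In the real system, a FastAPI handler would perform this lookup and then hand the pipeline to the execution runner; keeping the table declarative makes it easy to add new workflows without touching the routing logic.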
Emphasis on Google-ADK:
The backbone of our agent execution is the Google ADK (Agent Development Kit), which provides:
- Seamless session management via InMemorySessionService.
- Artifact handling and state preservation through InMemoryArtifactService.
- Execution runners that orchestrate agent pipelines, ensuring modularity and repeatability.
- Integration with Gemini and Vertex AI models for reliable, production-grade LLM support.
Code Example:

```python
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.artifacts import InMemoryArtifactService

# Shared session service so runs can resume prior conversation state
session_service = InMemorySessionService()

runner = Runner(
    app_name="grapharchitect",
    agent=graph_architect,  # the Graph Redesigner agent defined elsewhere
    artifact_service=InMemoryArtifactService(),
    session_service=session_service,
)

# Run the agent pipeline; `run` yields a stream of execution events
events = runner.run(user_id="evangelist", session_id="session101", new_message=content)
```
Model Strategy:
- Dynamic selection of models (DeepSeek, Mistral, Gemini, etc.) according to task requirements.
- All interactions with LLMs are abstracted and versioned, supporting both public and private endpoints.
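The per-task model selection described above reduces to a mapping from sub-task type to endpoint. The task keys and model identifiers below are illustrative placeholders, not the actual configuration:

```python
# Illustrative task-to-model mapping; the real selection logic also weighs
# cost, latency, and whether a public or private endpoint is required.
TASK_MODELS = {
    "prompt_refinement": "gemini-flash",
    "code_refactoring": "deepseek-coder",
    "context_transfer": "mistral-large",
}

def select_model(task, default="gemini-flash"):
    """Pick the model best suited to a sub-task, falling back to a default."""
    return TASK_MODELS.get(task, default)
```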
Monitoring, Observability & Security:
- All runs, refinements, and merges are logged.
- Integrated profilers visualize bottlenecks and agent performance.
- Secure communication between frontend, server, and compute instances.
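The per-run profiling hook can be as small as a timing decorator. This is a minimal sketch under the assumption that runs are wrapped functions; the real profiler also records model calls per node:

```python
import time
from functools import wraps

def profiled(log):
    """Decorator factory: append (function name, elapsed seconds) to `log` per call."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            log.append((fn.__name__, time.perf_counter() - start))
            return result
        return wrapper
    return decorator
```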
Challenges we ran into
Complexity of Agent Orchestration:
Designing modular, multi-agent systems that can handle diverse tasks (prompt refinement, graph restructuring, validation) without introducing bottlenecks or race conditions was a significant challenge.
Stabilizing Agent Behavior:
Agents can exhibit unpredictable behavior when handling edge cases or ambiguous inputs. Ensuring robustness required fine-tuning, guardrails, and iterative validation.
Session Management & Concurrency:
Supporting concurrent, multi-user sessions with isolated execution contexts and real-time interaction introduced non-trivial engineering hurdles.
Model Selection & Efficiency:
Choosing the right model for the right sub-task, and doing so dynamically, was a challenge. Our solution leverages Google-ADK's capabilities and built-in model orchestration to assign the most appropriate LLM per node/task.
Ground Truth & Evaluation:
Lack of ground truth for rapidly evolving agent patterns made automated evaluation tricky. We implemented diff reports and validation agents to close this gap.
Scaling and Production-Readiness:
Ensuring the platform is production-ready required robust monitoring, secure communications, and efficient resource management.
Google-ADK as a Solution:
Integrating Google-ADK was transformative. It provided a scalable and reliable foundation for agent execution, streamlined session handling, and offered direct hooks into Gemini and Vertex AI for best-in-class LLM performance.
Accomplishments that we're proud of
AI Restructuring for Millions:
AgentFlux empowers businesses of all sizes to restructure their AI agent systems, maximizing productivity and accelerating their journey to the forefront of the AI revolution.
Automated, Intelligent Agent Engineering:
Our platform automates what used to be labor-intensive engineering tasks, making advanced agent workflows accessible and reliable.
Production-Ready, Scalable System:
We achieved true scalability and production readiness by leveraging cloud-native technologies and robust backend orchestration.
No-Code Advanced Model Integration:
We made it possible for users to integrate and manage state-of-the-art LLMs from Hugging Face and Google with zero code.
Comprehensive Observability:
Fine-grained logging, diff reports, and profilers allow users to monitor, debug, and optimize agents like never before.
Iterative Feedback and Refinement:
The platform's feedback loop and refinement cycle enable users to iteratively improve their agent pipelines based on real-world results.
What we learned
Scalability and Modularity:
Building a modular system from the ground up enabled us to scale horizontally, support multiple concurrent users, and deploy new features rapidly.
Resilience through Gemini & Google Tech:
By leveraging Gemini and Vertex AI, we achieved high availability and production-grade reliability, making AgentFlux suitable for enterprise deployment.
Effective Use of Google-ADK:
Google-ADK proved invaluable in building and orchestrating complex agent pipelines with robust session and artifact management.
Advanced Monitoring:
We learned how to monitor LLM calls directly within the application, providing users with actionable statistics and enabling them to plan their model usage and spending.
Continuous Improvement:
The journey taught us the value of continuous feedback and iterative refinement, both from users and from the system itself.
User-Centric Design:
Building tools that are intuitive and accessible, while still retaining powerful customization, was key to adoption and user satisfaction.
What's next for AgentFlux
Framework Agnosticism and Enhanced Code Generation:
We plan to support more underlying frameworks and agent paradigms, making the system even more flexible and efficient in generating agent code.
Expanded Model and Tool Ecosystem:
New options will allow users to bring their own models, integrate more custom tools, and onboard MCP (Model Context Protocol) servers, unlocking the full power of Google-ADK.
Advanced Analytics and Reporting:
Deeper analytics, including agent performance dashboards, optimization recommendations, and cost tracking.
Marketplace and Community Features:
Enable sharing of agent templates, refinement recipes, and model presets through a community-driven marketplace.
More Robust State Management:
Further improvements in state reversion, checkpointing, and rollback for safer experimentation.
Enterprise Integrations:
Add support for more enterprise services, security standards, and orchestration layers.
Ongoing Research Integration:
As the field evolves, we're committed to integrating the latest research in agent orchestration, prompt engineering, and LLM evaluation.
AgentFlux is just getting started. With a robust foundation and a community-driven vision, we’re excited to help shape the future of intelligent agent engineering.
Built With
- deepeval
- firebase
- gemini
- golang
- google-adk
- google-cloud
- google-compute-engine
- mongodb
- nextjs
- reactflow
- vertex-ai
- websockets