Inspiration
AI coding tools are great at generating code, but they often stop at suggestions. In real development workflows, code must be tested, executed, and debugged before it can be trusted. We were inspired by the idea of closing this gap — creating an AI system that doesn’t just generate code, but actually validates behavior through testing and execution.
AgentForge was built to explore what happens when AI agents reason about a codebase, generate tests, run them, analyze failures, and help developers fix issues faster.
What it does
AgentForge is an AI-powered testing and debugging workspace for backend codebases.
Given a repository, AgentForge:

1. Analyzes the repository structure and architecture
2. Generates a testing strategy
3. Automatically creates tests
4. Executes them using pytest
5. Analyzes failures
6. Produces a structured repair plan and IDE-ready prompts
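The stages above can be sketched as a simple staged pipeline in which each stage writes an artifact that later stages consume. All function and artifact names below are illustrative assumptions, not AgentForge's actual API, and the model and pytest calls are stubbed:

```python
from dataclasses import dataclass, field

@dataclass
class PipelineRun:
    """Carries artifacts from one stage to the next."""
    repo_path: str
    artifacts: dict = field(default_factory=dict)

def analyze_repo(run: PipelineRun) -> None:
    # Stage 1: would scan the repository; stubbed with a canned summary.
    run.artifacts["analysis"] = {"framework": "FastAPI", "routes": ["/users"]}

def plan_tests(run: PipelineRun) -> None:
    # Stage 2: derive a testing strategy from the analysis artifact.
    routes = run.artifacts["analysis"]["routes"]
    run.artifacts["strategy"] = [f"test GET {r}" for r in routes]

def generate_tests(run: PipelineRun) -> None:
    # Stage 3: would call the reasoning model; stubbed as one file per item.
    run.artifacts["tests"] = {f"test_{i}.py": item
                              for i, item in enumerate(run.artifacts["strategy"])}

def execute_tests(run: PipelineRun) -> None:
    # Stage 4: would invoke pytest in an isolated workspace; stubbed result.
    run.artifacts["results"] = {"passed": 0, "failed": 1}

def diagnose_failures(run: PipelineRun) -> None:
    # Stages 5-6: turn execution results into a repair plan.
    if run.artifacts["results"]["failed"]:
        run.artifacts["repair_plan"] = ["inspect GET /users handler"]

STAGES = [analyze_repo, plan_tests, generate_tests,
          execute_tests, diagnose_failures]

def run_pipeline(repo_path: str) -> PipelineRun:
    run = PipelineRun(repo_path)
    for stage in STAGES:
        stage(run)
    return run
```

Passing artifacts through a shared run object is one way to keep each stage independently testable while the orchestrator stays a plain loop.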
Instead of stopping at code generation, AgentForge closes the loop between reasoning and execution.
The system uses Amazon Nova 2 Lite to power multiple reasoning agents that perform repository analysis, test generation, failure diagnosis, and repair planning.
How we built it
AgentForge is built as a multi-engine AI system with several components working together.
The backend is implemented in Python with FastAPI and orchestrates the full agent pipeline.
Key components include:

• Repository Parsing Engine – scans the repo and extracts routes, services, models, and dependencies
• Repository Knowledge Graph – builds relationships between files, functions, and APIs
• Retrieval Engine – creates structured context bundles for AI agents
• Agent Reasoning Engine – powered by Amazon Nova 2 Lite for test generation, failure analysis, and repair planning
• Test Execution Engine – creates an isolated workspace and runs tests with pytest
• Memory Engine – stores artifacts, execution results, and iteration history
• Orchestration Engine – manages pipeline stages and agent execution
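As a rough sketch of what the knowledge graph component might look like, the adjacency-list structure below relates files, functions, and API routes. The schema and relation names are assumptions for illustration, not the project's actual data model:

```python
from collections import defaultdict

class RepoGraph:
    """Tiny labeled-edge graph over repository entities."""
    def __init__(self):
        self.edges = defaultdict(set)

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        self.edges[src].add((relation, dst))

    def related(self, node: str, relation: str) -> list[str]:
        """All targets reachable from `node` via `relation`, sorted."""
        return sorted(d for r, d in self.edges[node] if r == relation)

# Hypothetical file paths for a small FastAPI backend.
g = RepoGraph()
g.add_edge("app/routes/users.py", "defines", "GET /users")
g.add_edge("app/routes/users.py", "imports", "app/services/user_service.py")
g.add_edge("app/services/user_service.py", "uses", "app/models/user.py")

# Which modules does the route file depend on?
deps = g.related("app/routes/users.py", "imports")
```

A retrieval engine can then walk such edges to assemble only the files relevant to a given test target, instead of sending the whole repository to the model.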
The frontend is built with Next.js, providing a developer workspace where users can inspect artifacts, execution logs, generated tests, and repair plans.
Challenges we ran into
One of the biggest challenges was bridging AI reasoning with real code execution.
AI-generated tests can fail for many reasons:

• incorrect assumptions about application state
• missing fixtures or setup
• differences between expected and actual API behavior
We had to design a system that could capture execution results, parse failures, and feed that information back into the reasoning pipeline.
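One way to capture that feedback is to parse pytest's short-summary lines into structured records that can be injected into the next reasoning prompt. The sketch below targets pytest's `FAILED path::test - reason` summary format; the surrounding report text is a made-up example:

```python
import re

# Matches pytest short-summary lines such as:
#   FAILED tests/test_users.py::test_get_user - AssertionError: 404 != 200
FAILURE_RE = re.compile(
    r"^FAILED\s+(?P<file>[^:]+)::(?P<test>\S+)(?:\s+-\s+(?P<reason>.*))?$"
)

def parse_failures(pytest_output: str) -> list[dict]:
    """Extract structured failure records from pytest terminal output."""
    failures = []
    for line in pytest_output.splitlines():
        m = FAILURE_RE.match(line.strip())
        if m:
            failures.append(m.groupdict())
    return failures

report = """
=========================== short test summary info ===========================
FAILED tests/test_users.py::test_get_user - AssertionError: 404 != 200
FAILED tests/test_auth.py::test_login - KeyError: 'token'
"""
failures = parse_failures(report)
```

Records like these are easier to feed back into a model than raw terminal output, because the file, test name, and error message arrive as labeled fields.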
Another challenge was managing the complexity of the pipeline itself. Coordinating repository parsing, retrieval, agent reasoning, and execution required building a structured orchestration system that could reliably run each stage.
Accomplishments that we're proud of
We’re proud that AgentForge successfully demonstrates a full execution-aware AI workflow.
Instead of just generating suggestions, the system:

• analyzes a real codebase
• generates tests
• executes them in a real environment
• diagnoses failures
• produces actionable repair guidance
We also built a developer workspace that makes these artifacts transparent, allowing users to inspect every stage of the pipeline.
What we learned
Building AgentForge taught us that creating effective AI developer tools requires much more than simply calling a model API. One of the most important lessons was how critical prompt design and structured context are when building multi-agent systems.
Early on, we discovered that sending large unstructured prompts to the model produced inconsistent outputs. To address this, we designed a structured context pipeline that organizes repository information, test results, and previous reasoning into well-defined artifacts that can be injected into each model request.
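A structured context bundle can be as simple as a dataclass that renders clearly labeled sections into the prompt rather than concatenating raw files. The field names and section layout here are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ContextBundle:
    """Organizes what a single agent call is allowed to see."""
    task: str
    repo_summary: str
    test_results: list[str] = field(default_factory=list)
    prior_reasoning: list[str] = field(default_factory=list)

    def render(self, max_chars: int = 4000) -> str:
        sections = [
            ("TASK", self.task),
            ("REPOSITORY", self.repo_summary),
            ("TEST RESULTS", "\n".join(self.test_results) or "none yet"),
            ("PRIOR REASONING", "\n".join(self.prior_reasoning) or "none yet"),
        ]
        text = "\n\n".join(f"## {name}\n{body}" for name, body in sections)
        return text[:max_chars]  # hard cap keeps every prompt bounded

bundle = ContextBundle(
    task="Generate pytest tests for the /users route",
    repo_summary="FastAPI app; routes in app/routes, services in app/services",
    test_results=["test_get_user FAILED: 404 != 200"],
)
prompt = bundle.render()
```

Because every request goes through the same renderer, prompt size stays predictable and each agent sees the same section layout regardless of pipeline stage.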
We also learned how important structured memory between agent calls is. Each stage of the pipeline — repository analysis, test generation, execution, and failure diagnosis — produces artifacts that are stored and reused by later agents. This ensures that every Amazon Nova 2 Lite API request receives relevant, grounded context instead of raw code dumps.
Another key lesson was the importance of deterministic orchestration around AI systems. By separating reasoning tasks into specialized agents and giving them structured inputs, we were able to make the pipeline significantly more reliable.
Overall, the project reinforced that successful AI systems rely on a combination of good prompt engineering, structured data flow, and carefully designed agent orchestration, not just powerful models.
What's next for Agent_Forge
Future improvements include:

• supporting additional languages beyond Python
• deeper repository understanding using semantic embeddings
• automatic code repair and patch generation
• integration with GitHub repositories and CI pipelines
• iterative learning from previous debugging runs
Our goal is to evolve AgentForge into a system that can act as an AI-powered testing and debugging partner for developers, helping them move from code generation to reliable, production-ready software.