Inspiration

The inspiration for this project came from a simple question: what if AI characters could build relationships over time instead of just replying one prompt at a time? Most chat systems feel stateless, but real social dynamics depend on memory, context, personality, and timing. We wanted to build something that feels closer to a living narrative system, where characters can react, hesitate, challenge each other, and evolve day by day.

We were also inspired by strategy games, roleplay storytelling, and social simulations, where interactions create emergent outcomes. Rather than scripting every moment manually, we wanted a framework where story beats emerge from character identity, world rules, and interaction structure.

What it does

This app is a multi-agent simulation framework that runs AI character interactions across a configurable world.

  • Loads a world config (agents, rules, duration, relationships).
  • Creates agents with distinct identity scripts and memory.
  • Simulates day-by-day interactions using a turn-based orchestration system.
  • Supports both one-on-one and group conversations.
  • Uses two-phase responses: agents generate a physical reaction first, then decide whether to speak.
  • Tracks world state, conversation history, and social graph evolution over time.
  • Exports results as readable Markdown narratives and JSON logs for analysis.

In short, it turns static prompting into a structured social simulation engine that can generate dynamic, character-driven stories.

How we built it

We built this app in layers, starting from a simple idea: simulate believable social interactions between AI characters over multiple days. Instead of hardcoding one story, we made it config-driven, so each world (like romance, market, or war scenarios) can be defined in JSON with agents, rules, and introductions.
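A config-driven world can be sketched like this. The schema below is illustrative, not the project's actual format; `load_world` and the field names (`duration_days`, `identity`, `relationships`) are assumptions for the example.

```python
import json

# Hypothetical world config: field names are illustrative,
# not the project's actual schema.
WORLD_CONFIG = json.loads("""
{
  "name": "market",
  "duration_days": 3,
  "rules": ["agents trade once per day"],
  "agents": [
    {"name": "Ava", "identity": "cautious trader"},
    {"name": "Bram", "identity": "bold speculator"}
  ],
  "relationships": [{"pair": ["Ava", "Bram"], "type": "rivals"}]
}
""")

def load_world(config: dict) -> dict:
    """Validate the minimum fields a world needs before simulation."""
    for key in ("name", "duration_days", "agents"):
        if key not in config:
            raise ValueError(f"world config missing required key: {key}")
    return config

world = load_world(WORLD_CONFIG)
```

Defining a new scenario then means writing a new JSON file rather than touching simulation code.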

After that, we created the orchestration layer. The orchestrator became the “brain” of the simulation: it initializes the world state, creates agents, runs day-by-day loops, schedules conversations, and stores interaction logs. This gave the project a clear structure and made debugging easier because everything flows through one controller.
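The orchestrator's responsibilities can be condensed into a minimal sketch. Class and method names here are illustrative stand-ins, and the scheduler is deliberately naive.

```python
from dataclasses import dataclass, field

# Minimal orchestrator sketch: one controller owns the agents,
# the day-by-day loop, and the interaction logs.
@dataclass
class Orchestrator:
    agents: list
    duration_days: int
    logs: list = field(default_factory=list)

    def schedule(self) -> list:
        # Naive scheduler: pair up adjacent agents each day.
        return list(zip(self.agents[::2], self.agents[1::2]))

    def run(self) -> list:
        for day in range(1, self.duration_days + 1):
            for a, b in self.schedule():
                # A real implementation would run the conversation here.
                self.logs.append({"day": day, "pair": (a, b)})
        return self.logs

logs = Orchestrator(agents=["Ava", "Bram"], duration_days=3).run()
```

Because every interaction flows through this one controller, a broken conversation can be traced from a single log stream.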

Then we focused on the agents themselves. Each agent has an identity script, known relationships, context history, and memory. We started with short-term memory, then added optional hybrid memory (short-term + episodic) so agents could remember important past interactions in a more meaningful way.
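The hybrid memory idea can be sketched as a bounded short-term window plus an episodic store for interactions scored as important. The class, threshold, and scoring here are assumptions, not the project's actual implementation.

```python
from collections import deque

class HybridMemory:
    """Sketch: short-term window + episodic store for important events."""

    def __init__(self, short_term_size: int = 5, importance_threshold: float = 0.7):
        self.short_term = deque(maxlen=short_term_size)  # recency-bounded
        self.episodic = []                               # durable highlights
        self.threshold = importance_threshold

    def remember(self, event: str, importance: float) -> None:
        self.short_term.append(event)
        if importance >= self.threshold:
            self.episodic.append(event)

    def recall(self) -> list:
        # Recent context first, then episodic highlights.
        return list(self.short_term) + self.episodic

mem = HybridMemory(short_term_size=2)
mem.remember("greeted Bram", 0.1)
mem.remember("Bram broke a promise", 0.9)
mem.remember("small talk about weather", 0.2)
```

The short-term window naturally forgets trivia, while high-importance events survive indefinitely in the episodic store.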

The biggest feature we added was two-phase response generation. Instead of forcing every agent to always speak, the agent first generates a physical reaction, then decides if speaking is necessary, and only then generates dialogue if needed. This made conversations feel less robotic and more natural, especially in group scenes where silence can be just as important as speech.
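The two phases can be sketched as follows; `gen_reaction` and `should_speak` are hypothetical stand-ins for the real LLM calls.

```python
# Phase 1 produces a physical reaction; phase 2 decides whether
# to speak at all, so silence is a first-class outcome.
def gen_reaction(agent: str, stimulus: str) -> str:
    # Stand-in for an LLM call that narrates body language.
    return f"{agent} pauses and glances up at the mention of {stimulus}."

def should_speak(reaction: str) -> bool:
    # Stand-in for an LLM judgment; stubbed with a keyword check.
    return "pauses" in reaction

def respond(agent: str, stimulus: str) -> dict:
    reaction = gen_reaction(agent, stimulus)
    if not should_speak(reaction):
        return {"reaction": reaction, "dialogue": None}  # silence is valid
    return {"reaction": reaction, "dialogue": f"{agent}: 'Tell me more.'"}

turn = respond("Ava", "the broken deal")
```

Splitting the decision out of the dialogue generation is what lets a group scene contain agents who visibly react but say nothing.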

Finally, we built output and usability around it: CLI support, Markdown/JSON exports, and tests for config loading, orchestration, and response behavior. So the final app is not just a demo—it’s a reusable simulation framework.
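The export path can be sketched as a pair of serializers over the interaction logs; the log field names (`day`, `speaker`, `line`) are illustrative.

```python
import json

def export_markdown(logs: list) -> str:
    """Render interaction logs as a readable Markdown narrative."""
    lines = ["# Simulation Narrative"]
    for entry in logs:
        lines.append(f"\n## Day {entry['day']}")
        lines.append(f"- {entry['speaker']}: {entry['line']}")
    return "\n".join(lines)

logs = [{"day": 1, "speaker": "Ava", "line": "We should talk."}]
md = export_markdown(logs)          # human-readable narrative
json_log = json.dumps(logs, indent=2)  # machine-readable log for analysis
```

Keeping both formats means the same run can be read as a story or fed into downstream analysis.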

Challenges we ran into

The hardest challenge was realism vs control. If we gave agents too much freedom, responses became messy and inconsistent. If we constrained them too much, dialogue sounded repetitive. We had to keep refining prompt design and response parsing to keep the balance.

Another challenge was multi-agent turn handling. Group conversations can easily become chaotic, so we had to implement strict round-robin logic, context windows, and early-scene ending rules when a decision was clearly made.
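The turn-handling rules can be sketched as a loop with a strict order, a bounded context window, and an early-exit check; the `DECIDED` marker and `respond` callback are illustrative assumptions.

```python
def run_scene(agents, respond, max_rounds=5, context_window=6):
    """Round-robin turns with a context window and early scene ending."""
    history = []
    for _ in range(max_rounds):
        for agent in agents:  # strict turn order prevents chaos
            utterance = respond(agent, history[-context_window:])
            history.append((agent, utterance))
            if "DECIDED" in utterance:
                return history  # end the scene once a decision is made
    return history

# Stub responder: Bram reaches a decision on his second turn.
counts = {}
def stub(agent, context):
    counts[agent] = counts.get(agent, 0) + 1
    if agent == "Bram" and counts[agent] == 2:
        return "DECIDED: accept the offer"
    return "..."

history = run_scene(["Ava", "Bram", "Cato"], stub)
```

The early-exit rule matters in practice: without it, agents keep circling a settled question until the round limit is hit.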

Memory was also tricky. Long context helps continuity, but too much context hurts quality and cost. Adding episodic memory improved recall, but it introduced extra complexity in storage, retrieval, and deciding what actually counts as “important.”

We also ran into structured output problems from LLM responses. Even with instructions, models sometimes return unexpected formatting. We had to add parsers, fallbacks, and compression logic to keep outputs stable for downstream processing.
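A layered parser of the kind described can be sketched like this: strict JSON first, then salvaging an embedded JSON object from chatty text, then a raw-text fallback. The function and fallback shape are illustrative.

```python
import json
import re

def parse_agent_output(raw: str) -> dict:
    """Parse LLM output with progressively looser fallbacks."""
    try:
        return json.loads(raw)  # happy path: strict JSON
    except json.JSONDecodeError:
        pass
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # salvage embedded object
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    # Last resort: treat the whole reply as dialogue text.
    return {"dialogue": raw.strip(), "parsed": False}

ok = parse_agent_output('{"dialogue": "Hello", "speak": true}')
messy = parse_agent_output('Sure! Here you go: {"dialogue": "Hi"} Hope that helps.')
```

The key property is that no model output ever crashes the pipeline; at worst it degrades to unparsed text that downstream code can flag.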

Lastly, practical engineering issues showed up: rate limits, model variability, and keeping backward compatibility while adding new features like group conversations and two-phase generation. A lot of development time went into making sure new improvements didn’t break existing flows.
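Rate-limit handling of the kind mentioned is commonly done with exponential backoff; this sketch assumes a provider that raises on rate limits (`RateLimitError` and `flaky` are stand-ins, not a real client).

```python
import time

class RateLimitError(Exception):
    """Stand-in for a provider's rate-limit exception."""

def with_backoff(call, max_retries=4, base_delay=0.01):
    """Retry a callable with exponential backoff on rate limits."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))

attempts = []
def flaky():
    # Simulated endpoint: fails twice, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimitError()
    return "ok"

result = with_backoff(flaky)
```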

Accomplishments that we're proud of

  • 5k+ views, 30+ DMs, 10+ user interviews
  • 50+ early adopters in 2 days

What we learned

What's next for Forger

Built With
