## Inspiration

Most LLMs operate as a single, isolated "black box" that can easily hallucinate or give one-sided answers. We were inspired by the Socratic method: the idea that the best way to find the truth is through rigorous debate. We wanted to build a system where AI isn't just an assistant but a Council of specialized minds that challenge each other to reach a superior, verified conclusion.
## What it does

Debate Arena is a multi-agent orchestration framework that runs every user query through a competitive gauntlet.

- **Persona Gauntlet:** Five specialized agents (the Logician, the Maverick, the Archivist, the Networker, and the Visionary) analyze the prompt from conflicting perspectives.
- **Real-time Research:** The Networker uses the You.com API to feed live web data into the debate.
- **The Verdict:** A Pro-tier "Judge" model synthesizes the competing arguments into a final verdict.
- **Reflexive Memory:** Every session is stored in PostgreSQL and Pinecone, letting the system average the weights of past successful debates to self-correct and improve future reasoning.
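The reflexive-memory step can be sketched as a small JavaScript helper of the kind an n8n Code node would run. This is a minimal sketch: the metadata shape (a numeric `weight` field on each retrieved match) and the neutral fallback value are assumptions, not the project's actual schema.

```javascript
// Hypothetical helper: average the persona weights of past successful
// debates retrieved from Pinecone, falling back to a neutral prior
// when no usable history exists. Field names are assumptions.
function averageWeights(matches, fallback = 0.5) {
  const weights = matches
    .map((m) => m.metadata && m.metadata.weight)
    .filter((w) => typeof w === "number");
  if (weights.length === 0) return fallback;
  return weights.reduce((sum, w) => sum + w, 0) / weights.length;
}
```

The averaged value would then be handed to the Judge as a prior over the personas, which is what lets earlier sessions nudge later reasoning.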
## How we built it

- **Orchestration:** Built entirely on n8n, using branching logic and JavaScript Code nodes to manage agent interactions.
- **Models:** Powered by Google Gemini 1.5 Pro (the Judge) and Gemini 1.5 Flash (the specialized agents) to balance deep reasoning with high-speed performance.
- **Database & Memory:** PostgreSQL on Render stores structured session data; Pinecone handles vector similarity search to retrieve "memory" from previous related queries.
- **Live Web Data:** The You.com Content API fetches live web pages and converts them into LLM-friendly Markdown and metadata.
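The shaping step that follows the You.com fetch can be sketched as a small JavaScript function. The input and output field names here are assumptions for illustration, not the actual API response shape.

```javascript
// Hypothetical post-fetch step: fold a fetched page into the
// Markdown-plus-metadata record the debate agents consume.
// All field names are assumptions, not the real API schema.
function toAgentDocument(page) {
  return {
    markdown: `# ${page.title}\n\n${page.content}`,
    metadata: {
      url: page.url,
      fetchedAt: page.fetchedAt || new Date().toISOString(),
    },
  };
}
```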
## Challenges we ran into

- **Networking Hurdles:** Connecting local FastAPI endpoints to Dockerized n8n instances required deep dives into Docker bridge networking and troubleshooting `ENOTFOUND` and `ECONNREFUSED` errors.
- **Context Overload:** Passing entire web pages to every agent threatened to exceed token limits. We solved this by building custom n8n summarizer logic and switching to Markdown for better token efficiency.
- **The "Loop" Problem:** Ensuring the Judge ran only once, after the Pinecone upsert completed, rather than once per document split, required careful use of Limit nodes and custom flow control.
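The last two fixes can be sketched as small helpers of the kind an n8n Code node would run. Both are sketches under assumptions: the item shapes and the 4-characters-per-token heuristic are illustrative, not the project's exact implementation.

```javascript
// Context Overload: truncate Markdown to an approximate token budget
// before fanning it out to the five agents. ~4 characters per token
// is a rough heuristic, not a real tokenizer.
function fitToBudget(markdown, maxTokens = 2000) {
  const maxChars = maxTokens * 4;
  if (markdown.length <= maxChars) return markdown;
  const slice = markdown.slice(0, maxChars);
  const cut = slice.lastIndexOf("\n\n"); // prefer a paragraph break
  return cut > 0 ? slice.slice(0, cut) : slice;
}

// The "Loop" Problem: collapse the per-chunk item stream coming out
// of the Pinecone upsert into a single item, so the Judge branch
// fires exactly once per debate instead of once per document split.
function collapseToSingle(items) {
  if (items.length === 0) return [];
  return [{ json: { ...items[0].json, upsertedChunks: items.length } }];
}
```

In n8n terms, `collapseToSingle` plays the same role as a Limit node set to one item, while preserving a count of how many chunks were written.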
## Accomplishments that we're proud of

- **Self-Correcting Logic:** A working system that averages the weights of multiple historical items to inform the current debate.
- **Hallucination Guardrails:** By forcing the Archivist and the Networker to provide grounding data, we significantly reduced typical LLM "drift."
- **Deployment:** Moved the stack from a local environment to a stable cloud deployment on Render.
## What we learned

- **Agentic Workflows:** Personas are more than just system prompts; they need tuned temperatures and specific data access to be effective.
- **Vector Database Management:** Managing Pinecone metadata so searches can be filtered by `userId` and `queryId` rather than run generically.
- **Data Type Strictness:** Strict PostgreSQL schemas are unforgiving, especially when handling `UUID` and `JSONB` types from inside an automation platform.
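The last two lessons can be sketched together in JavaScript. Both are illustrative sketches: the metadata field names mirror the ones mentioned above, but the surrounding query and insert calls, the table columns, and the helper names are assumptions.

```javascript
// Pinecone metadata filter scoping retrieval to one user and,
// optionally, one query. The surrounding call is an assumption,
// roughly: index.query({ vector, topK: 5, includeMetadata: true,
//                        filter: buildMemoryFilter("u-123") })
function buildMemoryFilter(userId, queryId) {
  const filter = { userId: { $eq: userId } };
  if (queryId) filter.queryId = { $eq: queryId };
  return filter;
}

// Defensive shaping before a strict Postgres insert: reject a
// malformed UUID early and serialize the payload for a JSONB column,
// instead of letting the database error surface mid-workflow.
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[1-8][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function prepareSessionRow(sessionId, verdict) {
  if (!UUID_RE.test(sessionId)) {
    throw new Error(`not a valid UUID: ${sessionId}`);
  }
  return { session_id: sessionId, verdict: JSON.stringify(verdict) };
}
```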
## What's next for Debate Arena

- **Dynamic Persona Selection:** Automatically spinning up new agent types (e.g., a "Legal Specialist" or "Medical Reviewer") based on the query topic.
- **Human-in-the-Loop:** Letting users vote on the best argument, with the results fed back into the Pinecone weights to further refine the Judge's accuracy.
- **Mobile Integration:** Bringing the Arena to a mobile UI where users can watch the "battle" of ideas unfold in real time over WebSockets.