A multi-agent debate arena where an RL-coached debater learns to argue better round by round, judged in real-time by an AI.
DebateMind simulates a structured debate between two AI agents β a Coached Debater and an Opponent β with a neutral Judge scoring each round. The twist: the coached debater is guided by a Reinforcement Learning (RL) agent that learns which argument strategies (logical, emotional, data-driven, etc.) work best for a given topic over multiple rounds.
Key capabilities:
- Enter any debate topic (e.g. "Is remote work better than office work?") and watch two LLMs go head-to-head
- The RL agent selects a prompt strategy each round, receives reward based on judge scores, and updates its policy
- A Judge LLM evaluates both sides on logic, relevance, clarity, and persuasiveness, producing scores and written notes
- Real-time animated chat UI streams arguments as they're generated
- Upload a PDF to inject custom context into the debate (e.g. research papers, articles)
- A Dashboard visualises RL learning curves, score trends, and strategy performance across rounds
- Full debate history is persisted to CSV and can be re-loaded, browsed, or exported (CSV/JSON)
| Layer | Technology |
|---|---|
| UI / Frontend | Streamlit |
| LLM Backend | OpenRouter API (any compatible model, e.g. GPT-4, Claude, Mistral) |
| RL Agent | Custom epsilon-greedy bandit (rl_agent.py) |
| Memory / Storage | CSV files via pandas (debate_memory.csv, judge_summary.csv) |
| PDF Parsing | PyPDF2 |
| HTTP Client | httpx |
| Data Visualisation | Altair + Streamlit metrics |
| Language | Python 3.10+ |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Streamlit UI (app.py) β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β Debate Arenaβ β Dashboard β β PDF Uploader β β
β ββββββββ¬ββββββββ ββββββββββββββββ ββββββββββββββββ β
βββββββββββΌββββββββββββββββββββββββββββββββββββββββββββββββ
β per-round loop
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Backend (backend/) β
β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β 1. RL Agent (rl_agent.py) β β
β β - Reads last reward from CSV β β
β β - Selects prompt strategy (Ξ΅-greedy) β β
β β - Updates policy weights after each round β β
β βββββββββββββββββββββ¬βββββββββββββββββββββββββββββ β
β β strategy / prompt template β
β βΌ β
β ββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β 2. Coached Debater (debater.py) β β
β β - Builds prompt: topic + strategy + β β
β β previous rounds (from CSV) β β
β β - Calls LLM API β coached_argument β β
β βββββββββββββββββββββ¬βββββββββββββββββββββββββββββ β
β β coached_argument β
β βΌ β
β ββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β 3. Opponent LLM (opponent.py) β β
β β - Receives topic + coached_argument β β
β β - Returns counter-argument (static policy) β β
β βββββββββββββββββββββ¬βββββββββββββββββββββββββββββ β
β β both arguments β
β βΌ β
β ββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β 4. Judge LLM (judge.py) β β
β β - Evaluates both sides (temp=0, JSON out) β β
β β - Scores: logic, relevance, clarity, β β
β β persuasiveness (0β10 each) β β
β β - Returns total_coached, total_opponent, β β
β β notes_coached, notes_opponent β β
β βββββββββββββββββββββ¬βββββββββββββββββββββββββββββ β
β β scores β reward signal β
β βΌ β
β ββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β 5. Memory Manager (memory_manager.py) β β
β β - Appends round to debate_memory.csv β β
β β - Appends scores to judge_summary.csv β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββ
β data/ β
β βββ debate_memory.csv β β round, arguments, reward
β βββ judge_summary.csv β β round scores + notes
β βββ rl_memory.json β β RL policy state
βββββββββββββββββββββββββββ
RL Agent
βββΊ selects strategy
βββΊ Coached Debater generates argument
βββΊ Opponent generates rebuttal
βββΊ Judge scores both
βββΊ reward = coached_score β opponent_score
βββΊ RL Agent updates policy
βββΊ all data saved to CSV
debate-coach/
β
βββ app.py # Streamlit UI β arena, dashboard, PDF upload
βββ backend/
β βββ rl_agent.py # Epsilon-greedy RL agent & strategy selection
β βββ debater.py # Coached LLM interface
β βββ opponent.py # Opponent LLM interface
β βββ judge.py # Judge evaluation (JSON output, temp=0)
β βββ memory_manager.py # CSV read/write via pandas
β βββ config.py # API keys, model names, MAX_ROUNDS
β βββ utils.py # Prompt builders, PDF loader, sanitizers
β
βββ api/ # (API route helpers)
β
βββ data/
β βββ debate_memory.csv
β βββ judge_summary.csv
β βββ rl_memory.json
β
βββ requirements.txt
βββ README.md
git clone https://github.com/haragam22/debatemind.git
cd debatemindpip install -r requirements.txtRequires Python 3.10+. It's recommended to use a virtual environment:
python -m venv venv && source venv/bin/activate # Windows: venv\Scripts\activate
Create a .env file in the project root (or set environment variables directly):
OPENROUTER_API_KEY=your_key_hereDebateMind uses OpenRouter to access LLMs. You can swap in any compatible model (GPT-4, Claude, Mistral, etc.) by editing
backend/config.py.
streamlit run app.pyThe app will open at http://localhost:8501.
- Enter a debate topic in the sidebar (e.g. "AI will replace software engineers")
- Set the number of rounds (default: 5)
- Click Start / Reset Simulation
- Click NEXT ROUND to advance β each round generates arguments, a rebuttal, and judge scores
- After all rounds, click View Final Results to see the dashboard and winner
Upload a PDF (research paper, article, etc.) using the uploader on the main page. The extracted text will be injected into the judge and debater prompts for that session.
This project draws on:
- Du et al. (2023) β Improving Factuality and Reasoning through Multiagent Debate
- Liang et al. (2024, EMNLP) β Encouraging Divergent Thinking via Multi-Agent Debate (MAD)
- Kenton et al. (2024) β Scalable Oversight with Weak LLMs Judging Strong LLMs
- Wang et al. (2025) β RL for Reasoning in LLMs with One Training Example
- Zhang et al. (2025) β A Survey of RL for Large Reasoning Models
MIT Β© haragam22