
🧠 DebateMind: AI-Powered Debate Simulation with Reinforcement Learning

A multi-agent debate arena where an RL-coached debater learns to argue better round by round, judged in real time by an AI.


What It Does

DebateMind simulates a structured debate between two AI agents, a Coached Debater and an Opponent, with a neutral Judge scoring each round. The twist: the coached debater is guided by a Reinforcement Learning (RL) agent that learns, over multiple rounds, which argument strategies (logical, emotional, data-driven, etc.) work best for a given topic.

Key capabilities:

  • Enter any debate topic (e.g. "Is remote work better than office work?") and watch two LLMs go head-to-head
  • The RL agent selects a prompt strategy each round, receives a reward based on the judge's scores, and updates its policy
  • A Judge LLM evaluates both sides on logic, relevance, clarity, and persuasiveness, producing scores and written notes
  • Real-time animated chat UI streams arguments as they're generated
  • Upload a PDF to inject custom context into the debate (e.g. research papers, articles)
  • A Dashboard visualises RL learning curves, score trends, and strategy performance across rounds
  • Full debate history is persisted to CSV and can be re-loaded, browsed, or exported (CSV/JSON)
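The strategy-selection loop described above can be sketched as a simple epsilon-greedy bandit. This is an illustrative reconstruction, not the repo's actual rl_agent.py; the class and method names here are hypothetical:

```python
import random

class StrategyBandit:
    """Minimal epsilon-greedy bandit over argument strategies (illustrative sketch)."""

    def __init__(self, strategies, epsilon=0.2):
        self.epsilon = epsilon
        self.counts = {s: 0 for s in strategies}    # times each strategy was played
        self.values = {s: 0.0 for s in strategies}  # running mean reward per strategy

    def select(self):
        # Explore a random strategy with probability epsilon, else exploit the best known.
        if random.random() < self.epsilon:
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def update(self, strategy, reward):
        # Incremental mean update: v += (r - v) / n
        self.counts[strategy] += 1
        n = self.counts[strategy]
        self.values[strategy] += (reward - self.values[strategy]) / n

bandit = StrategyBandit(["logical", "emotional", "data-driven"])
choice = bandit.select()
bandit.update(choice, reward=3.0)
```

After each round, the judge-derived reward nudges the chosen strategy's estimated value up or down, so strategies that win rounds get picked more often.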

Tech Stack

Layer               Technology
UI / Frontend       Streamlit
LLM Backend         OpenRouter API (any compatible model, e.g. GPT-4, Claude, Mistral)
RL Agent            Custom epsilon-greedy bandit (rl_agent.py)
Memory / Storage    CSV files via pandas (debate_memory.csv, judge_summary.csv)
PDF Parsing         PyPDF2
HTTP Client         httpx
Data Visualisation  Altair + Streamlit metrics
Language            Python 3.10+
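As a rough illustration of the LLM-backend layer, an httpx call to OpenRouter's chat-completions endpoint might look like this. The endpoint and payload shape follow OpenRouter's public API; the function names and model id are assumptions, not code from the repo:

```python
import os

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    """Assemble the JSON body for an OpenRouter chat-completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_llm(model: str, prompt: str) -> str:
    """Send the prompt and return the model's reply text."""
    import httpx  # imported lazily so build_request has no third-party dependency
    headers = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}
    resp = httpx.post(OPENROUTER_URL, headers=headers,
                      json=build_request(model, prompt), timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

Swapping models is then just a matter of passing a different OpenRouter model id string.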

Architecture

┌────────────────────────────────────────────────────────┐
│                 Streamlit UI (app.py)                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │ Debate Arena │  │  Dashboard   │  │ PDF Uploader │  │
│  └──────┬───────┘  └──────────────┘  └──────────────┘  │
└─────────┼──────────────────────────────────────────────┘
          │  per-round loop
          ▼
┌────────────────────────────────────────────────────────┐
│                   Backend (backend/)                   │
│                                                        │
│  ┌───────────────────────────────────────────────┐     │
│  │  1. RL Agent (rl_agent.py)                    │     │
│  │     - Reads last reward from CSV              │     │
│  │     - Selects prompt strategy (ε-greedy)      │     │
│  │     - Updates policy weights after each round │     │
│  └────────────────────┬──────────────────────────┘     │
│                       │ strategy / prompt template     │
│                       ▼                                │
│  ┌───────────────────────────────────────────────┐     │
│  │  2. Coached Debater (debater.py)              │     │
│  │     - Builds prompt: topic + strategy +       │     │
│  │       previous rounds (from CSV)              │     │
│  │     - Calls LLM API → coached_argument        │     │
│  └────────────────────┬──────────────────────────┘     │
│                       │ coached_argument               │
│                       ▼                                │
│  ┌───────────────────────────────────────────────┐     │
│  │  3. Opponent LLM (opponent.py)                │     │
│  │     - Receives topic + coached_argument       │     │
│  │     - Returns counter-argument (static policy)│     │
│  └────────────────────┬──────────────────────────┘     │
│                       │ both arguments                 │
│                       ▼                                │
│  ┌───────────────────────────────────────────────┐     │
│  │  4. Judge LLM (judge.py)                      │     │
│  │     - Evaluates both sides (temp=0, JSON out) │     │
│  │     - Scores: logic, relevance, clarity,      │     │
│  │       persuasiveness (0–10 each)              │     │
│  │     - Returns total_coached, total_opponent,  │     │
│  │       notes_coached, notes_opponent           │     │
│  └────────────────────┬──────────────────────────┘     │
│                       │ scores → reward signal         │
│                       ▼                                │
│  ┌───────────────────────────────────────────────┐     │
│  │  5. Memory Manager (memory_manager.py)        │     │
│  │     - Appends round to debate_memory.csv      │     │
│  │     - Appends scores to judge_summary.csv     │     │
│  └───────────────────────────────────────────────┘     │
└─────────┬──────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────┐
│  data/                  │
│  ├── debate_memory.csv  │  ← round, arguments, reward
│  ├── judge_summary.csv  │  ← round scores + notes
│  └── rl_memory.json     │  ← RL policy state
└─────────────────────────┘

Round Flow (per round)

RL Agent
  └─► selects strategy
        └─► Coached Debater generates argument
              └─► Opponent generates rebuttal
                    └─► Judge scores both
                          └─► reward = coached_score - opponent_score
                                └─► RL Agent updates policy
                                      └─► all data saved to CSV
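In code, the flow above amounts to a small orchestration loop. This sketch stubs out all LLM calls; every function name is illustrative rather than the repo's actual API:

```python
def run_round(bandit_select, bandit_update, debater, opponent, judge, topic):
    """One debate round: strategy -> argument -> rebuttal -> scores -> reward -> update."""
    strategy = bandit_select()
    coached_argument = debater(topic, strategy)
    rebuttal = opponent(topic, coached_argument)
    coached_score, opponent_score = judge(coached_argument, rebuttal)
    reward = coached_score - opponent_score  # reward signal fed back to the RL agent
    bandit_update(strategy, reward)
    return {"strategy": strategy, "reward": reward}

# Stubbed example run: judge awards 32 vs 27, so the reward is +5.
result = run_round(
    bandit_select=lambda: "logical",
    bandit_update=lambda s, r: None,
    debater=lambda t, s: f"[{s}] argument on {t}",
    opponent=lambda t, a: "counter-argument",
    judge=lambda a, b: (32, 27),
    topic="Is remote work better than office work?",
)
```

A positive reward means the coached side out-scored the opponent that round; a negative reward pushes the bandit away from the chosen strategy.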

File Structure

debate-coach/
│
├── app.py                    # Streamlit UI - arena, dashboard, PDF upload
├── backend/
│   ├── rl_agent.py           # Epsilon-greedy RL agent & strategy selection
│   ├── debater.py            # Coached LLM interface
│   ├── opponent.py           # Opponent LLM interface
│   ├── judge.py              # Judge evaluation (JSON output, temp=0)
│   ├── memory_manager.py     # CSV read/write via pandas
│   ├── config.py             # API keys, model names, MAX_ROUNDS
│   └── utils.py              # Prompt builders, PDF loader, sanitizers
│
├── api/                      # (API route helpers)
│
├── data/
│   ├── debate_memory.csv
│   ├── judge_summary.csv
│   └── rl_memory.json
│
├── requirements.txt
└── README.md

How to Run

1. Clone the repo

git clone https://github.com/haragam22/debatemind.git
cd debatemind

2. Install dependencies

pip install -r requirements.txt

Requires Python 3.10+. It's recommended to use a virtual environment:

python -m venv venv && source venv/bin/activate  # Windows: venv\Scripts\activate

3. Set up your API key

Create a .env file in the project root (or set environment variables directly):

OPENROUTER_API_KEY=your_key_here

DebateMind uses OpenRouter to access LLMs. You can swap in any compatible model (GPT-4, Claude, Mistral, etc.) by editing backend/config.py.
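Since backend/config.py is described as holding the API key, model names, and MAX_ROUNDS, a plausible minimal shape is sketched below. Every variable name and default here is an assumption; check the actual file before relying on it:

```python
# backend/config.py (illustrative sketch, not the repo's actual contents)
import os

# Read the key from the environment (or a loaded .env file)
OPENROUTER_API_KEY = os.environ.get("OPENROUTER_API_KEY", "")

# Any OpenRouter-compatible model ids can be swapped in here
DEBATER_MODEL = "openai/gpt-4"
OPPONENT_MODEL = "mistralai/mistral-7b-instruct"
JUDGE_MODEL = "anthropic/claude-3-haiku"

MAX_ROUNDS = 5  # default number of debate rounds
```

Changing models for the debater, opponent, or judge is then a one-line edit each.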

4. Launch the app

streamlit run app.py

The app will open at http://localhost:8501.

5. Start a debate

  1. Enter a debate topic in the sidebar (e.g. "AI will replace software engineers")
  2. Set the number of rounds (default: 5)
  3. Click Start / Reset Simulation
  4. Click NEXT ROUND to advance; each round generates arguments, a rebuttal, and judge scores
  5. After all rounds, click View Final Results to see the dashboard and winner

Optional: Add PDF context

Upload a PDF (research paper, article, etc.) using the uploader on the main page. The extracted text will be injected into the judge and debater prompts for that session.
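A sketch of how the PDF step might work: extract text with PyPDF2's PdfReader, then prepend it to the prompts. The helper names and the truncation limit are assumptions, not the repo's actual utils:

```python
def extract_pdf_text(path: str) -> str:
    """Concatenate text from every page of a PDF using PyPDF2."""
    from PyPDF2 import PdfReader  # imported lazily; only needed when a PDF is uploaded
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def inject_context(prompt: str, context: str, max_chars: int = 4000) -> str:
    """Prepend (possibly truncated) PDF context to a debate or judge prompt."""
    if not context:
        return prompt
    return f"Context:\n{context[:max_chars]}\n\n{prompt}"
```

Truncating the context keeps long papers from blowing past the model's context window; the 4000-character cap is an arbitrary illustrative choice.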


Research Backing

This project draws on:

  • Du et al. (2023): Improving Factuality and Reasoning through Multiagent Debate
  • Liang et al. (2024, EMNLP): Encouraging Divergent Thinking via Multi-Agent Debate (MAD)
  • Kenton et al. (2024): Scalable Oversight with Weak LLMs Judging Strong LLMs
  • Wang et al. (2025): RL for Reasoning in LLMs with One Training Example
  • Zhang et al. (2025): A Survey of RL for Large Reasoning Models

License

MIT © haragam22
