Multi-agent university simulation benchmark where LLM-powered agents (professors, students, staff, IT, management, applicants) interact within configurable university archetypes. Compares LLM providers (Claude, Qwen, Kimi, Gemini) on coordination quality, research output, and emergent behavior.
This isn't just another multi-agent chat. It combines three powerful ideas:
- Concordia-style Game Master (Google DeepMind) -- centralized conflict resolution where agents observe, plan, act, and a GM adjudicates outcomes via LLM
- OpenClaw-inspired Agent Coordination -- inter-agent P2P/broadcast messaging + shared task board with dependency chains, enabling bottom-up emergent coordination
- Moltbook Molt Dynamics Metrics (arXiv 2603.03555) -- emergence score, role specialization index, core-periphery analysis, and communication reciprocity tracking
```
Agents (observe/plan/act)
        |
        v
SimulationEngine ── MessageRouter (P2P + broadcast)
        |           TaskBoard (post/claim/complete with deps)
        v
GameMaster (LLM-based conflict resolution)
        |
        v
MetricsTracker ── MoltDynamicsMetrics (emergence scoring)
        |
        v
WorkspaceManager (per-agent persistence)
```
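The pipeline above can be sketched in a few lines. This is a minimal, illustrative mock -- the class and method names (`Agent`, `GameMaster`, `step`) are assumptions for this sketch, not the repo's actual API, which lives under `src/agents/` and `src/core/`:

```python
import asyncio
from dataclasses import dataclass

# Minimal sketch of one engine step: observe/plan/act in parallel,
# then a Game Master adjudication pass over the proposed actions.
@dataclass
class Agent:
    name: str

    async def observe(self, world):
        return f"{self.name} sees {world}"

    async def plan(self, observation):
        return f"plan[{observation}]"

    async def act(self, plan):
        return f"act[{plan}]"

class GameMaster:
    def adjudicate(self, proposals):
        # The real GM asks an LLM to resolve conflicting actions;
        # this stub simply accepts every proposal.
        return dict(proposals)

async def step(agents, gm, world):
    async def cycle(agent):
        obs = await agent.observe(world)
        return agent.name, await agent.act(await agent.plan(obs))
    # All agents run their observe/plan/act cycle concurrently.
    proposals = await asyncio.gather(*(cycle(a) for a in agents))
    return gm.adjudicate(proposals)

outcomes = asyncio.run(step([Agent("prof"), Agent("student")], GameMaster(), "campus"))
```

The centralized adjudication step is what distinguishes the Concordia-style design from free-form multi-agent chat: agents only propose; the GM decides what actually happens.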
Time compression: 6 real months compressed to 6 simulation days (36 steps/day, 216 total steps).
- 6 agent roles: Professor, Student, Staff, IT Staff, Management, Applicant
- Generative Agents memory system (stream + retrieval + reflection)
- Academic calendar with phases: Semester Start, Mid-Semester, Application Period, Finals, Summer Research
- Parallelized agent observe/plan/act via asyncio
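The memory system follows the Generative Agents recipe: each memory is scored by recency, importance, and relevance, and the top-k are retrieved. The sketch below uses illustrative field names and weights, not the repo's actual `MemorySystem`:

```python
import time

# Sketch of Generative Agents-style retrieval: recency decays
# exponentially, importance is stored with the memory, relevance is
# simple keyword overlap with the query.
def retrieve(memories, query_words, k=2, now=None, decay=0.995):
    now = time.time() if now is None else now
    def score(m):
        recency = decay ** ((now - m["t"]) / 3600.0)   # decay per hour
        overlap = len(query_words & set(m["text"].split()))
        relevance = overlap / max(len(query_words), 1)
        return recency + m["importance"] + relevance
    return sorted(memories, key=score, reverse=True)[:k]

memories = [
    {"t": 0, "text": "grading exams", "importance": 0.2},
    {"t": 0, "text": "paper deadline tomorrow", "importance": 0.9},
]
top = retrieve(memories, {"paper", "deadline"}, k=1, now=3600)
```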
- Continuous 24/7 simulation: All 4 universities run as parallel async loops that restart automatically
- Cross-cycle learning: Agents restore memory and state from previous runs via Supabase persistence
- Error isolation: One university crashing doesn't affect others -- auto-retries after configurable interval
- API control: Start/stop/pause/resume via REST endpoints (`/api/always-on/*`)
- Auto-start: Set the `ALWAYS_ON=true` env var to start on API boot, or trigger via API endpoint
- Configurable concurrency: Semaphore-based throttling (`ALWAYS_ON_MAX_CONCURRENT`, default 2)
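The always-on pattern -- per-university loops, error isolation with auto-retry, and semaphore throttling -- can be sketched as below. The names `run_cycle` and `always_on` are illustrative stand-ins for the repo's runner:

```python
import asyncio

UNIVERSITIES = ["tech_school", "business_school", "legacy_school", "commonwealth"]

async def run_cycle(name, attempt):
    # Stand-in for one full simulation cycle; one university crashes once
    # to demonstrate that the failure stays isolated to its own loop.
    if name == "legacy_school" and attempt == 0:
        raise RuntimeError("simulated crash")
    return f"{name} cycle ok"

async def always_on(name, sem, results, cycles=2, retry_interval=0.01):
    for attempt in range(cycles):
        async with sem:                 # throttle concurrent universities
            try:
                results.append(await run_cycle(name, attempt))
            except Exception:
                # A crash in this loop never touches the other loops;
                # wait out the retry interval and continue.
                await asyncio.sleep(retry_interval)

async def main():
    sem = asyncio.Semaphore(2)          # ALWAYS_ON_MAX_CONCURRENT
    results = []
    await asyncio.gather(*(always_on(u, sem, results) for u in UNIVERSITIES))
    return results

results = asyncio.run(main())
```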
- Inter-Agent Messaging: P2P direct messages + broadcast filtered by role/department
- Shared Task Board: Agents post tasks with priority and dependencies; others claim and complete them
- Cross-role collaboration tracking: Measures how different roles work together organically
- Emergence Score (0-100): Composite of specialization, reciprocity, task completion, cross-role collab, activity distribution
- Role Specialization Index: Shannon entropy of action distributions per agent
- Core-Periphery Analysis: Identifies hub agents vs peripheral ones, Gini coefficient
- Communication Dynamics: Response rates, reciprocity, information bridges
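Two of the metrics above have standard closed forms: specialization as normalized Shannon entropy of an agent's action distribution (low entropy = more specialized), and a Gini coefficient over per-agent interaction counts for core-periphery skew. The sketch below uses the textbook formulas; the exact weighting in the repo's `MoltDynamics` module may differ:

```python
import math
from collections import Counter

def specialization(actions):
    """1.0 = agent always does the same thing, 0.0 = uniform over actions."""
    counts = Counter(actions)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return 1.0 - entropy / max_entropy

def gini(xs):
    """0.0 = perfectly even activity, ~1.0 = one hub does everything."""
    xs = sorted(xs)
    n = len(xs)
    cum = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * cum) / (n * sum(xs)) - (n + 1) / n
```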
- 6-panel real-time dashboard: Live Feed, Metrics Chart, Emergence Panel, Message Feed, Task Board, Agent Roster
- D3 force-directed interaction graph with core/periphery highlighting
- WebSocket streaming from FastAPI backend
- Supabase persistence for simulation events
- Compare LLM providers (Claude, Qwen, Kimi, Gemini) on identical scenarios
- 4 university archetypes with different success metrics
- Parallel trial execution with semaphore-based throttling
- Statistical analysis with per-provider breakdowns
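The benchmark runner's shape -- fan out trials under a semaphore, then group results per provider for a mean/stdev breakdown -- can be sketched as follows. `run_trial` is a stand-in for a full simulation run, and the scores are fake placeholders:

```python
import asyncio
import statistics

async def run_trial(provider, trial):
    await asyncio.sleep(0)            # placeholder for a full simulation
    return provider, 50.0 + trial     # fake emergence score

async def benchmark(providers, trials=3, max_concurrent=4):
    sem = asyncio.Semaphore(max_concurrent)   # throttle concurrent trials

    async def guarded(provider, trial):
        async with sem:
            return await run_trial(provider, trial)

    results = await asyncio.gather(
        *(guarded(p, t) for p in providers for t in range(trials))
    )
    # Per-provider breakdown: mean and sample standard deviation.
    by_provider = {}
    for provider, score in results:
        by_provider.setdefault(provider, []).append(score)
    return {p: (statistics.mean(s), statistics.stdev(s)) for p, s in by_provider.items()}

report = asyncio.run(benchmark(["qwen", "kimi", "gemini"]))
```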
| Config | Type | Success Metric |
|---|---|---|
| tech_school | MIT-like | research_output |
| business_school | HBS-like | network_quality |
| legacy_school | Oxford-like | prestige_maintenance |
| commonwealth | Public university | accessibility |
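An archetype config might look roughly like the following YAML; the field names here are illustrative only -- see `config/universities/` for the real schema:

```yaml
# config/universities/tech_school.yaml -- illustrative fields only
name: tech_school
archetype: MIT-like
success_metric: research_output
agents:
  professors: 5
  students: 20
  staff: 3
```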
```bash
# Clone
git clone https://github.com/lonexreb/university-sim.git
cd university-sim

# Python backend
python -m venv .venv
source .venv/bin/activate
pip install -e .

# Set API keys (at least one provider)
cp .env.example .env
# Edit .env with your keys

# Run tests
python -m pytest tests/ -q

# Run a simulation (36 steps = 1 simulated day)
python scripts/run_simulation.py --university tech_school --provider qwen --steps 36

# Run full benchmark
python scripts/run_benchmark.py --providers qwen kimi gemini --trials 3
```

```bash
# Terminal 1: FastAPI backend
source .venv/bin/activate
python scripts/run_api.py

# Terminal 2: Next.js frontend
cd web
npm install
npm run dev
```

Open http://localhost:3000 to access the dashboard.
```bash
# Option 1: Auto-start on boot
ALWAYS_ON=true python scripts/run_api.py

# Option 2: Start via API
python scripts/run_api.py
curl -X POST localhost:8000/api/always-on/start
curl localhost:8000/api/always-on/status
curl -X POST localhost:8000/api/always-on/stop

# Option 3: Start with custom provider/interval
curl -X POST "localhost:8000/api/always-on/start?provider=qwen&interval=120"

# Pause/resume a single university
curl -X POST localhost:8000/api/always-on/tech_school/pause
curl -X POST localhost:8000/api/always-on/tech_school/resume
```

Environment variables for always-on mode:
| Variable | Default | Description |
|---|---|---|
| `ALWAYS_ON` | `false` | Auto-start on API boot |
| `ALWAYS_ON_PROVIDER` | `gemini` | LLM provider (cheapest default) |
| `ALWAYS_ON_INTERVAL` | `60` | Seconds between cycles |
| `ALWAYS_ON_MAX_CONCURRENT` | `2` | Max universities running in parallel |
```
src/
  agents/         # BaseAgent + 6 roles + MemorySystem + WorkspaceManager + factory
  core/           # SimulationEngine, GameMaster, CampusEnvironment, MessageRouter, TaskBoard, EventBus
  llm/            # LLMGateway (LiteLLM multi-provider), TokenTracker
  metrics/        # Research, Network, Prestige, Accessibility, Coordination, MoltDynamics
  benchmarking/   # Parallel ExperimentRunner, StatisticalAnalysis, ReportGenerator
  api/            # FastAPI backend with Supabase, WebSocket broadcasting, always-on runner
web/              # Next.js 16 + React 19 + Tailwind 4 + D3 + Recharts dashboard
config/
  universities/   # 4 archetype YAML configs
  simulation.yaml # Global settings
scripts/          # CLI runners
tests/            # 104 tests
```
| Provider | Model | Key Env Var |
|---|---|---|
| qwen | openai/qwen-plus | DASHSCOPE_API_KEY |
| kimi | openai/kimi-k2-0711-preview | MOONSHOT_API_KEY |
| gemini | gemini/gemini-2.5-flash | GOOGLE_API_KEY |
| claude | anthropic/claude-sonnet-4-5-20250929 | ANTHROPIC_API_KEY |
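A gateway might resolve the provider names above to LiteLLM model strings while checking that the matching key is set. The function below is a sketch of that mapping (using the models and env vars from the table), not the repo's actual `LLMGateway`:

```python
import os

# Provider name -> (LiteLLM model string, required key env var),
# taken from the provider table above.
PROVIDERS = {
    "qwen":   ("openai/qwen-plus", "DASHSCOPE_API_KEY"),
    "kimi":   ("openai/kimi-k2-0711-preview", "MOONSHOT_API_KEY"),
    "gemini": ("gemini/gemini-2.5-flash", "GOOGLE_API_KEY"),
    "claude": ("anthropic/claude-sonnet-4-5-20250929", "ANTHROPIC_API_KEY"),
}

def resolve(provider):
    """Return the LiteLLM model string, or raise if the key is missing."""
    model, key_var = PROVIDERS[provider]
    if not os.environ.get(key_var):
        raise RuntimeError(f"set {key_var} to use provider {provider!r}")
    return model
```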
Backend: Python 3.10+, LiteLLM, NetworkX, NumPy, FastAPI, Supabase, asyncio, tenacity
Frontend: Next.js 16, React 19, Tailwind CSS 4, Recharts, D3.js, WebSocket
- Concordia -- Game Master pattern for multi-agent simulation (Google DeepMind)
- Generative Agents -- Memory stream + retrieval + reflection (Park et al., 2023)
- OpenClaw -- Agent Teams RFC, inter-agent messaging, per-agent workspaces
- Molt Dynamics -- Emergent social phenomena in autonomous AI agent populations (Moltbook research)
MIT