A faction-based social deduction game in which autonomous agents powered by Claude Sonnet 4 compete through strategic night actions, deduction, and flag capture.
Action System (28 tests)
- 8-point action allocation (snoop, raid, kill, defend)
- Success probability calculations with power multipliers
- Action queue with danger-based ordering
- Flag tracking and transfers
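The allocation check and danger-based ordering listed above can be sketched roughly as follows. This is an illustration only, not the code in rills/actions.py: the names `validate_allocation`, `QueuedAction`, and `order_by_danger` are hypothetical, the `DANGER` ranking is an assumption, and so is the rule that exactly 8 points must be spent.

```python
from dataclasses import dataclass

TOTAL_POINTS = 8  # each player allocates 8 action points
ACTIONS = ("snoop", "raid", "kill", "defend")

# Assumed danger ranking (higher resolves first); the real ordering may differ.
DANGER = {"kill": 3, "raid": 2, "snoop": 1, "defend": 0}


def validate_allocation(allocation: dict[str, int]) -> None:
    """Raise ValueError unless the allocation spends exactly 8 points."""
    unknown = set(allocation) - set(ACTIONS)
    if unknown:
        raise ValueError(f"unknown actions: {sorted(unknown)}")
    if any(points < 0 for points in allocation.values()):
        raise ValueError("points must be non-negative")
    total = sum(allocation.values())
    if total != TOTAL_POINTS:
        raise ValueError(f"expected {TOTAL_POINTS} points, got {total}")


@dataclass(frozen=True)
class QueuedAction:
    actor: str
    action: str
    points: int


def order_by_danger(actions: list[QueuedAction]) -> list[QueuedAction]:
    """Most dangerous first; the sort is stable, so ties keep input order."""
    return sorted(actions, key=lambda a: DANGER[a.action], reverse=True)
```

Keeping validation separate from ordering means a bad allocation can be rejected before anything enters the queue.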
Resolution System (19 tests)
- Sequential action resolution
- Stochastic success/failure based on probabilities
- Flag transfers on kill/raid
- Evidence generation integration
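A minimal sketch of one stochastic resolution step with a flag transfer, assuming the success probability has already been computed elsewhere. The function name `resolve_kill` and the `flags` mapping (flag name to current holder) are hypothetical and not taken from rills/resolution.py.

```python
import random


def resolve_kill(
    attacker: str,
    target: str,
    success_probability: float,
    flags: dict[str, str],
    rng: random.Random,
) -> bool:
    """Resolve a single kill attempt.

    On success, every flag held by the target moves to the attacker.
    Returns True if the kill succeeded.
    """
    # rng.random() is in [0.0, 1.0), so probability 1.0 always succeeds
    # and probability 0.0 always fails.
    if rng.random() >= success_probability:
        return False
    for flag, holder in flags.items():
        if holder == target:
            flags[flag] = attacker  # reassigning existing keys is safe here
    return True
```

Passing in a seeded `random.Random` instead of using the module-level functions makes the outcome reproducible, which is one way a stochastic system like this can be covered by deterministic tests.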
Evidence System (8 tests)
- Physical trait-based evidence
- Ambiguity maximization
- Crime scene vs night activity evidence
- Player trait matching
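One possible reading of "ambiguity maximization" is that the generator reveals the culprit's trait shared by the most players, so that a single clue narrows suspicion as little as possible. The sketch below illustrates that reading only; `most_ambiguous_trait` and the trait dictionary layout are assumptions, and rills/evidence.py may work differently.

```python
def most_ambiguous_trait(
    culprit: str, players: dict[str, dict[str, str]]
) -> tuple[str, str, list[str]]:
    """Pick the culprit's trait that matches the largest number of players.

    Returns (trait_name, trait_value, matching_player_names).
    Ties are broken by trait name for determinism.
    """
    best = None
    for trait, value in sorted(players[culprit].items()):
        matches = [
            name for name, traits in players.items()
            if traits.get(trait) == value
        ]
        if best is None or len(matches) > len(best[2]):
            best = (trait, value, matches)
    if best is None:
        raise ValueError(f"{culprit} has no traits")
    return best
```

The returned list of matching names is the "player trait matching" side of the same calculation: it is the suspect pool a clue leaves open.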
Game Integration (9 tests)
- Flag tracker initialization
- Night/day phase execution
- Victory detection
- Multi-turn progression
LLM Action Decisions (15 tests, 11 non-stochastic)
- Real Claude Sonnet 4 integration
- Mock agent for fast testing
- 8-point validation
- Retry logic with fallback
- Structured tool-based output
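The retry-with-fallback behaviour could look roughly like the sketch below. The model call is abstracted as a plain callable so the example makes no claims about the Anthropic client API; `decide_with_retries`, `is_valid`, the attempt limit, and the all-defend `FALLBACK` are all assumptions, not the contents of rills/llm_actions.py.

```python
from typing import Callable

TOTAL_POINTS = 8
ACTIONS = {"snoop", "raid", "kill", "defend"}
FALLBACK = {"defend": TOTAL_POINTS}  # assumed safe default


def is_valid(allocation: dict[str, int]) -> bool:
    """True when only known actions are used and the points sum to 8."""
    return (
        set(allocation) <= ACTIONS
        and all(isinstance(v, int) and v >= 0 for v in allocation.values())
        and sum(allocation.values()) == TOTAL_POINTS
    )


def decide_with_retries(
    ask_model: Callable[[], dict[str, int]], max_attempts: int = 3
) -> dict[str, int]:
    """Ask for an allocation until one validates, then fall back."""
    for _ in range(max_attempts):
        try:
            allocation = ask_model()
        except Exception:
            # Treat any failure of the call as a failed attempt.
            continue
        if is_valid(allocation):
            return allocation
    return dict(FALLBACK)  # copy so callers cannot mutate the default
```

Because the model is just a callable, a mock agent can be swapped in by passing a function that returns a fixed allocation.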
Game Orchestration (17 tests)
- Complete game loop
- Faction assignment (3-6 factions for 3-26 players)
- Night phase: LLM decisions → resolution → evidence
- Day phase: revelations and flag tracking
- Victory checking (all flags or last faction standing)
Run a complete game:
# Play with real AI (Claude Sonnet 4)
python play.py
# With mock AI (fast, deterministic)
python play.py --mock --players 6
# Custom settings
python play.py --players 12 --max-turns 30
# Quiet mode
python play.py --quiet
Game Orchestrator (game_orchestrator.py)
↓
Night Phase Loop
→ ActionDecisionAgent (llm_actions.py)
→ Claude Sonnet 4 with structured output
→ Returns validated 8-point allocations
→ ActionQueue (actions.py)
→ Sorts by danger level
→ ActionResolver (resolution.py)
→ Stochastic execution
→ Flag transfers
→ EvidenceGenerator (evidence.py)
→ Trait-based clues
↓
Day Phase
→ Reveal deaths
→ Show evidence
→ Discussion
↓
Victory Check
→ All flags held by one faction?
→ Only one faction has survivors?
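The flow above amounts to a loop of night phase, day phase, and victory check, bounded by a turn limit. A minimal sketch, with the three phases passed in as hypothetical callables rather than the real classes from game_orchestrator.py:

```python
from typing import Callable, Optional


def run_game(
    night: Callable[[int], None],
    day: Callable[[int], None],
    winner: Callable[[], Optional[str]],
    max_turns: int,
) -> Optional[str]:
    """Run turns until a faction wins or the turn limit is reached.

    Returns the winning faction's name, or None if nobody won in time.
    """
    for turn in range(1, max_turns + 1):
        night(turn)   # decisions -> queue -> resolution -> evidence
        day(turn)     # reveal deaths, show evidence, discussion
        result = winner()
        if result is not None:
            return result
    return None
```

Injecting the phases keeps the loop itself trivial to test without any LLM calls.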
Objective: Capture all flags or eliminate other factions
Night Phase:
- Each player allocates 8 action points
- Snoop (3x power): Learn target's faction and flag status
- Raid (1x power): Steal flag from target
- Kill (1x power): Eliminate target and take their flags
- Defend: Protect yourself from attacks
Day Phase:
- Deaths revealed with faction
- Evidence shown to all players
- Discussion period (future: voting)
Victory:
- Hold all flags, OR
- Be the last faction with survivors
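The two victory conditions can be expressed as a single check. This is a sketch under assumed data shapes (a flag-to-holder mapping, a player-to-faction mapping, and a set of living players); `check_victory` is a hypothetical name, and checking flags before survivors is an arbitrary choice.

```python
from typing import Optional


def check_victory(
    flag_holders: dict[str, str],
    player_faction: dict[str, str],
    alive: set[str],
) -> Optional[str]:
    """Return the winning faction, or None if the game continues."""
    # Condition 1: every flag is held by members of one faction.
    holding = {player_faction[holder] for holder in flag_holders.values()}
    if flag_holders and len(holding) == 1:
        return next(iter(holding))
    # Condition 2: only one faction still has living players.
    surviving = {player_faction[player] for player in alive}
    if len(surviving) == 1:
        return next(iter(surviving))
    return None
```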
# Run all non-stochastic tests (92 tests)
pytest tests/test_actions.py \
tests/test_resolution.py \
tests/test_evidence.py \
tests/test_game_integration.py \
tests/test_llm_actions.py::TestMockActionDecisions \
tests/test_llm_actions.py::TestActionDecisionValidation \
tests/test_llm_actions.py::TestActionDecisionRetries \
tests/test_game_orchestrator.py
# Run specific test suite
pytest tests/test_game_orchestrator.py -v
# Run real LLM tests (slow, requires API key)
pytest tests/test_llm_actions.py::TestLLMActionDecisions -v
Core Systems:
- rills/actions.py - Action allocation, queue, flag tracking
- rills/resolution.py - Action execution and resolution
- rills/evidence.py - Physical evidence generation
- rills/llm_actions.py - LLM-based action decisions
Orchestration:
- rills/game_orchestrator.py - Complete game coordinator
- rills/phases/capture_flag_night.py - Night phase handler
- rills/phases/capture_flag_day.py - Day phase handler
Demo:
- demo_game.py - Playable game with CLI
Tests:
- tests/test_actions.py - Action system tests
- tests/test_resolution.py - Resolution tests
- tests/test_evidence.py - Evidence tests
- tests/test_game_integration.py - Integration tests
- tests/test_llm_actions.py - LLM decision tests
- tests/test_game_orchestrator.py - Full game tests
Potential enhancements:
- Location System - Strategic positioning during day phase
- Advanced LLM Context - Track faction reveals and alliances
- Voting/Discussion - Democratic elimination during day
- Special Abilities - Unique powers per player
- Web Interface - Real-time game visualization
- Tournament Mode - Multiple games with ELO rankings
Built using Test-Driven Development:
- Write comprehensive tests first
- Implement minimal code to pass
- Refactor for clarity
- Validate with real LLM API
All 92 non-stochastic tests pass consistently. Real LLM tests validate strategic decision-making.
=== Starting Capture the Flag Game ===
Players: 6
Factions: Red Faction, Blue Faction, Green Faction
📋 Initial Setup:
Red Faction: Alice, Bob
Blue Faction: Charlie, David
Green Faction: Eve, Frank
🚩 Initial Flags:
Red Faction flag → Bob
Blue Faction flag → Charlie
Green Faction flag → Eve
--- Turn 1 ---
🌙 Night 0 begins...
Night 0 ends. 0 player(s) died.
☀️ Day 1 begins...
## Morning Revelations
Everyone survived the night.
## Evidence Found
🔍 **video** - Security camera captured a tall person
🔍 **fabric** - Fabric fibers from dark clothes found at scene
[... evidence continues ...]
## Discussion Period
Players discuss the night's events...
Built with ❤️ using TDD and Claude Sonnet 4