bitterbridge/rills

Capture the Flag 🚩

A faction-based social deduction game in which autonomous agents driven by Claude Sonnet 4 compete through strategic night actions, deduction, and flag capture.

✅ What's Working

Core Systems (92 Non-Stochastic Tests Passing)

  1. Action System (28 tests)

    • 8-point action allocation (snoop, raid, kill, defend)
    • Success probability calculations with power multipliers
    • Action queue with danger-based ordering
    • Flag tracking and transfers
  2. Resolution System (19 tests)

    • Sequential action resolution
    • Stochastic success/failure based on probabilities
    • Flag transfers on kill/raid
    • Evidence generation integration
  3. Evidence System (8 tests)

    • Physical trait-based evidence
    • Ambiguity maximization
    • Crime scene vs night activity evidence
    • Player trait matching
  4. Game Integration (9 tests)

    • Flag tracker initialization
    • Night/day phase execution
    • Victory detection
    • Multi-turn progression
  5. LLM Action Decisions (15 tests, 11 non-stochastic)

    • Real Claude Sonnet 4 integration
    • Mock agent for fast testing
    • 8-point validation
    • Retry logic with fallback
    • Structured tool-based output
  6. Game Orchestration (17 tests)

    • Complete game loop
    • Faction assignment (3-6 factions for 3-26 players)
    • Night phase: LLM decisions → resolution → evidence
    • Day phase: revelations and flag tracking
    • Victory checking (all flags or last faction standing)
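The "8-point validation" and "Retry logic with fallback" items above could look roughly like the sketch below. It is an illustration only: the `Allocation` class, the `decide_with_retries` function, the default of three attempts, and the all-defend fallback are assumptions, not the actual code in rills/llm_actions.py.

```python
from dataclasses import dataclass
from typing import Callable, Optional

TOTAL_POINTS = 8  # from the README: each player allocates 8 action points


@dataclass(frozen=True)
class Allocation:
    """Hypothetical container; the project's real classes may differ."""
    snoop: int = 0
    raid: int = 0
    kill: int = 0
    defend: int = 0

    def is_valid(self) -> bool:
        # Valid means no negative values and exactly 8 points spent.
        points = (self.snoop, self.raid, self.kill, self.defend)
        return all(p >= 0 for p in points) and sum(points) == TOTAL_POINTS


def decide_with_retries(
    ask_model: Callable[[], Optional[Allocation]],
    max_attempts: int = 3,  # assumed default, not taken from the project
) -> Allocation:
    """Call ask_model until it returns a valid allocation, else fall back."""
    for _ in range(max_attempts):
        try:
            allocation = ask_model()
        except Exception:
            continue  # treat API or parsing errors as a failed attempt
        if allocation is not None and allocation.is_valid():
            return allocation
    # Assumed fallback: spend every point on defense.
    return Allocation(defend=TOTAL_POINTS)
```

Passing the model call in as a plain callable keeps the retry loop testable with a mock agent, with no network access needed.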

Demo Game

Run a complete game:

# Play with real AI (Claude Sonnet 4)
python play.py

# With mock AI (fast, deterministic)
python play.py --mock --players 6

# Custom settings
python play.py --players 12 --max-turns 30

# Quiet mode
python play.py --quiet

📊 System Architecture

Game Orchestrator (game_orchestrator.py)
    ↓
Night Phase Loop
    → ActionDecisionAgent (llm_actions.py)
        → Claude Sonnet 4 with structured output
        → Returns validated 8-point allocations
    → ActionQueue (actions.py)
        → Sorts by danger level
    → ActionResolver (resolution.py)
        → Stochastic execution
        → Flag transfers
    → EvidenceGenerator (evidence.py)
        → Trait-based clues
    ↓
Day Phase
    → Reveal deaths
    → Show evidence
    → Discussion
    ↓
Victory Check
    → All flags held by one faction?
    → Only one faction has survivors?
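
As a rough illustration of the ActionQueue and ActionResolver steps in the diagram above, the sketch below sorts actions by a danger level and then resolves them one at a time with a random draw. The `Action` class, the `DANGER` values, and the sort direction are assumptions; the README only states that the queue is ordered by danger and that execution is stochastic.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# Assumed ordering: lower numbers resolve first. The actual values and
# direction used by rills/actions.py are not documented here.
DANGER = {"defend": 0, "snoop": 1, "raid": 2, "kill": 3}


@dataclass
class Action:
    """Hypothetical action record for this sketch."""
    actor: str
    kind: str
    target: str
    success_chance: float  # probability in [0.0, 1.0]


def build_queue(actions: List[Action]) -> List[Action]:
    """Return actions sorted by danger level (stable for ties)."""
    return sorted(actions, key=lambda a: DANGER[a.kind])


def resolve(queue: List[Action], rng: random.Random) -> List[Tuple[str, str, str, bool]]:
    """Resolve actions sequentially; each succeeds with its own probability."""
    outcomes = []
    for action in queue:
        succeeded = rng.random() < action.success_chance
        outcomes.append((action.actor, action.kind, action.target, succeeded))
    return outcomes
```

Injecting a seeded `random.Random` is one way to keep tests of a stochastic resolver repeatable.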

🎮 Game Rules

Objective: Capture all flags or eliminate other factions

Night Phase:

  • Each player allocates 8 action points
  • Snoop (3x power): Learn target's faction and flag status
  • Raid (1x power): Steal flag from target
  • Kill (1x power): Eliminate target and take their flags
  • Defend: Protect yourself from attacks
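
To make the power multipliers concrete, here is a small sketch of how points might turn into effective power and a success chance. The multipliers come from the rules above. The probability formula (attack divided by attack plus defense) is purely a placeholder assumption, since this README does not state the real calculation, and `effective_power` and `success_probability` are hypothetical names.

```python
# Multipliers as listed in the rules: snoop is 3x, raid and kill are 1x.
POWER_MULTIPLIER = {"snoop": 3, "raid": 1, "kill": 1}


def effective_power(kind: str, points: int) -> int:
    """Scale the points spent on an action by its power multiplier."""
    return POWER_MULTIPLIER[kind] * points


def success_probability(attack_power: int, defense_points: int) -> float:
    """Placeholder formula (an assumption): attack / (attack + defense)."""
    if attack_power <= 0:
        return 0.0
    return attack_power / (attack_power + defense_points)
```

Under this placeholder formula, 2 points of snoop (effective power 6) against a target with 2 points of defense would succeed with probability 0.75.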

Day Phase:

  • Deaths revealed with faction
  • Evidence shown to all players
  • Discussion period (future: voting)

Victory:

  • Hold all flags, OR
  • Be the last faction with survivors
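
The two victory conditions can be checked in a few lines. The sketch below is an assumption about data shapes, not the project's implementation: `check_victory`, `flag_holders`, `faction_of`, and `alive` are hypothetical names.

```python
from typing import Dict, Optional, Set


def check_victory(
    flag_holders: Dict[str, str],  # flag name -> player currently holding it
    faction_of: Dict[str, str],    # player name -> faction name
    alive: Set[str],               # names of surviving players
) -> Optional[str]:
    """Return the winning faction, or None if the game continues."""
    # Condition 1: only one faction still has survivors.
    surviving = {faction_of[player] for player in alive}
    if len(surviving) == 1:
        return next(iter(surviving))
    # Condition 2: every flag is held by members of a single faction.
    holding = {faction_of[player] for player in flag_holders.values()}
    if flag_holders and len(holding) == 1:
        return next(iter(holding))
    return None
```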

🧪 Testing

# Run all non-stochastic tests (92 tests)
pytest tests/test_actions.py \
       tests/test_resolution.py \
       tests/test_evidence.py \
       tests/test_game_integration.py \
       tests/test_llm_actions.py::TestMockActionDecisions \
       tests/test_llm_actions.py::TestActionDecisionValidation \
       tests/test_llm_actions.py::TestActionDecisionRetries \
       tests/test_game_orchestrator.py

# Run specific test suite
pytest tests/test_game_orchestrator.py -v

# Run real LLM tests (slow, requires API key)
pytest tests/test_llm_actions.py::TestLLMActionDecisions -v

📁 Key Files

Core Systems:

  • rills/actions.py - Action allocation, queue, flag tracking
  • rills/resolution.py - Action execution and resolution
  • rills/evidence.py - Physical evidence generation
  • rills/llm_actions.py - LLM-based action decisions

Orchestration:

  • rills/game_orchestrator.py - Complete game coordinator
  • rills/phases/capture_flag_night.py - Night phase handler
  • rills/phases/capture_flag_day.py - Day phase handler

Demo:

  • demo_game.py - Playable game with CLI

Tests:

  • tests/test_actions.py - Action system tests
  • tests/test_resolution.py - Resolution tests
  • tests/test_evidence.py - Evidence tests
  • tests/test_game_integration.py - Integration tests
  • tests/test_llm_actions.py - LLM decision tests
  • tests/test_game_orchestrator.py - Full game tests

🚀 Next Steps

Potential enhancements:

  1. Location System - Strategic positioning during day phase
  2. Advanced LLM Context - Track faction reveals and alliances
  3. Voting/Discussion - Democratic elimination during day
  4. Special Abilities - Unique powers per player
  5. Web Interface - Real-time game visualization
  6. Tournament Mode - Multiple games with ELO rankings

🎯 Development Approach

Built using Test-Driven Development:

  1. Write comprehensive tests first
  2. Implement minimal code to pass
  3. Refactor for clarity
  4. Validate with real LLM API

All 92 non-stochastic tests pass consistently. The remaining 4 LLM tests call the real API to validate strategic decision-making.

📝 Game Example

=== Starting Capture the Flag Game ===
Players: 6
Factions: Red Faction, Blue Faction, Green Faction

📋 Initial Setup:
  Red Faction: Alice, Bob
  Blue Faction: Charlie, David
  Green Faction: Eve, Frank

🚩 Initial Flags:
  Red Faction flag → Bob
  Blue Faction flag → Charlie
  Green Faction flag → Eve

--- Turn 1 ---
🌙 Night 0 begins...
Night 0 ends. 0 player(s) died.

☀️ Day 1 begins...
## Morning Revelations
Everyone survived the night.

## Evidence Found
🔍 **video** - Security camera captured a tall person
🔍 **fabric** - Fabric fibers from dark clothes found at scene
[... evidence continues ...]

## Discussion Period
Players discuss the night's events...
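
The evidence lines in the example above each name one physical trait. One way to read "ambiguity maximization" is to reveal the culprit's trait that the largest number of players share, so the clue narrows the field as little as possible. The sketch below follows that reading; it is an assumption about the approach, and `most_ambiguous_trait` and the trait dictionary layout are hypothetical.

```python
from typing import Dict, Optional, Tuple


def most_ambiguous_trait(
    culprit: str,
    traits: Dict[str, Dict[str, str]],  # player -> {trait name: trait value}
) -> Optional[Tuple[str, str, int]]:
    """Return (trait, value, match_count) for the culprit's most widely shared trait."""
    best: Optional[Tuple[str, str, int]] = None
    for trait, value in traits[culprit].items():
        # Count every player (culprit included) who has the same value.
        matches = sum(1 for t in traits.values() if t.get(trait) == value)
        if best is None or matches > best[2]:
            best = (trait, value, matches)
    return best
```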

Built with ❤️ using TDD and Claude Sonnet 4

About

A little experiment with LLMs playing _Mafia_, basically.
