Adversarial Taboo

A competitive platform for pitting Large Language Models against each other in a strategic word-guessing game. Watch AI models battle in real time as attackers try to trick defenders into revealing secret words.

🎯 Overview

Adversarial Taboo is an interactive demonstration platform that showcases the strategic capabilities of different LLMs. In this game:

  • Attacker: Knows a secret word and must craft clever hints to trick the defender into saying it
  • Defender: Does not know the secret word and must infer it from the hints, then guess it in the required format without saying it by accident
  • Objective: Compare how different models perform in adversarial scenarios

✨ Features

  • Real-time LLM Battles: Watch AI models compete in live conversations
  • Model Comparison: Compare pre-trained vs post-trained model performance
  • Multiple API Support: Compatible with OpenAI, Anthropic, and custom API endpoints
  • Interactive Gameplay: Up to 5 turns with automatic win/tie detection
  • Visual Feedback: Animated messages and game state indicators
  • Comprehensive Rules: Clear game mechanics with instant feedback

Note about pre/post comparison

This project intentionally showcases a side-by-side comparison between a model in its original (pre-trained) state and a model after additional adaptation. Specifically, the demo is built to demonstrate the effect of applying the SPAG project (see: https://github.com/Linear95/SPAG) for post-training/adaptation. The "pre-trained" vs "post-trained" labels in the UI are intended to help you compare behavior before and after such adaptation.

Important: this is only a demonstration. In practice you may supply any model or endpoint you prefer. The app accepts OpenAI-compatible endpoints, Anthropic endpoints, or custom proxies, so feel free to plug in your own models, keys, or fine-tuning pipelines.

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • npm or yarn
  • API keys for your preferred LLM providers

Installation

  1. Clone the repository

    git clone https://github.com/Haydenkkk/adversarial-taboo.git
    cd adversarial-taboo

  2. Install dependencies

    npm install

  3. Configure environment variables

    Create a .env file in the root directory:

    # Attacker Model (fixed)
    VITE_ATTACKER_MODEL=gpt-5
    VITE_ATTACKER_API_KEY=your_attacker_api_key
    VITE_ATTACKER_BASE_URL=https://api.openai.com/v1

    # Defender Models
    VITE_DEFENDER_PRE_TRAINED_MODEL=gpt-3.5-turbo
    VITE_DEFENDER_PRE_TRAINED_API_KEY=your_pre_trained_api_key
    VITE_DEFENDER_PRE_TRAINED_BASE_URL=https://api.openai.com/v1

    VITE_DEFENDER_POST_TRAINED_MODEL=gpt-4
    VITE_DEFENDER_POST_TRAINED_API_KEY=your_post_trained_api_key
    VITE_DEFENDER_POST_TRAINED_BASE_URL=https://api.openai.com/v1

  4. Start the development server

    npm run dev

  5. Open your browser

    Navigate to http://localhost:7404 to start playing!

🎮 How to Play

Game Rules

  1. Setup: A secret word is randomly selected and assigned to the Attacker
  2. Attacker's Turn: Provides hints about the secret word without saying it directly
  3. Defender's Turn: Responds to hints and may attempt to guess the word
  4. Guessing Format: To guess, the Defender must say: "I know the word! It is {guess}"
  5. Win Conditions:
    • Attacker Wins: If the Defender says the secret word unintentionally (outside the guessing format) OR makes a formatted guess that is incorrect
    • Defender Wins: If Defender correctly guesses the word in the proper format
    • Tie: If maximum turns (5) are reached without a winner
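
The win conditions above can be sketched as a small judging function applied to each Defender message. This is a minimal illustration, not the project's actual implementation: the names `judgeDefenderMessage`, `GUESS_PATTERN`, and `Outcome` are hypothetical, and the sketch assumes single-word secrets and case-insensitive matching. The tie on reaching the turn limit is left to the caller.

```typescript
// Hypothetical outcome of judging a single Defender message.
type Outcome = "attacker_wins" | "defender_wins" | "continue";

// Matches the guessing format from the rules: "I know the word! It is {guess}".
// Assumption: guesses are single words, optionally wrapped in a quote mark.
const GUESS_PATTERN = /i know the word!\s*it is\s+["'`]?([a-z][a-z-]*)/i;

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function judgeDefenderMessage(message: string, secretWord: string): Outcome {
  const secret = secretWord.trim().toLowerCase();

  // A formatted guess ends the game either way.
  const guess = GUESS_PATTERN.exec(message);
  if (guess) {
    const guessed = (guess[1] ?? "").toLowerCase();
    return guessed === secret ? "defender_wins" : "attacker_wins";
  }

  // Otherwise, saying the secret word as a whole word hands the win to the Attacker.
  const mention = new RegExp(`\\b${escapeRegExp(secret)}\\b`, "i");
  return mention.test(message) ? "attacker_wins" : "continue";
}
```

Whole-word matching is a design choice in this sketch: a message containing "pineapple" does not count as saying "apple". Whether the real game treats substrings or inflected forms as a match is not stated here.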

Model Selection

  • Pre-trained: Standard model performance
  • Post-trained: Fine-tuned or specialized model performance
  • The platform defaults to pre-trained model selection

🛠️ Configuration

Supported APIs

The platform supports multiple LLM providers:

  • OpenAI: GPT-5, GPT-4, and compatible models
  • Anthropic: Claude models via their API
  • Custom Endpoints: Any OpenAI-compatible API
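
As a rough illustration of what "OpenAI-compatible" means in practice, the sketch below builds a Chat Completions request from a model name, API key, and base URL. The names `EndpointConfig`, `ChatMessage`, and `buildChatRequest` are hypothetical and are not taken from `src/services/llm.ts`; the sketch only assumes the standard `POST {baseUrl}/chat/completions` route with a Bearer token.

```typescript
// Hypothetical shapes for illustration; the project's own types may differ.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface EndpointConfig {
  model: string;
  apiKey: string;
  baseUrl: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build (but do not send) a request for an OpenAI-compatible chat endpoint.
function buildChatRequest(config: EndpointConfig, messages: ChatMessage[]): ChatRequest {
  // Tolerate a trailing slash in the configured base URL.
  const base = config.baseUrl.replace(/\/+$/, "");
  return {
    url: `${base}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${config.apiKey}`,
      },
      body: JSON.stringify({ model: config.model, messages }),
    },
  };
}
```

The result can be passed to `fetch(request.url, request.init)`, and the reply text is normally found at `choices[0].message.content` in the JSON response. Anthropic's native API uses a different route and header scheme, so a non-OpenAI-compatible endpoint would need its own request builder.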

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| VITE_ATTACKER_MODEL | Model used for the attacker role | gpt-3.5-turbo |
| VITE_ATTACKER_API_KEY | API key for attacker model | Required |
| VITE_ATTACKER_BASE_URL | Base URL for attacker API | https://api.openai.com/v1 |
| VITE_DEFENDER_PRE_TRAINED_MODEL | Pre-trained defender model | gpt-3.5-turbo |
| VITE_DEFENDER_PRE_TRAINED_API_KEY | API key for pre-trained defender | Required |
| VITE_DEFENDER_PRE_TRAINED_BASE_URL | Base URL for pre-trained defender | https://api.openai.com/v1 |
| VITE_DEFENDER_POST_TRAINED_MODEL | Post-trained defender model | gpt-4 |
| VITE_DEFENDER_POST_TRAINED_API_KEY | API key for post-trained defender | Required |
| VITE_DEFENDER_POST_TRAINED_BASE_URL | Base URL for post-trained defender | https://api.openai.com/v1 |
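
One way to resolve these variables with their defaults is sketched below. The helper names `readEndpoint` and `loadConfig` are hypothetical, and the function takes a plain record so it can be tried outside a Vite build; in the app itself the record would typically be `import.meta.env`. Note that Vite exposes `VITE_`-prefixed variables to client-side code, so keys configured this way end up in the browser bundle.

```typescript
// Plain record of environment values; in a Vite app this would be import.meta.env.
type Env = Record<string, string | undefined>;

interface EndpointConfig {
  model: string;
  apiKey: string;
  baseUrl: string;
}

const DEFAULT_BASE_URL = "https://api.openai.com/v1";

// Read one role's settings, applying the documented defaults.
// The API key has no default, so a missing key is reported as an error.
function readEndpoint(env: Env, prefix: string, defaultModel: string): EndpointConfig {
  const apiKey = env[`${prefix}_API_KEY`];
  if (!apiKey) {
    throw new Error(`Missing required variable ${prefix}_API_KEY`);
  }
  return {
    model: env[`${prefix}_MODEL`] || defaultModel,
    apiKey,
    baseUrl: env[`${prefix}_BASE_URL`] || DEFAULT_BASE_URL,
  };
}

function loadConfig(env: Env) {
  return {
    attacker: readEndpoint(env, "VITE_ATTACKER", "gpt-3.5-turbo"),
    defenderPreTrained: readEndpoint(env, "VITE_DEFENDER_PRE_TRAINED", "gpt-3.5-turbo"),
    defenderPostTrained: readEndpoint(env, "VITE_DEFENDER_POST_TRAINED", "gpt-4"),
  };
}
```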

🏗️ Architecture

Tech Stack

  • Frontend: React 18 + TypeScript
  • Build Tool: Vite
  • UI Library: Shadcn/ui + Radix UI
  • Styling: Tailwind CSS
  • Icons: Lucide React
  • State Management: React Hooks

Project Structure

src/
├── components/
│   ├── GameArena.tsx      # Main game component
│   ├── GameMessage.tsx    # Individual message display
│   ├── GameStats.tsx      # Game statistics
│   ├── ModelSelector.tsx  # Model selection interface
│   └── ui/                # Reusable UI components
├── services/
│   └── llm.ts            # LLM API integration
├── hooks/
│   └── use-toast.ts      # Toast notifications
└── lib/
    └── utils.ts          # Utility functions

📊 Game Flow

  1. Initialization: Secret word selected, models configured
  2. Turn Loop: Alternating attacker/defender turns up to 5 rounds
  3. Word Detection: Real-time monitoring for secret word usage
  4. Win Evaluation: Automatic determination of game outcome
  5. Result Display: Animated feedback with detailed reasoning
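
The flow above can be sketched as a loop with injected attacker, defender, and judge functions. This is an illustrative outline only: `playGame`, `Speaker`, `Turn`, and `Verdict` are hypothetical names, and the sketch assumes that one round consists of one Attacker hint followed by one Defender reply, with only the Defender's reply being judged.

```typescript
// Hypothetical types for illustration; the project's own types may differ.
type Verdict = "attacker_wins" | "defender_wins" | "continue";
type GameResult = "attacker_wins" | "defender_wins" | "tie";

interface Turn {
  role: "attacker" | "defender";
  text: string;
}

// A speaker sees the conversation so far and produces the next message
// (in the real app this would wrap an LLM API call).
type Speaker = (history: Turn[]) => Promise<string>;

async function playGame(
  attacker: Speaker,
  defender: Speaker,
  judge: (defenderReply: string) => Verdict,
  maxRounds = 5
): Promise<{ result: GameResult; history: Turn[] }> {
  const history: Turn[] = [];

  for (let round = 0; round < maxRounds; round++) {
    // Attacker gives a hint, then the Defender responds.
    history.push({ role: "attacker", text: await attacker(history) });
    const reply = await defender(history);
    history.push({ role: "defender", text: reply });

    // Stop as soon as the Defender's reply decides the game.
    const verdict = judge(reply);
    if (verdict !== "continue") {
      return { result: verdict, history };
    }
  }

  // No winner within the round limit.
  return { result: "tie", history };
}
```

Passing the speakers and the judge as parameters keeps the loop independent of any particular provider, so the same loop can run against the pre-trained and post-trained defenders.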

Ready to watch AI models battle? Start your first game now! 🎯🤖
