A competitive platform for pitting Large Language Models against each other in a strategic word-guessing game. Watch AI models battle in real-time as attackers try to trick defenders into revealing secret words.
Adversarial Taboo is an interactive demonstration platform that showcases the strategic capabilities of different LLMs. In this game:
- Attacker: Knows a secret word and must craft clever hints to trick the defender into saying it
- Defender: Must infer the secret word from the hints and guess it correctly, without saying it unintentionally
- Objective: Compare how different models perform in adversarial scenarios
Key features:
- Real-time LLM Battles: Watch AI models compete in live conversations
- Model Comparison: Compare pre-trained vs post-trained model performance
- Multiple API Support: Compatible with OpenAI, Anthropic, and custom API endpoints
- Interactive Gameplay: Up to 5 turns with automatic win/tie detection
- Visual Feedback: Animated messages and game state indicators
- Comprehensive Rules: Clear game mechanics with instant feedback
This project intentionally showcases a side-by-side comparison between a model in its original (pre-trained) state and a model after additional adaptation. Specifically, the demo is built to demonstrate the effect of applying the SPAG project (see: https://github.com/Linear95/SPAG) for post-training/adaptation. The "pre-trained" vs "post-trained" labels in the UI are intended to help you compare behavior before and after such adaptation.
Important: this is only a demonstration — in practice you may supply any model or endpoint you prefer. The app accepts OpenAI-compatible endpoints, Anthropic endpoints, or custom proxies, so feel free to plug in your own models, keys, or fine-tuning pipelines.
Prerequisites:
- Node.js 18+
- npm or yarn
- API keys for your preferred LLM providers
Installation:

- Clone the repository

  ```bash
  git clone https://github.com/Haydenkkk/adversarial-taboo.git
  cd adversarial-taboo
  ```

- Install dependencies

  ```bash
  npm install
  ```

- Configure environment variables

  Create a `.env` file in the root directory:

  ```bash
  # Attacker Model (fixed)
  VITE_ATTACKER_MODEL=gpt-5
  VITE_ATTACKER_API_KEY=your_attacker_api_key
  VITE_ATTACKER_BASE_URL=https://api.openai.com/v1

  # Defender Models
  VITE_DEFENDER_PRE_TRAINED_MODEL=gpt-3.5-turbo
  VITE_DEFENDER_PRE_TRAINED_API_KEY=your_pre_trained_api_key
  VITE_DEFENDER_PRE_TRAINED_BASE_URL=https://api.openai.com/v1

  VITE_DEFENDER_POST_TRAINED_MODEL=gpt-4
  VITE_DEFENDER_POST_TRAINED_API_KEY=your_post_trained_api_key
  VITE_DEFENDER_POST_TRAINED_BASE_URL=https://api.openai.com/v1
  ```

- Start the development server

  ```bash
  npm run dev
  ```

- Open your browser

  Navigate to http://localhost:7404 to start playing!
Game rules:
- Setup: A secret word is randomly selected and assigned to the Attacker
- Attacker's Turn: Provides hints about the secret word without saying it directly
- Defender's Turn: Responds to hints and may attempt to guess the word
- Guessing Format: To guess, the Defender must say: "I know the word! It is {guess}"
- Win Conditions:
  - Attacker Wins: If the Defender says the secret word unintentionally, OR makes a guess in the required format that is wrong
  - Defender Wins: If the Defender correctly guesses the word in the required format
  - Tie: If the maximum number of turns (5) is reached without a winner
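The win conditions above can be expressed as a small judging function that runs on each Defender reply. This is a minimal sketch under stated assumptions, not the project's implementation: the name `evaluateDefenderReply`, the case-insensitive matching of the guess phrase, and the whole-word check for accidental use are all illustrative choices.

```typescript
// Hypothetical judge for a single Defender reply. The function name, the
// return values, and the matching rules are assumptions for illustration.
type Verdict = "defender" | "attacker" | "continue";

// Escape regex metacharacters so the secret word is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function evaluateDefenderReply(reply: string, secret: string): Verdict {
  const target = secret.trim().toLowerCase();

  // 1. Check for a formatted guess first (phrase matched case-insensitively).
  //    A correct guess contains the secret word, so it must not be
  //    mistaken for an accidental use.
  const guess = reply.match(/I know the word! It is\s+["']?([^"'.!?\n]+)/i);
  if (guess) {
    return guess[1].trim().toLowerCase() === target ? "defender" : "attacker";
  }

  // 2. Any other whole-word occurrence of the secret word (case-insensitive)
  //    counts as saying it unintentionally.
  const accidental = new RegExp(`\\b${escapeRegExp(target)}\\b`, "i");
  if (accidental.test(reply)) {
    return "attacker";
  }

  // 3. No winner yet: the game moves on to the next turn.
  return "continue";
}
```

The order of the checks is the important design choice: scanning for the secret word before parsing the guess would hand the Attacker a win whenever the Defender guessed correctly.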
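Defender model modes:
- Pre-trained: the Defender uses the base model configured by the `VITE_DEFENDER_PRE_TRAINED_*` variables
- Post-trained: the Defender uses the fine-tuned or otherwise adapted model (for example, one post-trained with SPAG) configured by the `VITE_DEFENDER_POST_TRAINED_*` variables
- The platform selects the pre-trained Defender by default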
The platform supports multiple LLM providers:
- OpenAI: GPT-5, GPT-4, and compatible models
- Anthropic: Claude models via their API
- Custom Endpoints: Any OpenAI-compatible API
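Across the providers above, the main difference is the shape of the HTTP request. The sketch below builds (but does not send) a request for either the OpenAI Chat Completions endpoint (`POST {baseUrl}/chat/completions`, bearer token) or the Anthropic Messages endpoint (`POST {baseUrl}/messages`, `x-api-key` and `anthropic-version` headers). It assumes the base URL already includes the API version, as the `https://api.openai.com/v1` default does. `buildChatRequest`, `ModelConfig`, the `provider` field, and the `max_tokens` value are hypothetical and not taken from `src/services/llm.ts`.

```typescript
// Hypothetical request builder; names and the max_tokens value are assumptions.
type Provider = "openai" | "anthropic";

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface ModelConfig {
  provider: Provider;
  model: string;
  apiKey: string;
  baseUrl: string; // assumed to include the version, e.g. "https://api.openai.com/v1"
}

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildChatRequest(cfg: ModelConfig, system: string, messages: ChatMessage[]): ChatRequest {
  const base = cfg.baseUrl.replace(/\/+$/, ""); // drop trailing slashes

  if (cfg.provider === "anthropic") {
    // Anthropic Messages API: key and version headers, a top-level system
    // prompt, and a required max_tokens limit (512 is an arbitrary choice).
    return {
      url: `${base}/messages`,
      init: {
        method: "POST",
        headers: {
          "content-type": "application/json",
          "x-api-key": cfg.apiKey,
          "anthropic-version": "2023-06-01",
        },
        body: JSON.stringify({ model: cfg.model, system, messages, max_tokens: 512 }),
      },
    };
  }

  // OpenAI-compatible Chat Completions: bearer token, system prompt as the first message.
  return {
    url: `${base}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        authorization: `Bearer ${cfg.apiKey}`,
      },
      body: JSON.stringify({
        model: cfg.model,
        messages: [{ role: "system", content: system }, ...messages],
      }),
    },
  };
}
```

The result can be passed straight to `fetch(url, init)`. OpenAI-compatible responses carry the reply text in `choices[0].message.content`, while Anthropic responses carry it in `content[0].text`.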
Environment variables:

| Variable | Description | Default |
|---|---|---|
| `VITE_ATTACKER_MODEL` | Model used for the attacker role | `gpt-3.5-turbo` |
| `VITE_ATTACKER_API_KEY` | API key for attacker model | Required |
| `VITE_ATTACKER_BASE_URL` | Base URL for attacker API | `https://api.openai.com/v1` |
| `VITE_DEFENDER_PRE_TRAINED_MODEL` | Pre-trained defender model | `gpt-3.5-turbo` |
| `VITE_DEFENDER_PRE_TRAINED_API_KEY` | API key for pre-trained defender | Required |
| `VITE_DEFENDER_PRE_TRAINED_BASE_URL` | Base URL for pre-trained defender | `https://api.openai.com/v1` |
| `VITE_DEFENDER_POST_TRAINED_MODEL` | Post-trained defender model | `gpt-4` |
| `VITE_DEFENDER_POST_TRAINED_API_KEY` | API key for post-trained defender | Required |
| `VITE_DEFENDER_POST_TRAINED_BASE_URL` | Base URL for post-trained defender | `https://api.openai.com/v1` |
Tech stack:
- Frontend: React 18 + TypeScript
- Build Tool: Vite
- UI Library: shadcn/ui + Radix UI
- Styling: Tailwind CSS
- Icons: Lucide React
- State Management: React Hooks
Project structure:

```
src/
├── components/
│   ├── GameArena.tsx      # Main game component
│   ├── GameMessage.tsx    # Individual message display
│   ├── GameStats.tsx      # Game statistics
│   ├── ModelSelector.tsx  # Model selection interface
│   └── ui/                # Reusable UI components
├── services/
│   └── llm.ts             # LLM API integration
├── hooks/
│   └── use-toast.ts       # Toast notifications
└── lib/
    └── utils.ts           # Utility functions
```
Game flow:
- Initialization: Secret word selected, models configured
- Turn Loop: Alternating attacker/defender turns up to 5 rounds
- Word Detection: Real-time monitoring for secret word usage
- Win Evaluation: Automatic determination of game outcome
- Result Display: Animated feedback with detailed reasoning
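The flow above can be sketched as an async turn loop. In this hypothetical `runGame`, the Attacker, the Defender, and the judge are injected as functions (their names and signatures are assumptions), so real model calls or test stubs can be plugged in. The judge is expected to apply the win conditions from the game rules and to return `"continue"` while nobody has won.

```typescript
// Hypothetical turn loop; all names and signatures are illustrative.
type Speaker = "attacker" | "defender";
type Verdict = "defender" | "attacker" | "continue";

interface Message {
  speaker: Speaker;
  text: string;
}

interface GameResult {
  outcome: "attacker" | "defender" | "tie";
  turns: number;
  transcript: Message[];
}

async function runGame(
  secret: string,
  attacker: (secret: string, transcript: Message[]) => Promise<string>,
  defender: (transcript: Message[]) => Promise<string>,
  judge: (reply: string, secret: string) => Verdict,
  maxTurns = 5,
): Promise<GameResult> {
  const transcript: Message[] = [];
  for (let turn = 1; turn <= maxTurns; turn++) {
    // Only the Attacker is given the secret word.
    const hint = await attacker(secret, transcript);
    transcript.push({ speaker: "attacker", text: hint });

    // The Defender sees only the conversation so far.
    const reply = await defender(transcript);
    transcript.push({ speaker: "defender", text: reply });

    // Stop as soon as the judge declares a winner.
    const verdict = judge(reply, secret);
    if (verdict !== "continue") {
      return { outcome: verdict, turns: turn, transcript };
    }
  }
  // Nobody won within maxTurns rounds.
  return { outcome: "tie", turns: maxTurns, transcript };
}
```

Injecting the judge keeps the loop independent of how word detection is implemented, and the returned transcript is the kind of data a message list could render turn by turn.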
Ready to watch AI models battle? Start your first game now! 🎯🤖