⚔️ Computer Agent Arena

Toward Human-Centric Evaluation and Analysis of Computer-Use Agents

🌐 Website | 📑 Paper (ICLR 2026) | 🏆 Leaderboard | 📝 Blog | 🤝 Contributing

Introduction

Computer Agent Arena is an open, crowdsourced evaluation platform for benchmarking computer-use agents (CUAs) on real-world tasks. Users interact with two AI agents side by side on live desktop environments (Ubuntu / Windows) and vote for the better one. These votes produce human preference data at scale, which powers a continuously updated Elo leaderboard.
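
How pairwise votes can turn into a ranking is illustrated below with a standard Elo update. This is a minimal sketch, not the platform's scoring code: the K-factor of 32, the starting rating of 1000, the agent names, and the function names are all assumptions, and the actual leaderboard may use a different estimator (for example, a Bradley-Terry fit over all votes).

```python
from collections import defaultdict

K = 32            # assumed update step size
INITIAL = 1000.0  # assumed starting rating for every agent


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_ratings(votes, k=K, initial=INITIAL):
    """Apply votes in order and return the final ratings.

    Each vote is (agent_a, agent_b, outcome), where outcome is
    1.0 if A was preferred, 0.0 if B was preferred, and 0.5 for a tie.
    """
    ratings = defaultdict(lambda: initial)
    for a, b, outcome in votes:
        e_a = expected_score(ratings[a], ratings[b])
        delta = k * (outcome - e_a)
        ratings[a] += delta  # the winner gains what the loser gives up
        ratings[b] -= delta
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical votes between placeholder agents.
    votes = [
        ("agent_x", "agent_y", 1.0),
        ("agent_y", "agent_z", 0.5),
        ("agent_x", "agent_z", 1.0),
    ]
    ranked = sorted(update_ratings(votes).items(), key=lambda kv: -kv[1])
    for name, rating in ranked:
        print(f"{name}: {rating:.1f}")
```

Sequential Elo depends on the order in which votes arrive, which is one reason arena-style leaderboards often prefer a batch fit over all votes.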

This repository releases the full platform stack: backend server, frontend UI, agent hub, and deployment infrastructure.

Platform Overview

Frontend (React 18 + TypeScript): Dual-agent chat panel, live VNC desktop viewer, leaderboard
Backend (Flask + Socket.IO): User sessions, VM pool orchestration, agent execution, and evaluation
Agent Hub: Pluggable implementations for 10+ frontier models
Infrastructure: AWS EC2 multi-region VM pool (Ubuntu / Windows) with adaptive auto-scaling

Layer	Technology
Frontend	React 18, TypeScript, Ant Design, Tailwind CSS, Socket.IO
Backend	Python, Flask, Flask-SocketIO
Database	PostgreSQL, Redis
Infrastructure	AWS EC2 (multi-region), S3
Auth	Google OAuth 2.0, JWT, Anonymous access, Prolific

Supported Agents

Model	Organization
GPT-4.1, GPT-5	OpenAI
Computer-Use-Preview	OpenAI
Claude 3.7 / 4 Sonnet, Claude Sonnet 4.5	Anthropic
Gemini 2.5 Pro	Google
Qwen2.5-VL-72B	Alibaba
UI-TARS-1.5	ByteDance
OpenCUA	XLang Lab
CoAct	—

🚀 Quick Start

Prerequisites

Python 3.8+, Node.js 16+
PostgreSQL, Redis
AWS account (for VM pool) or local VMware / VirtualBox

1. Clone

git clone https://github.com/xlang-ai/computer-agent-arena.git
cd computer-agent-arena

2. Backend

pip install -r backend/requirements.txt

Create a .env file (the repository includes a .env.example) with your database, Redis, AWS, API key, and auth credentials. Then set the deployment mode in config.yaml:

deployment: local   # or 'aws'
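
Before starting the server, it can help to confirm that .env defines every key you expect. The following is a minimal sketch using only the standard library. The key names in REQUIRED_KEYS are hypothetical placeholders, not names taken from this repository; consult .env.example for the variables the backend actually reads.

```python
from pathlib import Path

# Hypothetical key names for illustration only; replace them with the
# variables listed in the repository's .env.example.
REQUIRED_KEYS = ("DATABASE_URL", "REDIS_URL", "AWS_ACCESS_KEY_ID")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or empty."""
    env = parse_env(text)
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_keys(text)
    print("missing keys:", ", ".join(missing) if missing else "none")
```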

Start the server:

python -m backend.main   # listens on :8181
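
To confirm that the backend came up, you can poll the port it listens on. This is a small sketch using only the standard library. The helper name wait_for_port and its timeout values are assumptions, and it checks only that something accepts TCP connections on :8181, not that the application is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds,
    or False if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    ok = wait_for_port("127.0.0.1", 8181, timeout=5.0)
    print("backend reachable" if ok else "backend not reachable on :8181")
```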

3. Frontend

cd frontend
npm install
npm start   # dev server on :3000

Adding a New Agent

Create backend/agents/hub/YourAgent/ and implement a class extending BaseAgent.
Register the model and method name in config.yaml under AVAILABLE_AGENT_OPTIONS.
Test with python backend/agents/test/test_agents.py.

See existing implementations in backend/agents/hub/ (Anthropic, OpenAICUA, UI_TARS, OpenCUA, coact) for reference.
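
The general shape of a new agent might look like the sketch below. It does not reproduce the repository's BaseAgent: the class here is a stand-in defined only so the example runs on its own, and the method names (reset, predict), the observation dictionary, and the action format are all assumptions. Check the base classes in backend/agents/ for the real interface before implementing.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseAgent(ABC):
    """Stand-in for the repository's BaseAgent (hypothetical interface)."""

    @abstractmethod
    def reset(self) -> None:
        """Clear per-task state before a new session."""

    @abstractmethod
    def predict(self, instruction: str,
                observation: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Return the next actions for the current observation."""


class YourAgent(BaseAgent):
    """Placeholder agent that clicks a fixed point until a step limit."""

    def __init__(self, max_steps: int = 15) -> None:
        self.max_steps = max_steps
        self.steps_taken = 0

    def reset(self) -> None:
        self.steps_taken = 0

    def predict(self, instruction, observation):
        if self.steps_taken >= self.max_steps:
            return [{"type": "done"}]
        self.steps_taken += 1
        # A real agent would send the instruction and screenshot to a
        # model here and translate its reply into desktop actions.
        return [{"type": "click", "x": 100, "y": 200}]


if __name__ == "__main__":
    agent = YourAgent(max_steps=2)
    for _ in range(3):
        print(agent.predict("Open the file manager", {"screenshot": b""}))
```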

Repository Structure

computer-agent-arena/
├── backend/
│   ├── main.py              # Entry point
│   ├── agents/              # Agent hub + base classes
│   │   └── hub/             # Per-model implementations
│   ├── api/                 # WebSocket / REST handlers
│   ├── desktop_env/         # VM abstraction (AWS, VMware, ...)
│   └── utils/               # DB, S3, socket utilities
├── frontend/                # React 18 + TypeScript UI
│   └── src/
└── config.yaml              # Deployment configuration

License

MIT License — see LICENSE for details.

