xlang-ai/computer-agent-arena

⚔️ Computer Agent Arena

Toward Human-Centric Evaluation and Analysis of Computer-Use Agents

🌐 Website  |  📑 Paper (ICLR 2026)  |  🏆 Leaderboard  |  📝 Blog  |  🤝 Contributing



Introduction

Computer Agent Arena is an open, crowdsourced evaluation platform for benchmarking computer-use agents (CUAs) on real-world tasks. Users interact with two AI agents side by side on live desktop environments (Ubuntu / Windows) and vote for the better one, producing human preference data at scale that powers a continuously updated Elo leaderboard.

This repository releases the full platform stack: backend server, frontend UI, agent hub, and deployment infrastructure.
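
To make the vote-to-leaderboard step concrete, here is a minimal sketch of a standard Elo update applied to pairwise votes. It is an illustration only: the K-factor, the initial rating of 1000, and the function names are assumptions, and the platform's actual rating computation may differ.

```python
from typing import Dict, Iterable, List, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(r_a: float, r_b: float, outcome: float, k: float = 32.0) -> Tuple[float, float]:
    """Return new (r_a, r_b). outcome is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    delta = k * (outcome - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def leaderboard(
    votes: Iterable[Tuple[str, str, float]],
    initial: float = 1000.0,  # assumed starting rating
    k: float = 32.0,          # assumed K-factor
) -> List[Tuple[str, float]]:
    """Replay (agent_a, agent_b, outcome) votes in order and rank agents by rating."""
    ratings: Dict[str, float] = {}
    for a, b, outcome in votes:
        r_a = ratings.setdefault(a, initial)
        r_b = ratings.setdefault(b, initial)
        ratings[a], ratings[b] = update_elo(r_a, r_b, outcome, k)
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
```

With two agents starting at the same rating, a single win moves the winner up by k/2 and the loser down by the same amount.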


Platform Overview

  • Frontend (React 18 + TypeScript): Dual-agent chat panel, live VNC desktop viewer, leaderboard
  • Backend (Flask + Socket.IO): User sessions, VM pool orchestration, agent execution, and evaluation
  • Agent Hub: Pluggable implementations for 10+ frontier models
  • Infrastructure: AWS EC2 multi-region VM pool (Ubuntu / Windows) with adaptive auto-scaling

Layer            Technology
Frontend         React 18, TypeScript, Ant Design, Tailwind CSS, Socket.IO
Backend          Python, Flask, Flask-SocketIO
Database         PostgreSQL, Redis
Infrastructure   AWS EC2 (multi-region), S3
Auth             Google OAuth 2.0, JWT, Anonymous access, Prolific

Supported Agents

Model                                                Organization
GPT-4.1, GPT-5                                       OpenAI
Computer-Use-Preview                                 OpenAI
Claude 3.7 / 4 Sonnet, Claude Sonnet 4.5             Anthropic
Gemini 2.5 Pro                                       Google
Qwen2.5-VL-72B                                       Alibaba
UI-TARS-1.5                                          ByteDance
OpenCUA                                              XLang Lab
CoAct

🚀 Quick Start

Prerequisites

  • Python 3.8+, Node.js 16+
  • PostgreSQL, Redis
  • AWS account (for VM pool) or local VMware / VirtualBox

1. Clone

git clone https://github.com/xlang-ai/computer-agent-arena.git
cd computer-agent-arena

2. Backend

pip install -r backend/requirements.txt

Create a .env file with your database, Redis, AWS, API keys, and auth credentials. Configure config.yaml:

deployment: local   # or 'aws'
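
Before starting the server, it can help to confirm that the credentials from .env are actually present in the process environment. The sketch below is one way to do that with the standard library only. The variable names in REQUIRED_VARS are hypothetical placeholders (the repository's real key names are not listed here), and the check assumes the .env values have already been loaded into the environment.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Hypothetical names for illustration; replace with the keys your .env defines.
REQUIRED_VARS = (
    "DATABASE_URL",
    "REDIS_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_vars(
    required: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the required names that are unset or blank in env (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars(REQUIRED_VARS)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```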

Start the server:

python -m backend.main   # listens on :8181

3. Frontend

cd frontend
npm install
npm start   # dev server on :3000
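
Once both processes are running, a quick TCP check can confirm that the backend (:8181) and the frontend dev server (:3000) are accepting connections. This is a small sketch using only the standard library; the helper names are made up for illustration, and it only tests that the ports are open, not that the services are healthy.

```python
import socket
from typing import Dict


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services: Dict[str, int], host: str = "127.0.0.1") -> Dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    # Ports taken from the Quick Start steps above.
    for name, up in check_services({"backend": 8181, "frontend": 3000}).items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```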

Adding a New Agent

  1. Create backend/agents/hub/YourAgent/ and implement a class extending BaseAgent.
  2. Register the model and method name in config.yaml under AVAILABLE_AGENT_OPTIONS.
  3. Test with python backend/agents/test/test_agents.py.

See existing implementations in backend/agents/hub/ (Anthropic, OpenAICUA, UI_TARS, OpenCUA, coact) for reference.
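
As a rough picture of step 1, the sketch below shows the general shape of an agent class. The BaseAgent defined here is a hypothetical stand-in so that the example runs on its own: the real base class in backend/agents/ may use different method names, arguments, and action formats, so check it and the existing hub implementations before copying this.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseAgent(ABC):
    """Hypothetical stand-in for the repository's BaseAgent; the real interface may differ."""

    @abstractmethod
    def reset(self) -> None:
        """Clear any per-task state."""

    @abstractmethod
    def predict(self, instruction: str, observation: Dict[str, Any]) -> List[str]:
        """Return the next action(s) for the given task instruction and observation."""


class YourAgent(BaseAgent):
    """Toy agent that waits until a step budget is used up, then signals completion."""

    def __init__(self, model: str = "your-model", max_steps: int = 15) -> None:
        self.model = model          # placeholder model identifier
        self.max_steps = max_steps  # assumed step budget
        self.history: List[str] = []

    def reset(self) -> None:
        self.history = []

    def predict(self, instruction: str, observation: Dict[str, Any]) -> List[str]:
        # A real agent would send the instruction and screenshot to a model here.
        self.history.append(instruction)
        if len(self.history) >= self.max_steps:
            return ["DONE"]  # "DONE" / "WAIT" are illustrative action strings
        return ["WAIT"]
```

A real implementation would replace the body of predict with a model call and translate the response into whatever action format the platform's desktop environment expects.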


Repository Structure

computer-agent-arena/
├── backend/
│   ├── main.py              # Entry point
│   ├── agents/              # Agent hub + base classes
│   │   └── hub/             # Per-model implementations
│   ├── api/                 # WebSocket / REST handlers
│   ├── desktop_env/         # VM abstraction (AWS, VMware, ...)
│   └── utils/               # DB, S3, socket utilities
├── frontend/                # React 18 + TypeScript UI
│   └── src/
└── config.yaml              # Deployment configuration

License

MIT License — see LICENSE for details.
