🌐 Website | 📑 Paper (ICLR 2026) | 🏆 Leaderboard | 📝 Blog | 🤝 Contributing
Computer Agent Arena is an open, crowdsourced evaluation platform for benchmarking computer-use agents (CUAs) on real-world tasks. Users interact with two AI agents side by side on live desktop environments (Ubuntu / Windows) and vote for the better one, producing human preference data at scale that powers a continuously updated Elo leaderboard.
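
How pairwise votes become ratings can be illustrated with the standard Elo update. This is a minimal sketch, not the platform's leaderboard code: the starting rating of 1000, the K-factor of 32, and the names `expected_score` and `update_elo` are assumptions chosen for illustration, and the actual leaderboard may use a different procedure.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> Tuple[float, float]:
    """Return the new (rating_a, rating_b) after one vote.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k is an assumed K-factor, not a value taken from the repository.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical agents and votes, for illustration only.
ratings = {"agent_a": 1000.0, "agent_b": 1000.0}
votes = [("agent_a", "agent_b", 1.0), ("agent_a", "agent_b", 0.5)]
for a, b, score in votes:
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], score)
```

With equal starting ratings, a single win moves the winner up by K/2 = 16 points and the loser down by the same amount, so the total rating is conserved.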
This repository releases the full platform stack: backend server, frontend UI, agent hub, and deployment infrastructure.
- Frontend (React 18 + TypeScript): Dual-agent chat panel, live VNC desktop viewer, leaderboard
- Backend (Flask + Socket.IO): User sessions, VM pool orchestration, agent execution, and evaluation
- Agent Hub: Pluggable implementations for 10+ frontier models
- Infrastructure: AWS EC2 multi-region VM pool (Ubuntu / Windows) with adaptive auto-scaling
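
As one way to picture what adaptive auto-scaling could mean for the VM pool, the sketch below sizes the pool from the number of VMs currently in use. The policy, the function name `target_pool_size`, and every parameter and default are hypothetical; the repository's scaling logic is not described here and may work differently.

```python
import math


def target_pool_size(
    busy_vms: int,
    min_idle: int = 2,
    headroom: float = 0.25,
    max_size: int = 50,
) -> int:
    """Hypothetical policy: busy VMs plus spare capacity, capped at max_size.

    Spare capacity is the larger of a fixed floor (min_idle) and a fraction
    of current demand (headroom), so the pool grows as usage grows.
    """
    spare = max(min_idle, math.ceil(busy_vms * headroom))
    return min(max_size, busy_vms + spare)
```

Under these assumed defaults, an idle platform keeps 2 warm VMs, 20 busy VMs yield a target of 25, and demand beyond the cap is clamped to 50.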

Tech stack:

| Layer | Technology |
|---|---|
| Frontend | React 18, TypeScript, Ant Design, Tailwind CSS, Socket.IO |
| Backend | Python, Flask, Flask-SocketIO |
| Database | PostgreSQL, Redis |
| Infrastructure | AWS EC2 (multi-region), S3 |
| Auth | Google OAuth 2.0, JWT, Anonymous access, Prolific |

Supported agent models:

| Model | Organization |
|---|---|
| GPT-4.1, GPT-5 | OpenAI |
| Computer-Use-Preview | OpenAI |
| Claude 3.7 / 4 Sonnet, Claude Sonnet 4.5 | Anthropic |
| Gemini 2.5 Pro | Google |
| Qwen2.5-VL-72B | Alibaba |
| UI-TARS-1.5 | ByteDance |
| OpenCUA | XLang Lab |
| CoAct | — |

Prerequisites:

- Python 3.8+, Node.js 16+
- PostgreSQL, Redis
- AWS account (for VM pool) or local VMware / VirtualBox
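
The prerequisites above can be checked with a short script before installing. This is a sketch under assumptions: the executable names in `REQUIRED_TOOLS` vary by operating system and install method, PostgreSQL and Redis may run on another host (in which case the local binaries are not needed), and the Node.js version is not checked. The helper names are illustrative and not part of the repository.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)
# Assumed executable names; adjust for your OS and install method.
REQUIRED_TOOLS = ["node", "npm", "psql", "redis-server"]


def python_ok(version=sys.version_info, minimum=MIN_PYTHON):
    """True if the (major, minor) version meets the minimum."""
    return tuple(version[:2]) >= minimum


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.8 or newer is required")
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"Not found on PATH: {tool}")
```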

Clone the repository and install the backend dependencies:

    git clone https://github.com/xlang-ai/computer-agent-arena.git
    cd computer-agent-arena
    pip install -r backend/requirements.txt

Create a .env file with your database, Redis, AWS, API keys, and auth credentials. Configure config.yaml:

    deployment: local # or 'aws'

Start the server:

    python -m backend.main # listens on :8181

Start the frontend:

    cd frontend
    npm install
    npm start # dev server on :3000

To add a new agent:

- Create backend/agents/hub/YourAgent/ and implement a class extending BaseAgent.
- Register the model and method name in config.yaml under AVAILABLE_AGENT_OPTIONS.
- Test with python backend/agents/test/test_agents.py.

See existing implementations in backend/agents/hub/ (Anthropic, OpenAICUA, UI_TARS, OpenCUA, coact) for reference.
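
The shape of a new agent class can be sketched as follows. The real BaseAgent interface is not shown in this README, so the example defines its own stand-in: the `predict` method, its arguments, and the action strings are all hypothetical. Follow the existing implementations in backend/agents/hub/ for the actual method names and return types.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseAgent(ABC):
    """Stand-in for the repository's BaseAgent; the real interface may differ."""

    @abstractmethod
    def predict(self, instruction: str, observation: Dict[str, Any]) -> List[str]:
        """Return the next actions for the current observation."""


class YourAgent(BaseAgent):
    """Toy agent that emits one fixed action, then signals completion."""

    def __init__(self, model: str = "your-model") -> None:
        self.model = model  # hypothetical model identifier
        self.step = 0

    def predict(self, instruction: str, observation: Dict[str, Any]) -> List[str]:
        self.step += 1
        if self.step == 1:
            # A real agent would call its model with the screenshot here.
            return ["CLICK 100 200"]  # hypothetical action format
        return ["DONE"]


agent = YourAgent()
first = agent.predict("Open the file manager", {"screenshot": b""})
second = agent.predict("Open the file manager", {"screenshot": b""})
```

Keeping model calls inside a single method like this makes it straightforward to swap in a different backend without touching the registration in config.yaml.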

Project structure:

    computer-agent-arena/
    ├── backend/
    │   ├── main.py          # Entry point
    │   ├── agents/          # Agent hub + base classes
    │   │   └── hub/         # Per-model implementations
    │   ├── api/             # WebSocket / REST handlers
    │   ├── desktop_env/     # VM abstraction (AWS, VMware, ...)
    │   └── utils/           # DB, S3, socket utilities
    ├── frontend/            # React 18 + TypeScript UI
    │   └── src/
    └── config.yaml          # Deployment configuration

MIT License — see LICENSE for details.