`clawgui-agent.mp4` — ClawGUI-Agent controls a real phone via natural language
`clawgui-rl.mp4` — ClawGUI-RL trains a GUI agent with online reinforcement learning
- 📄 [2026/4/14] Our paper is available on arXiv: *ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents*.
- 🔥 [2026/4/13] ClawGUI is released — train with ClawGUI-RL (GiGPO), evaluate with ClawGUI-Eval, deploy with ClawGUI-Agent. ClawGUI-2B, a 2B agent trained end-to-end with this pipeline, hits 17.1 MobileWorld SR vs. the 11.1 baseline. See Quick Start.
ClawGUI is a research framework for GUI agents, covering the complete lifecycle from online RL training and standardized evaluation to real-device deployment.
Building a capable GUI agent involves three tightly coupled problems that are rarely solved together: you need an environment to train the agent online, rigorous benchmarks to measure what it has learned, and a production system to deploy it on real devices. ClawGUI addresses all three.
| Module | Role |
|---|---|
| 🚀 ClawGUI-RL | Build — Train GUI agents online with scalable RL: parallel Docker environments, real Android devices, and GiGPO+PRM for fine-grained step-level rewards |
| 📊 ClawGUI-Eval | Evaluate — Measure what the agent has learned: 6 benchmarks, 11+ models, 95.8% faithful reproduction of official results |
| 🤖 ClawGUI-Agent | Deploy — Use GUI agents in the real world: control mobile devices via natural language through 12+ chat platforms, with one-command evaluation built in |
| 🏆 ClawGUI-2B | End-to-end validation: trained entirely with ClawGUI-RL and GiGPO, achieving 17.1 MobileWorld SR vs. the 11.1 baseline |
```bash
git clone https://github.com/ZJU-REAL/ClawGUI.git
cd ClawGUI
```

Each module is independent with its own environment. Click into each one for full installation and usage instructions.
ClawGUI-RL trains GUI agents with online reinforcement learning. It runs dozens of Docker-based Android emulators in parallel or trains directly on physical devices — and replaces standard GRPO with GiGPO+PRM for fine-grained step-level rewards that drive stronger policy learning.
- Parallel multi-environment — Dozens of Docker-based virtual Android environments simultaneously
- Real-device training — Physical or cloud Android phones with the same API
- GiGPO + PRM — Fine-grained step-level reward for better policy optimization than standard GRPO
- Spare server rotation — Automatic failover keeps training running without interruption
- Episode visualization — Record and replay any training trajectory
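The GiGPO + PRM idea above can be sketched in a few lines. This is a simplified illustration, not the ClawGUI-RL API: real GiGPO groups steps by shared anchor states across rollouts, while this sketch just blends a GRPO-style group-normalized episode advantage with a per-step advantage derived from PRM scores. All function names here are hypothetical.

```python
# Illustrative sketch of GiGPO-style advantage estimation (hypothetical names,
# not the ClawGUI-RL API). Episode returns are normalized across the rollout
# group as in GRPO; PRM step scores add a fine-grained step-level signal.
from statistics import mean, pstdev


def group_norm(xs, eps=1e-6):
    """Normalize a group of scalars to zero mean / unit std (GRPO-style)."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / (sigma + eps) for x in xs]


def gigpo_advantages(episode_returns, prm_step_rewards, w=0.5):
    """episode_returns: one scalar return per rollout in the group.
    prm_step_rewards: per-rollout lists of PRM scores, one per step.
    Returns per-rollout, per-step advantages blending both signals."""
    ep_adv = group_norm(episode_returns)  # shared episode-level advantage
    out = []
    for i, steps in enumerate(prm_step_rewards):
        step_adv = group_norm(steps) if len(steps) > 1 else [0.0] * len(steps)
        # each step: episode advantage plus a weighted step-level PRM advantage
        out.append([ep_adv[i] + w * a for a in step_adv])
    return out
```

The weighting `w` trades off trajectory-level outcome reward against step-level process reward; in the degenerate case where all PRM scores tie, the sketch reduces to plain GRPO.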
📁 `clawgui-eval/` · 📖 Full Documentation · 🤗 Dataset · 🤖 ModelScope
ClawGUI-Eval gives GUI grounding research a reliable measurement baseline. Its three-stage Infer → Judge → Metric pipeline covers 6 benchmarks and 11+ models, with a 95.8% reproduction rate against official results — so numbers across papers are actually comparable.
- 6 benchmarks — ScreenSpot-Pro, ScreenSpot-V2, UIVision, MMBench-GUI, OSWorld-G, AndroidControl
- 11+ models — Qwen3-VL, Qwen2.5-VL, UI-TARS, MAI-UI, GUI-G2, UI-Venus, Gemini, Seed 1.8, and more
- Dual backend — Local GPU (`transformers`) or remote API (OpenAI-compatible)
- Multi-GPU & multi-thread — Parallel inference with automatic resume
- ClawGUI-Agent integration — Pair with ClawGUI-Agent to run the full pipeline via natural language
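To make the Judge → Metric stages concrete, here is a minimal sketch of how GUI grounding is typically scored: a predicted click counts as correct if it lands inside the ground-truth element's bounding box. The function names are illustrative assumptions, not the ClawGUI-Eval API.

```python
# Hedged sketch of a click-in-bbox grounding judge and accuracy metric
# (hypothetical names, not the ClawGUI-Eval API).

def click_in_bbox(click, bbox):
    """click: (x, y); bbox: (x1, y1, x2, y2), all in the same pixel space."""
    x, y = click
    x1, y1, x2, y2 = bbox
    return x1 <= x <= x2 and y1 <= y <= y2


def grounding_accuracy(predicted_clicks, gt_bboxes):
    """Fraction of predicted clicks that land inside their ground-truth box."""
    hits = sum(click_in_bbox(c, b) for c, b in zip(predicted_clicks, gt_bboxes))
    return hits / len(gt_bboxes)
```

Benchmarks differ in coordinate conventions (absolute pixels vs. normalized 0–1), which is one reason a shared pipeline with a fixed judge matters for cross-paper comparability.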
→ Get started with ClawGUI-Eval
📁 `clawgui-agent/` · 📖 Full Documentation · 中文 (Chinese)
ClawGUI-Agent closes the loop from training to production. Built on OpenClaw and powered by nanobot, it lets you control Android, HarmonyOS, or iOS devices with natural language from 12+ chat platforms — and trigger the full ClawGUI-Eval benchmark pipeline with a single sentence, no scripts required.
- Cross-platform — Android (ADB), HarmonyOS (HDC), iOS (XCTest)
- Multi-model — AutoGLM, MAI-UI, GUI-Owl, Qwen-VL, UI-TARS via OpenAI-compatible API
- One-command evaluation — Say "benchmark qwen3vl on screenspot-pro" and it handles env check → multi-GPU inference → judging → metrics → result comparison
- Personalized memory — Automatically learns user preferences and injects context across tasks
- Episode recording — Every task saved as structured episodes for replay and dataset building
- Web UI — Gradio interface for device management, task execution, and memory inspection
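As a rough illustration of the cross-platform device layer, a model's structured action (e.g. a tap or text input) ultimately maps to a device command such as `adb shell input` on Android. The sketch below shows one way that mapping could look; the action schema and function name are assumptions, not the ClawGUI-Agent API.

```python
# Illustrative mapping from a model action dict to an ADB command
# (hypothetical schema, not the ClawGUI-Agent API). Run the result with
# subprocess.run(cmd) against a connected device or emulator.

def action_to_adb(action, serial=None):
    """Translate {"action": "tap"|"swipe"|"type", ...} into an adb argv list."""
    prefix = ["adb"] + (["-s", serial] if serial else [])
    kind = action["action"]
    if kind == "tap":
        return prefix + ["shell", "input", "tap", str(action["x"]), str(action["y"])]
    if kind == "swipe":
        return prefix + ["shell", "input", "swipe",
                         *(str(action[k]) for k in ("x1", "y1", "x2", "y2"))]
    if kind == "type":
        # `adb shell input text` does not accept literal spaces; %s encodes them
        return prefix + ["shell", "input", "text", action["text"].replace(" ", "%s")]
    raise ValueError(f"unsupported action: {kind}")
```

HarmonyOS (HDC) and iOS (XCTest) backends would expose the same action schema with different command translations, which is what keeps the agent model platform-agnostic.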
→ Get started with ClawGUI-Agent
- ClawGUI-Agent — GUI agent framework for phone control and evaluation via natural language
- ClawGUI-RL — Scalable mobile online RL training infrastructure with GiGPO + PRM
- ClawGUI-Eval — Standardized GUI grounding evaluation suite with 6 benchmarks and 95%+ reproduction rate
- ClawGUI-2B — 2B GUI agent trained with GiGPO, achieving 17.1 MobileWorld SR (vs. 11.1 baseline)
- On-device ClawGUI-Agent — Run ClawGUI-Agent directly on real phones, so screenshots and actions never leave the device and cloud privacy risks are avoided
- Desktop Online RL — Extend ClawGUI-RL to desktop environments for online reinforcement learning
- Web Online RL — Extend ClawGUI-RL to web environments for online reinforcement learning
- More Skills for ClawGUI-Agent — Add more pluggable skills to expand ClawGUI-Agent's capabilities
- Hybrid CLI & GUI Mechanism — Explore hybrid interaction combining command-line and GUI operations
- Real-time RL — Integrate real-time reinforcement learning based on the OPD algorithm for ClawGUI-RL and ClawGUI-Agent
We welcome contributions of all kinds — new model support, new RL environments, bug fixes, and documentation improvements. See CONTRIBUTING.md for how to get started, module-specific guidelines, and PR requirements.
ClawGUI is built upon the following excellent open-source projects. We sincerely thank their contributors:
This project is licensed under the Apache License 2.0.