`clawgui-agent.mp4` — ClawGUI-Agent controls a real phone via natural language
`clawgui-rl.mp4` — ClawGUI-RL trains a GUI agent with online reinforcement learning
- 📄 [2026/4/14] Our paper is available on arXiv: *ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents*.
- 🔥 [2026/4/13] ClawGUI is released — train with ClawGUI-RL (GiGPO), evaluate with ClawGUI-Eval, deploy with ClawGUI-Agent. ClawGUI-2B, a 2B agent trained end-to-end with this pipeline, hits 17.1 MobileWorld SR vs. the 11.1 baseline. See Quick Start.
ClawGUI is a research framework for GUI agents, covering the complete lifecycle from online RL training and standardized evaluation to real-device deployment.
Building a capable GUI agent involves three tightly coupled problems that are rarely solved together: you need an environment to train the agent online, rigorous benchmarks to measure what it has learned, and a production system to deploy it on real devices. ClawGUI addresses all three.
| Module | Role |
|---|---|
| 🚀 ClawGUI-RL | Build — Train GUI agents online with scalable RL: parallel Docker environments, real Android devices, and GiGPO+PRM for fine-grained step-level rewards |
| 📊 ClawGUI-Eval | Evaluate — Measure what the agent has learned: 6 benchmarks, 11+ models, 95.8% faithful reproduction of official results |
| 🤖 ClawGUI-Agent | Deploy — Use GUI agents in the real world: control mobile devices via natural language through 12+ chat platforms, with one-command evaluation built in |
| 🏆 ClawGUI-2B | End-to-end validation: trained entirely with ClawGUI-RL and GiGPO, achieving 17.1 MobileWorld SR vs. the 11.1 baseline |
```bash
git clone https://github.com/ZJU-REAL/ClawGUI.git
cd ClawGUI
```

Each module is independent with its own environment. Click into each one for full installation and usage instructions.
ClawGUI-RL trains GUI agents with online reinforcement learning. It runs dozens of Docker-based Android emulators in parallel or trains directly on physical devices — and replaces standard GRPO with GiGPO+PRM for fine-grained step-level rewards that drive stronger policy learning.
- Parallel multi-environment — Dozens of Docker-based virtual Android environments simultaneously
- Real-device training — Physical or cloud Android phones with the same API
- GiGPO + PRM — Fine-grained step-level reward for better policy optimization than standard GRPO
- Spare server rotation — Automatic failover keeps training running without interruption
- Episode visualization — Record and replay any training trajectory
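The GiGPO + PRM idea above can be sketched in a few lines. This is a simplified illustration, not the ClawGUI-RL API: real GiGPO groups steps by shared anchor states across rollouts, while this sketch just blends a GRPO-style group-normalized episode advantage with a per-step advantage derived from PRM scores. All function names here are hypothetical.

```python
# Illustrative sketch of GiGPO-style advantage estimation (hypothetical names,
# not the ClawGUI-RL API). Episode returns are normalized across the rollout
# group as in GRPO; PRM step scores add a fine-grained step-level signal.
from statistics import mean, pstdev


def group_norm(xs, eps=1e-6):
    """Normalize a group of scalars to zero mean / unit std (GRPO-style)."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / (sigma + eps) for x in xs]


def gigpo_advantages(episode_returns, prm_step_rewards, w=0.5):
    """episode_returns: one scalar return per rollout in the group.
    prm_step_rewards: per-rollout lists of PRM scores, one per step.
    Returns per-rollout, per-step advantages blending both signals."""
    ep_adv = group_norm(episode_returns)  # shared episode-level advantage
    out = []
    for i, steps in enumerate(prm_step_rewards):
        step_adv = group_norm(steps) if len(steps) > 1 else [0.0] * len(steps)
        # each step: episode advantage plus a weighted step-level PRM advantage
        out.append([ep_adv[i] + w * a for a in step_adv])
    return out
```

The weighting `w` trades off trajectory-level outcome reward against step-level process reward; in the degenerate case where all PRM scores tie, the sketch reduces to plain GRPO.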
📁 `clawgui-eval/` · 📖 Full Documentation · 🤗 Dataset · 🤖 ModelScope
ClawGUI-Eval gives GUI grounding research a reliable measurement baseline. Its three-stage Infer → Judge → Metric pipeline covers 6 benchmarks and 11+ models, with a 95.8% reproduction rate against official results — so numbers across papers are actually comparable.
- 6 benchmarks — ScreenSpot-Pro, ScreenSpot-V2, UIVision, MMBench-GUI, OSWorld-G, AndroidControl
- 11+ models — Qwen3-VL, Qwen2.5-VL, UI-TARS, MAI-UI, GUI-G2, UI-Venus, Gemini, Seed 1.8, and more
- Dual backend — Local GPU (`transformers`) or remote API (OpenAI-compatible)
- Multi-GPU & multi-thread — Parallel inference with automatic resume
- ClawGUI-Agent integration — Pair with ClawGUI-Agent to run the full pipeline via natural language
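To make the Judge → Metric stages concrete, here is a minimal sketch of how GUI grounding is typically scored: a predicted click counts as correct if it lands inside the ground-truth element's bounding box. The function names are illustrative assumptions, not the ClawGUI-Eval API.

```python
# Hedged sketch of a click-in-bbox grounding judge and accuracy metric
# (hypothetical names, not the ClawGUI-Eval API).

def click_in_bbox(click, bbox):
    """click: (x, y); bbox: (x1, y1, x2, y2), all in the same pixel space."""
    x, y = click
    x1, y1, x2, y2 = bbox
    return x1 <= x <= x2 and y1 <= y <= y2


def grounding_accuracy(predicted_clicks, gt_bboxes):
    """Fraction of predicted clicks that land inside their ground-truth box."""
    hits = sum(click_in_bbox(c, b) for c, b in zip(predicted_clicks, gt_bboxes))
    return hits / len(gt_bboxes)
```

Benchmarks differ in coordinate conventions (absolute pixels vs. normalized 0–1), which is one reason a shared pipeline with a fixed judge matters for cross-paper comparability.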
→ Get started with ClawGUI-Eval
📁 `clawgui-agent/` · 📖 Full Documentation · 中文 (Chinese)
ClawGUI-Agent closes the loop from training to production. Built on OpenClaw and powered by nanobot, it lets you control Android, HarmonyOS, or iOS devices with natural language from 12+ chat platforms — and trigger the full ClawGUI-Eval benchmark pipeline with a single sentence, no scripts required.
- Cross-platform — Android (ADB), HarmonyOS (HDC), iOS (XCTest)
- Multi-model — AutoGLM, MAI-UI, GUI-Owl, Qwen-VL, UI-TARS via OpenAI-compatible API
- One-command evaluation — Say "benchmark qwen3vl on screenspot-pro" and it handles env check → multi-GPU inference → judging → metrics → result comparison
- Personalized memory — Automatically learns user preferences and injects context across tasks
- Episode recording — Every task saved as structured episodes for replay and dataset building
- Web UI — Gradio interface for device management, task execution, and memory inspection
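As a rough illustration of the cross-platform device layer, a model's structured action (e.g. a tap or text input) ultimately maps to a device command such as `adb shell input` on Android. The sketch below shows one way that mapping could look; the action schema and function name are assumptions, not the ClawGUI-Agent API.

```python
# Illustrative mapping from a model action dict to an ADB command
# (hypothetical schema, not the ClawGUI-Agent API). Run the result with
# subprocess.run(cmd) against a connected device or emulator.

def action_to_adb(action, serial=None):
    """Translate {"action": "tap"|"swipe"|"type", ...} into an adb argv list."""
    prefix = ["adb"] + (["-s", serial] if serial else [])
    kind = action["action"]
    if kind == "tap":
        return prefix + ["shell", "input", "tap", str(action["x"]), str(action["y"])]
    if kind == "swipe":
        return prefix + ["shell", "input", "swipe",
                         *(str(action[k]) for k in ("x1", "y1", "x2", "y2"))]
    if kind == "type":
        # `adb shell input text` does not accept literal spaces; %s encodes them
        return prefix + ["shell", "input", "text", action["text"].replace(" ", "%s")]
    raise ValueError(f"unsupported action: {kind}")
```

HarmonyOS (HDC) and iOS (XCTest) backends would expose the same action schema with different command translations, which is what keeps the agent model platform-agnostic.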
→ Get started with ClawGUI-Agent
- ClawGUI-Agent — GUI agent framework for phone control and evaluation via natural language
- ClawGUI-RL — Scalable mobile online RL training infrastructure with GiGPO + PRM
- ClawGUI-Eval — Standardized GUI grounding evaluation suite with 6 benchmarks and 95%+ reproduction rate
- ClawGUI-2B — 2B GUI agent trained with GiGPO, achieving 17.1 MobileWorld SR (vs. 11.1 baseline)
- On-device ClawGUI-Agent — Run ClawGUI-Agent directly on real phones, so screenshots and actions never leave the device and cloud privacy risks are avoided
- Desktop Online RL — Extend ClawGUI-RL to desktop environments for online reinforcement learning
- Web Online RL — Extend ClawGUI-RL to web environments for online reinforcement learning
- More Skills for ClawGUI-Agent — Add more pluggable skills to expand ClawGUI-Agent's capabilities
- Hybrid CLI & GUI Mechanism — Explore hybrid interaction combining command-line and GUI operations
- Real-time RL — Integrate real-time reinforcement learning based on the OPD algorithm for ClawGUI-RL and ClawGUI-Agent
We welcome contributions of all kinds — new model support, new RL environments, bug fixes, and documentation improvements. See CONTRIBUTING.md for how to get started, module-specific guidelines, and PR requirements.
ClawGUI is built upon the following excellent open-source projects. We sincerely thank their contributors:
This project is licensed under the Apache License 2.0.