ClawGUI-Agent is the deployment module of ClawGUI. Built on OpenClaw and powered by nanobot, it provides two core capabilities: GUI phone control and one-command evaluation. For phone control, it drives a Vision-Language Model through a closed-loop "screenshot → reasoning → action" cycle to autonomously complete tasks on Android, HarmonyOS, and iOS devices — accessible from Feishu, QQ, Telegram, and 12+ other chat platforms. For evaluation, a single natural-language command triggers the full ClawGUI-Eval pipeline: environment check, multi-GPU inference, judging, and metric reporting.
- Key Features
- Architecture
- How the Agent Works
- Quick Start
- Run
- ClawGUI-Eval Evaluation
- GUI Phone Control
- Directory Structure
- License
- nanobot Integration — Remotely control phones from 12+ chat platforms including Feishu / DingTalk / Telegram / Discord / Slack / QQ — issue tasks anytime, anywhere
- GUI Phone Control — Powered by OpenClaw, AI autonomously captures screenshots, understands the screen, and performs tap/swipe/type GUI actions to complete complex tasks
- ClawGUI-Eval Integration — Built-in ClawGUI-Eval evaluation skill, launch GUI Grounding model benchmarks with natural language (environment check → multi-GPU inference → judging → metric calculation), with automatic progress monitoring and result comparison against official baselines
- Multi-Model Support — Compatible with AutoGLM, Qwen VL, UI-TARS, MAI-UI, GUI-Owl and more VLMs, connected via OpenAI-compatible API
- Personalized Memory — Automatically learns user preferences (contacts, frequently used apps, habits), with a vector-search-based persistent memory system
- Real-time Episode Recording — Each task execution (screenshots + model outputs + actions) is saved as a structured episode, enabling replay and dataset construction
- Web UI — Gradio-based web interface for device management, task execution visualization, manual takeover, memory management and more
Understanding the execution loop helps with configuration and debugging. `PhoneAgent.run()` in `phone_agent/agent.py` follows this cycle for each task:

- Screenshot — Capture the current screen via ADB (`screencap`), HDC, or XCTest, depending on the device backend.
- Memory retrieval — Query the vector memory store for relevant memories from past interactions (contacts, app knowledge, user preferences). The top-k most similar memories are appended to the system context.
- History construction — Assemble the multi-turn conversation history: each past step contributes a `(user: screenshot + instruction, assistant: reasoning + action)` pair, up to `history_length` steps back.
- VLM call — Send the prompt (system prompt + history + current screenshot + task instruction) to the configured GUI model via an OpenAI-compatible API.
- Action parsing — Extract the structured action from the model output. Different models use different output formats (`autoglm`, `uitars`, `qwenvl`, `maiui`, `guiowl` adapters in `phone_agent/model/adapters.py`).
- Coordinate normalization — Convert the model's output coordinates to absolute device pixels. AutoGLM uses `[0, 1000]` normalized coordinates; UI-TARS uses absolute pixel coordinates in `smart_resize` space; Qwen-VL uses absolute pixels; MAI-UI uses `[0, 1000]`.
- Action execution — Send the action to the device backend: tap, long-press, swipe, type, home, back, or task-complete. Each action type has a dedicated handler in `phone_agent/actions/`.
- Trace recording — If `traceEnabled=True`, append the screenshot, reasoning, and action to the episode tracer for later replay or training-data export.
- Memory update — After task completion, extract contact names, app knowledge, and user habits from the conversation and upsert them into the vector store with deduplication.
This loop runs until the model outputs a terminate or answer action, or `max_steps` is reached.
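The cycle above can be sketched in a few lines of Python. Everything here — the device/model interfaces, `parse_action`, and the toy `"tap x y"` output format — is an illustrative stand-in, not the actual `phone_agent` API:

```python
# Minimal sketch of the screenshot → reasoning → action loop described above.
# All names are hypothetical; the real PhoneAgent adds memory retrieval,
# history windowing, per-model adapters, and trace recording.

def normalize_coords(x, y, width, height, scale=1000):
    """Map AutoGLM-style [0, 1000] normalized coordinates to device pixels."""
    return round(x * width / scale), round(y * height / scale)

def parse_action(reply):
    """Toy parser: expects e.g. 'tap 500 200' or 'terminate'."""
    parts = reply.split()
    if parts[0] == "tap":
        return {"type": "tap", "x": int(parts[1]), "y": int(parts[2])}
    return {"type": parts[0]}

def run_task(device, model, instruction, max_steps=50):
    history = []
    for _ in range(max_steps):
        screenshot = device.screenshot()                     # capture screen
        reply = model.ask(instruction, screenshot, history)  # VLM call
        action = parse_action(reply)                         # structured action
        history.append((screenshot, reply))                  # multi-turn history
        if action["type"] in ("terminate", "answer"):
            return action, len(history)                      # task finished
        if action["type"] == "tap":
            px = normalize_coords(action["x"], action["y"], *device.size)
            device.tap(*px)                                  # execute on device
    return {"type": "terminate"}, len(history)               # max_steps reached
```

The control flow is the essential part: screenshot, model call, parse, execute, repeat until a terminal action.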
- Python: ≥ 3.11
- Package Manager: uv (recommended) or conda + pip
Assuming you have cloned the ClawGUI project and are in the root directory:
Using uv:

```bash
cd clawgui-agent

# Create virtual environment
uv venv .venv --python 3.12

# Activate
source .venv/bin/activate

# Install phone_agent
uv pip install -e .

# Install nanobot
uv pip install -e nanobot/
```

Using conda + pip:

```bash
cd clawgui-agent

# Create conda environment
conda create -n opengui python=3.12 -y
conda activate opengui

# Install phone_agent
pip install -e .

# Install nanobot
pip install -e nanobot/
```

Run the onboarding wizard to generate a default config:

```bash
nanobot onboard
```

Then edit `~/.nanobot/config.json`. Here is a reference configuration:
We recommend using `autoglm-phone` as the external GUI model for phone control.
```json
{
  "agents": {
    "defaults": {
      "workspace": "/path/to/ClawGUI",
      "model": "glm-5",
      "provider": "zhipu",
      "maxTokens": 8192,
      "contextWindowTokens": 131072,
      "temperature": 0.1,
      "maxToolIterations": 40
    }
  },
  "providers": {
    "zhipu": {
      "apiKey": "YOUR_ZHIPU_API_KEY",
      "apiBase": "https://open.bigmodel.cn/api/paas/v4/"
    },
    "openrouter": {
      "apiKey": "YOUR_OPENROUTER_API_KEY",
      "apiBase": "https://openrouter.ai/api/v1"
    }
  },
  "tools": {
    "gui": {
      "enable": true,
      "deviceType": "adb",
      "deviceId": null,
      "maxSteps": 50,
      "useExternalModel": true,
      "guiBaseUrl": "https://openrouter.ai/api/v1",
      "guiApiKey": "YOUR_OPENROUTER_API_KEY",
      "guiModelName": "autoglm-phone",
      "promptTemplateLang": "en",
      "promptTemplateStyle": "autoglm",
      "traceEnabled": false,
      "traceDir": "gui_trace"
    },
    "exec": {
      "enable": true,
      "timeout": 60
    }
  }
}
```

Important: `workspace` path setting. Set `workspace` to the ClawGUI project root — the directory that contains both `clawgui-agent/` and `clawgui-eval/`. The built-in evaluation skill uses this path to locate the evaluation framework. For example, if your project lives at `/home/user/ClawGUI`, set `workspace` to `"/home/user/ClawGUI"`.
| Parameter | Description |
|---|---|
| `enable` | Enable/disable the GUI phone control tool |
| `deviceType` | Device type: `adb` (Android) or `hdc` (HarmonyOS) |
| `deviceId` | Specific device ID; `null` for auto-detection |
| `maxSteps` | Maximum execution steps per task |
| `useExternalModel` | Use an external GUI-specific model (recommended: `true`) |
| `guiBaseUrl` | API endpoint for the external GUI model |
| `guiApiKey` | API key for the external GUI model |
| `guiModelName` | External GUI model name, used with `guiBaseUrl` |
| `promptTemplateLang` | Prompt language: `cn` / `en` |
| `promptTemplateStyle` | Prompt template style: `autoglm` / `uitars` / `qwenvl`, etc. |
| `traceEnabled` | Enable episode recording |
| `traceDir` | Episode save directory |
The controlled phone must be connected (e.g. via USB) to the server machine where ClawGUI-Agent is installed.
Option A: Install via package manager

macOS (recommended: brew):

```bash
brew install android-platform-tools
```

Linux:

```bash
sudo apt install android-tools-adb  # Ubuntu/Debian
```

Windows: See this blog tutorial to download and configure PATH.

Option B: Manual download

Download the official ADB platform-tools and extract it, then add it to your PATH.

macOS / Linux:

```bash
# Assuming extracted to ~/Downloads/platform-tools
export PATH=${PATH}:~/Downloads/platform-tools
```

Windows: Add the extracted directory (e.g. `C:\platform-tools`) to the system PATH environment variable.
- Enable Developer Mode: Go to Settings > About Phone > Build Number, tap rapidly ~10 times until you see "You are now a developer"
- Enable USB Debugging: Go to Settings > Developer Options > USB Debugging, toggle it on (some devices may require a restart)
- Verify connection:

```bash
adb devices
# Expected output:
# List of devices attached
# <your_device_id>    device
```

ADB Keyboard is used for text input. Download ADBKeyboard.apk and install it:

```bash
adb install ADBKeyboard.apk
adb shell ime enable com.android.adbkeyboard/.AdbIME
```

Note: This step is optional. The framework will auto-detect and prompt for installation when needed.
See the Open-AutoGLM device connection guide.
To remotely control the phone via chat platforms, enable the corresponding platform in `channels` within `config.json` and fill in credentials.
📖 Click to expand setup steps
- Step 1: Go to Feishu Open Platform, click Create App on the homepage, select Enterprise Self-Built App, fill in the app name and description, and enable the Bot capability.
- Step 2: Click Permission Management on the left sidebar, then click Enable Permissions.
- Step 3: Search for and enable the following permissions: `im:message`, `im:message.p2p_msg:readonly`, `cardkit:card:write`. If `cardkit:card:write` cannot be added, set `"streaming": false` in `channels.feishu` (see config below). The bot will still work normally; replies use regular interactive cards without token-by-token streaming.
- Step 4: Click Event & Callback on the left, click Subscription Method, and select Persistent Event Reception (requires ClawGUI-Agent to be running to establish the connection).
- Step 5: Go to Credentials & Basic Info on the left to get your `App ID` and `App Secret`.
- Step 6: Click Publish App.
- Step 7: Open Feishu, go to any group, open the group settings, click Group Bots, then Add Bot, and add the bot you just created to the group.
- Step 8: @mention the bot in the group and send a message.
- Configure in `~/.nanobot/config.json`:

```json
"feishu": {
  "enabled": true,
  "appId": "YOUR_APP_ID",
  "appSecret": "YOUR_APP_SECRET",
  "encryptKey": "",
  "verificationToken": "",
  "allowFrom": ["*"],
  "groupPolicy": "mention"
}
```

`allowFrom` set to `["*"]` allows all users; to restrict access, provide a list of user Open IDs. `groupPolicy` set to `"mention"` means the bot only responds when @mentioned in groups.
- Go to QQ Open Platform and create a bot application
- Obtain the `App ID` and `Secret`
- Configure in `~/.nanobot/config.json`:

```json
"qq": {
  "enabled": true,
  "appId": "YOUR_APP_ID",
  "secret": "YOUR_SECRET",
  "allowFrom": ["*"]
}
```

nanobot also supports Telegram, Discord, Slack, DingTalk, WeCom, WhatsApp, Email, and more — 12+ platforms in total. Set `"enabled": true` in the corresponding `channels` field and fill in credentials.
Start the nanobot gateway service:

```bash
nanobot gateway
```

Once started, you can send messages on configured chat platforms (e.g. Feishu) to control the phone:

> Open WeChat and send "I'll be late" to Zhang San

nanobot will invoke the `gui_execute` tool, looping screenshot capture → VLM reasoning → action execution until the task is completed.
ClawGUI-Agent includes a built-in ClawGUI-Eval skill that turns natural language into a complete benchmark run — from GPU environment check through multi-GPU inference, judging, and metric reporting — without writing a single script.
- workspace correctly set: `workspace` in `config.json` points to the ClawGUI root directory (see configuration above)
- ClawGUI-Eval environment installed: Follow the ClawGUI-Eval README to install dependencies and download data
- GPU available: Inference requires NVIDIA GPUs
- (Recommended) Install FlashAttention-2: `pip install flash-attn --no-build-isolation` — the framework falls back to SDPA automatically if it is not installed, but precision may be slightly lower
Simply say it in a nanobot conversation:
Benchmark qwen3vl 2b model on screenspot-pro
Run uivision and osworld-g evaluation with MAI-UI-8B
nanobot will automatically:
- Environment Check — Check GPU, CUDA, FlashAttention-2, data integrity
- Inference — Generate run scripts from templates, launch multi-GPU parallel inference in background, monitor progress in real-time
- Judging — Automatically select and run the corresponding judge script
- Metric Calculation — Automatically select and run the corresponding metric script
- Result Report — Present accuracy, sub-category breakdowns, and comparison against official baselines
| Model Type | Example HuggingFace ID |
|---|---|
| `qwen3vl` | Qwen/Qwen3-VL-2B/4B/8B-Instruct |
| `qwen25vl` | Qwen/Qwen2.5-VL-3B/7B-Instruct |
| `maiui` | Tongyi-MAI/MAI-UI-2B/8B |
| `uitars` | ByteDance-Seed/UI-TARS-1.5-7B |
| `uivenus15` | inclusionAI/UI-Venus-1.5-2B/8B |
| `guiowl15` | mPLUG/GUI-Owl-1.5-2B/4B/8B-Instruct |
| `guig2` | inclusionAI/GUI-G2-7B |
| `stepgui` | stepfun-ai/GELab-Zero-4B-preview |
| `uivenus` | inclusionAI/UI-Venus-Ground-7B |
Supported Benchmarks: ScreenSpot-Pro, ScreenSpot-V2, UIVision, MMBench-GUI, OSWorld-G, AndroidControl
The following features are part of ClawGUI-Agent's phone/device control capabilities, driven by the gui_execute tool.
You can also invoke the GUI agent directly via command line:

```bash
python main.py \
  --base-url https://open.bigmodel.cn/api/paas/v4/ \
  --model autoglm-phone \
  --apikey <YOUR_API_KEY> \
  --max-steps 100 \
  --lang cn \
  "Open QQ Music, play Justin Bieber's Baby and add it to favorites. If it is already favorited, just play it. After it starts playing, pause it, then go back and play Bieber's Love Me."
```

In addition to chat-platform control, you can use the Web UI directly:
```bash
python webui.py
```

It opens at http://localhost:7860 by default, featuring:
- Device Management: Connect/disconnect devices, view device status
- Task Execution: Enter task descriptions, watch screenshots and AI reasoning in real-time
- Manual Takeover: Switch to manual control for scenarios like CAPTCHAs
- Memory Management: View/edit/clear memory data
- Configuration Panel: Graphical model parameter settings
The framework includes a built-in personalized memory system (phone_agent/memory/). After each completed task, the agent extracts structured facts from the conversation — contact names and relationships, app-specific knowledge, user habits and preferences — and upserts them into a persistent store as JSON records with numpy vector embeddings. On subsequent tasks, the top-k most semantically similar memories are retrieved and injected into the system context, letting the agent recognize "Zhang San" as the user's colleague or know which music app the user prefers. Duplicate memories are detected and merged rather than accumulated, keeping the store lean. Multi-user isolation is supported via per-user namespaces.
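The upsert-with-dedup and top-k retrieval behaviour described above can be illustrated with a toy store. The bag-of-characters embedding below is a deliberate simplification — the real system uses learned embeddings persisted as numpy vectors — and every name here is hypothetical, not the `phone_agent/memory/` API:

```python
# Toy memory store: upsert with similarity-based dedup, then top-k retrieval.
import math

def embed(text):
    """Crude bag-of-characters embedding (illustration only)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha() and ch.isascii():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    def __init__(self, dedup_threshold=0.95):
        self.records = []                  # list of (text, vector) pairs
        self.threshold = dedup_threshold

    def upsert(self, text):
        vec = embed(text)
        for i, (_, old) in enumerate(self.records):
            if cosine(vec, old) >= self.threshold:
                self.records[i] = (text, vec)   # near-duplicate: merge, keep newest
                return
        self.records.append(vec and (text, vec))

    def retrieve(self, query, k=2):
        """Return the k stored memories most similar to the query."""
        qv = embed(query)
        ranked = sorted(self.records, key=lambda r: cosine(qv, r[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

Retrieved memories would then be appended to the system context before the VLM call, as in the execution loop described earlier.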
The framework supports multiple Vision-Language Models via an adapter pattern:
| Model | `promptTemplateStyle` | Provider |
|---|---|---|
| AutoGLM-Phone-9B | `autoglm` | Zhipu AI |
| Doubao-1.5-UI-TARS | `uitars` | ByteDance |
| Qwen2.5-VL / Qwen3-VL | `qwenvl` | Alibaba Cloud |
| MAI-UI | `maiui` | Alibaba Cloud |
| GUI-Owl-7B/32B | `guiowl` | mPLUG |
All models are connected via an OpenAI-compatible API and can be deployed locally with vLLM / SGLang, or connected to cloud services such as Zhipu BigModel, Alibaba Cloud Bailian, or OpenRouter.
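The adapter pattern can be sketched as a small dispatch table: each model family gets an adapter that turns its raw text output into one common action dict. The output grammars below are simplified illustrations, not the exact formats the real adapters in `phone_agent/model/adapters.py` parse:

```python
# Illustrative adapter-pattern sketch: normalize different model output
# formats into one action dict. Formats shown are simplified assumptions.
import re

class AutoGLMAdapter:
    # AutoGLM-style tap in [0, 1000] normalized coordinate space (illustrative)
    def parse(self, text):
        m = re.search(r"tap\((\d+),\s*(\d+)\)", text)
        return {"type": "tap", "x": int(m.group(1)), "y": int(m.group(2)),
                "space": "norm1000"}

class UITARSAdapter:
    # UI-TARS-style click in absolute pixel space (illustrative)
    def parse(self, text):
        m = re.search(r"click\(start_box='\((\d+),(\d+)\)'\)", text)
        return {"type": "tap", "x": int(m.group(1)), "y": int(m.group(2)),
                "space": "pixels"}

ADAPTERS = {"autoglm": AutoGLMAdapter(), "uitars": UITARSAdapter()}

def parse_model_output(style, text):
    """Dispatch on promptTemplateStyle to the matching adapter."""
    return ADAPTERS[style].parse(text)
```

Because every adapter emits the same action dict, the rest of the loop (coordinate normalization, action execution) stays model-agnostic.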
```text
ClawGUI-Agent/
├── main.py              # CLI entry point
├── webui.py             # Gradio Web UI entry point
├── ios.py               # iOS CLI entry point
├── setup.py             # Package setup
├── requirements.txt     # Python dependencies
│
├── phone_agent/         # Core phone automation package
│   ├── agent.py         # PhoneAgent main class (screenshot→VLM→action loop)
│   ├── agent_ios.py     # IOSPhoneAgent class
│   ├── device_factory.py  # Device type factory (ADB / HDC / XCTest)
│   ├── tracer.py        # Episode execution tracer
│   ├── config/          # Configuration & prompts (8 template files)
│   ├── model/           # Model clients & adapters (5 VLM adapters)
│   ├── adb/             # Android ADB device control
│   ├── hdc/             # HarmonyOS HDC device control
│   ├── xctest/          # iOS XCTest device control
│   ├── actions/         # Action handlers (tap, swipe, type, etc.)
│   └── memory/          # Personalized memory system (vector store)
│
├── nanobot/             # nanobot subproject
│   ├── nanobot/         # nanobot core package
│   │   ├── agent/       # Agent core + GUI tool
│   │   ├── channels/    # 12+ chat platform integrations
│   │   ├── providers/   # 20+ LLM provider adapters
│   │   └── skills/      # Pluggable skills (gui-mobile, clawgui-eval)
│   ├── pyproject.toml
│   └── README.md
│
├── examples/            # Usage examples
└── scripts/             # Deployment & verification scripts
```
This project is licensed under the Apache License 2.0. The nanobot subproject is licensed under the MIT License.

