Pointer

A real-screen computer-use agent powered by multimodal LLMs. Built on Agent Zero with the Computer (Pointer) profile: see the display like a human, interact with UIs, fill forms, and complete tasks across applications.

Model support: Only DashScope qwen3.5-plus is verified so far. You must set api_base to the correct regional endpoint as well as the API key—see §4.4.

1. What Pointer Is and What It Does

Pointer is an AI assistant for real desktop and browser UIs. It does not rely on spinning up a separate page in a headless browser; it captures what is visible on your monitor, understands on-screen elements, and acts through mouse, keyboard, shortcuts, and composite actions—the same inputs a person uses.

Capabilities include:

Interactions in web and desktop apps: click, type, scroll, multi-select, drag, and more
Multi-step flows: login wizards, forms, file pickers, paginated reading and extraction
Vision: numbered UI overlays plus optional zoomed regions to reduce mis-clicks
Structured memory: per-task screen extraction, merge, task_done:checkpoint / read, and persisted execution state (see agents/computer/ docs)

Pointer runs as an Agent Zero profile (computer), so tool orchestration, dialogue, and extensions work the same way. The product name is Pointer.

2. Use Cases and Demos

Goal: We aim to fully replace repetitive, physical computer work—the routine clicking, typing, and screen navigation that people do every day—with capable automation. That includes personal daily tasks (your own desktop and browser workflows) and enterprise employees’ day-to-day work (repeated operational steps across standard business apps and web consoles), wherever vision, judgment, and multi-step control are needed.

Example use cases:

Complex web workflows (login, search, forms, admin consoles, multi-page collection)
Desktop apps and hybrid work (real windows in the foreground, coexisting with your workflow)
Repetitive human–computer tasks that need vision + reasoning + long horizons

Video demo:
(TBD: link or embed)

Try it online / pilot access:
Contact [email protected] for trials, demos, or partnerships.

3. Core Features

Area	Description
Human-like operation	Live screenshots and UI annotation; prefer element indices, fall back to coordinates; OS-specific shortcut hints for macOS, Windows, and Linux.
Credential safety	Dedicated login and credential handling to avoid putting plaintext passwords in model-visible prompts; guided secure fill (see Computer Agent tool specs).
CAPTCHA handling	captcha_verify and related flows: when a CAPTCHA-like UI is detected, choose click / type / drag per protocol (exact behavior depends on implementation and model).
Full Agent Zero stack	Skills, memory and learning, code execution, subordinate agents, MCP, browser tooling, and more—usable alongside Pointer as your task requires.
Long-horizon tasks	Screen extraction, merge by task index, `task_done:checkpoint` / `read`, and persisted plans / progress / learnings to bound context and improve recoverability.

For architecture, file layout, env vars, and tools, see agents/computer/README.md.

4. Installation and Deployment

4.1 Requirements

Python: 3.10+ recommended (match project dependencies)
OS: macOS, Windows, and Linux; Pointer injects OS-specific shortcut references
Permissions: Screen capture and accessibility / assistive APIs may be required for screenshots and input—grant them per your OS settings

4.2 Install dependencies

From the repository root:

pip install -r requirements.txt
pip install -r requirements2.txt

requirements2.txt includes LiteLLM and the OpenAI SDK (model calling). Install both files.

Optional for development / tests:

pip install -r requirements.dev.txt

4.3 OS notes

OS	Notes
macOS	Allow your terminal or Python process Screen Recording and Accessibility; otherwise capture and control may fail.
Windows	Mind HiDPI scaling; run under a normal interactive desktop session (not a headless session without a desktop).
Linux	Needs a display server (e.g. X11). Headless hosts need a virtual display (e.g. Xvfb)—(TBD: recommended distros and minimal setup)

4.4 API keys and models

Copy or create an env file (e.g. .env) if your workflow uses one.
In the Web UI settings or .env, set keys for your provider (see conf/model_providers.yaml), e.g.:
- OpenAI, OpenRouter, Anthropic, etc.
- DashScope (Qwen): DASHSCOPE_API_KEY or API_KEY_DASHSCOPE (provider id: dashscope)
Set api_base for DashScope (required). LiteLLM does not auto-pick China vs international. In Settings, fill API base for each role that uses DashScope (Chat, Utility, Browser, etc.—field names such as chat model API base, utility model API base, browser model API base), or ensure the value is merged from conf/model_providers.yaml defaults. Use the URL that matches where your key was issued:

Region api_base

China (Beijing) https://dashscope.aliyuncs.com/compatible-mode/v1

International https://dashscope-intl.aliyuncs.com/compatible-mode/v1

Wrong region + key combination typically causes auth or routing errors.
Pick models for Chat, Utility, Browser, etc. Computer use typically needs a vision-capable Utility model for screenshot understanding and extraction.

Tested models (today): Only Qwen 3.5 Plus on DashScope has been validated end-to-end with Pointer. Use provider dashscope, the correct regional api_base (see table above), DASHSCOPE_API_KEY (or API_KEY_DASHSCOPE), and model name qwen3.5-plus (or the exact id from your DashScope console). Chat, Utility, and other roles may all need this stack until broader compatibility is confirmed. Other providers and model IDs are untested and may not work.

4.5 Work directory (workdir)

Set Settings → workdir. Pointer writes under it:

computer/snapshots/ — screenshots and debug images
computer/extract_data/, computer/task_done/, computer/execution_checkpoint/ — extraction and task state

Avoid pointing workdir at read-only or ephemeral locations unless intentional.

4.6 Annotation service (required for Computer)

Pointer calls an HTTP annotation service to produce numbered UI boxes (default COMPUTER_ANNOTATE_API_BASE, path /api/v1/annotate/all).

(TBD: how to deploy the service, Docker image, or doc links)

Common environment variables (details in agents/computer/README.md):

Variable	Default	Meaning
`COMPUTER_ANNOTATE_API_BASE`	`http://127.0.0.1:8000`	Annotation API base URL
`COMPUTER_ANNOTATE_TIMEOUT`	`120`	Request timeout (seconds)

4.7 Start the Web UI

python run_ui.py

(TBD: default port, HTTPS, reverse-proxy examples)

5. Roadmap

Lightweight RPA — Web data extraction, chat-driven tasks, and visiting social or content sites (e.g. Xiaohongshu, X, Facebook) where work stays mostly in the browser or a small set of familiar UIs.
Cross-application work — Uploading and downloading files across apps, delivering results in chat when a task completes, and auto-filling forms that span multiple programs or sites.
Research — Work on professional or domain data that is public but requires login to access; produce and hand off research outputs—for example, legal case research and similar deep dives.
Digital coworkers — Full loop: receive a task → plan → execute on the machine → deliver results, aimed at recurring operational work patterned after roles such as developers, HR, and finance.

6. Important warning

Pointer has full control of the computer (mouse, keyboard, system-level shortcuts). With bad prompts, model errors, or ambiguous goals, it may:

Delete or corrupt files and important data
Trigger unintended payments, emails, or account changes
Leak privacy (screen content is sent to your model provider)

Use only in controlled environments, with backups and least privilege; high-stakes production use requires human oversight and auditing.
You are solely responsible for any direct or indirect damage from using this software.

7. Contact

Email: [email protected]
Feedback, trial requests, partnerships, custom work: [email protected]

License and acknowledgments

See LICENSE in the repository root (if it differs from upstream Agent Zero, the file in this repo prevails).

Pointer builds on Agent Zero; Computer / Pointer code lives under agents/computer/. Thanks to Agent Zero and the broader open-source ecosystem.

Name		Name	Last commit message	Last commit date
Latest commit History 1,393 Commits
.github		.github
.vscode		.vscode
agents		agents
conf		conf
docker		docker
docs		docs
knowledge		knowledge
lib/browser		lib/browser
logs		logs
prompts		prompts
python		python
skills		skills
tests		tests
tmp		tmp
usr		usr
webui		webui
.dockerignore		.dockerignore
.gitattributes		.gitattributes
.gitignore		.gitignore
DockerfileLocal		DockerfileLocal
LICENSE		LICENSE
README.md		README.md
README.zh-CN.md		README.zh-CN.md
agent.py		agent.py
initialize.py		initialize.py
jsconfig.json		jsconfig.json
models.py		models.py
preload.py		preload.py
prepare.py		prepare.py
requirements.dev.txt		requirements.dev.txt
requirements.txt		requirements.txt
requirements2.txt		requirements2.txt
run_tunnel.py		run_tunnel.py
run_ui.py		run_ui.py
update_reqs.py		update_reqs.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pointer

1. What Pointer Is and What It Does

2. Use Cases and Demos

3. Core Features

4. Installation and Deployment

4.1 Requirements

4.2 Install dependencies

4.3 OS notes

4.4 API keys and models

4.5 Work directory (workdir)

4.6 Annotation service (required for Computer)

4.7 Start the Web UI

5. Roadmap

6. Important warning

7. Contact

License and acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Region	`api_base`
China (Beijing)	`https://dashscope.aliyuncs.com/compatible-mode/v1`
International	`https://dashscope-intl.aliyuncs.com/compatible-mode/v1`

Folders and files

Latest commit

History

Repository files navigation

Pointer

1. What Pointer Is and What It Does

2. Use Cases and Demos

3. Core Features

4. Installation and Deployment

4.1 Requirements

4.2 Install dependencies

4.3 OS notes

4.4 API keys and models

4.5 Work directory (workdir)

4.6 Annotation service (required for Computer)

4.7 Start the Web UI

5. Roadmap

6. Important warning

7. Contact

License and acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages