A Chrome extension + MCP server for natural-language browser automation
Built by Poke — the AI assistant from interaction.co — together with Leo Kök.
- navigate_to: Open URLs and wait for load completion on the chosen tab.
- find_element: Locate elements by CSS, text, ARIA, or XPath; queries respect open shadow roots (and the same-document tree you’d expect for complex widgets and “portal-style” UI mounted in the page).
- click_element: Click by selector or viewport coordinates; always hovers ~1s before the click so hover menus and delayed affordances can appear.
- type_text: Type into inputs and contenteditable regions; optional clear (default true) wipes existing text before typing, or set clear: false to append.
- get_dom_snapshot: Compact DOM tree with tags, roles, labels, bounds, and interactivity hints.
- capture_and_upload_screenshot: Capture the visible tab and POST it to your upload endpoint (or fall back to inline base64 when upload isn’t configured).
- get_accessibility_tree: Semantic nodes in reading order for screen-reader–style reasoning.
- scroll_window: Scroll by position, delta, direction, or “scroll into view” for a selector.
- managetabs: List, open, close, and switch tabs in the connected Chrome profile.
- browser_guide: In-repo Markdown playbook: every tool, common flows, and troubleshooting.
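As a rough illustration of how these tools might be chained, the sketch below builds a short flow as plain data. The tool names and the clear option come from the list above; the ToolCall shape, the searchFlow helper, and the url, selector, and text argument keys are assumptions made for this example, not the server's documented schema.

```typescript
// One tool invocation: the tool name plus a JSON-style argument object.
// This shape is hypothetical; a real MCP client defines its own request type.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// Build a small "open a page and fill a field" flow.
// Argument keys other than `clear` are assumed for illustration.
function searchFlow(url: string, selector: string, query: string): ToolCall[] {
  return [
    { tool: "navigate_to", args: { url } },
    // `clear` is omitted here, so the default (true) wipes existing text first.
    { tool: "type_text", args: { selector, text: query } },
    // `clear: false` appends to what is already in the field.
    { tool: "type_text", args: { selector, text: " site:example.com", clear: false } },
    // Re-read the page before deciding on the next step.
    { tool: "get_dom_snapshot", args: {} },
  ];
}
```

Keeping the flow as data makes it easy to log or inspect before anything is sent to the browser.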
After every click_element call, inspect the page again with get_dom_snapshot (or related tools) before the next action. Clicks often open modals, slide-overs, or rerendered regions; a fresh snapshot keeps the model aligned with what the user actually sees.
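One way to make this habit hard to forget is to wrap the two calls together. The sketch below assumes a hypothetical ToolCaller function that forwards a tool name and arguments to the MCP server; the real client API, the click argument shape, and whether get_dom_snapshot accepts an empty argument object are all assumptions.

```typescript
// Hypothetical signature for "send one tool call and await its result".
type ToolCaller = (tool: string, args: Record<string, unknown>) => Promise<unknown>;

// Click, then immediately re-read the DOM so the next decision
// is based on what the page looks like after the click.
async function clickThenSnapshot(
  callTool: ToolCaller,
  clickArgs: Record<string, unknown>,
): Promise<unknown> {
  await callTool("click_element", clickArgs);
  // Calling the snapshot tool with no arguments is an assumption.
  return callTool("get_dom_snapshot", {});
}
```

Callers then use the returned snapshot, not their pre-click view of the page, to choose the next action.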
MCP server (npm)
Run the published launcher (recommended): npx poke-browser@latest
Or install globally / add as a dependency: npm install poke-browser. From a local checkout under mcp-server/, use npm install, npm run build, and npm start. Shared launcher flags: -h/--help, -v/--version, -y/--yes, and -n/--name <label>. See TESTING.md for ports, env vars, and the inspector.
Chrome extension
Open chrome://extensions, enable Developer mode, click Load unpacked, and select this repository’s root folder, the directory that contains manifest.json (the extension assets live alongside the manifest, not in a separate extension/ directory).
Connect
Start the MCP server, load the extension, and make sure the WebSocket port (and optional auth token) set in the popup matches POKE_BROWSER_WS_PORT / POKE_BROWSER_TOKEN.
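A minimal sketch of that alignment check, assuming you want to verify both sides before starting a session. Only the two environment variable names come from this README; the ConnectionSettings type and both helpers are hypothetical, and the sketch requires the port to be set explicitly because no default is stated here.

```typescript
// Hypothetical container for the values that must agree on both sides.
interface ConnectionSettings {
  port: number;
  token?: string;
}

// Read the server-side settings from an environment-like object,
// for example settingsFromEnv(process.env) in Node.
function settingsFromEnv(env: Record<string, string | undefined>): ConnectionSettings {
  const raw = env["POKE_BROWSER_WS_PORT"];
  const port = Number(raw);
  if (raw === undefined || !Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error("POKE_BROWSER_WS_PORT must be set to a valid TCP port");
  }
  const token = env["POKE_BROWSER_TOKEN"];
  // Treat an unset or empty token as "no auth token configured".
  return token ? { port, token } : { port };
}

// True when the popup and the server agree on port and token.
function settingsMatch(popup: ConnectionSettings, server: ConnectionSettings): boolean {
  return popup.port === server.port && (popup.token ?? "") === (server.token ?? "");
}
```

A mismatch in either value is a likely first thing to rule out when the extension does not connect.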
License: MIT (see mcp-server/package.json).
- TESTING.md — inspector payloads, WebSocket examples, troubleshooting.
- mcp-server/README.md — MCP server specifics and dev commands.