poke-browser — Chrome Browser MCP

A Chrome extension + MCP server for natural-language browser automation

Built by Poke — the AI assistant from interaction.co — together with Leo Kök.

poke-browser on GitHub

Key features (MCP tools)

  • navigate_to — Open URLs and wait for load completion on the chosen tab.
  • find_element — Locate elements by CSS, text, ARIA, or XPath; queries traverse open shadow roots as well as the regular document tree, so complex widgets and “portal-style” UI mounted in the page stay reachable.
  • click_element — Click by selector or viewport coordinates; always hovers ~1s before the click so hover menus and delayed affordances can appear.
  • type_text — Type into inputs and contenteditable regions; optional clear (default true) wipes existing text before typing, or set clear: false to append.
  • get_dom_snapshot — Compact DOM tree with tags, roles, labels, bounds, and interactivity hints.
  • capture_and_upload_screenshot — Capture the visible tab and POST it to your upload endpoint (or fall back to inline base64 when upload isn’t configured).
  • get_accessibility_tree — Semantic nodes in reading order for screen-reader–style reasoning.
  • scroll_window — Scroll by position, delta, direction, or “scroll into view” for a selector.
  • manage_tabs — List, open, close, and switch tabs in the connected Chrome profile.
  • browser_guide — In-repo Markdown playbook: every tool, common flows, and troubleshooting.
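
The clear option on type_text is easy to misread, so here is a toy model of the behaviour described above. It is only an illustration of the documented semantics, not the extension's code: the function name applyTypeText is hypothetical, and "append" is assumed to mean adding at the end of the existing value.

```typescript
// Toy model of type_text's documented `clear` option (hypothetical helper,
// not part of poke-browser). `existing` is the field's current value.
function applyTypeText(existing: string, text: string, clear: boolean = true): string {
  // clear = true (the documented default): wipe the field, then type.
  // clear = false: keep the current value and append the new text.
  return clear ? text : existing + text;
}

// Default call replaces the old value; clear: false keeps it.
const replaced = applyTypeText("old value", "new value");
const appended = applyTypeText("hello", " world", false);
```

In this model, replaced is "new value" and appended is "hello world".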

Snapshot-then-act

After every click_element call, inspect the page again with get_dom_snapshot (or related tools) before the next action. Clicks often open modals, slide-overs, or rerendered regions; a fresh snapshot keeps the model aligned with what the user actually sees.
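
One way to make this habit hard to forget is to wrap the two calls together. The sketch below assumes a hypothetical callTool function that forwards a tool name and arguments to the MCP server; the argument shapes (for example a selector field) are assumptions, since the README does not spell out the tool schemas.

```typescript
// Hypothetical transport: any function that sends a tool name plus arguments
// to the MCP server and resolves with that tool's result.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

// Click, then immediately re-read the page so the next decision is based on
// the post-click DOM rather than on a stale snapshot.
async function clickThenSnapshot(
  callTool: CallTool,
  clickArgs: Record<string, unknown>, // argument shape is an assumption
): Promise<{ clickResult: unknown; snapshot: unknown }> {
  const clickResult = await callTool("click_element", clickArgs);
  const snapshot = await callTool("get_dom_snapshot", {});
  return { clickResult, snapshot };
}
```

A caller would pass its own client wrapper as callTool and plan the next action only from the returned snapshot.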

Installation

  1. MCP server (npm)
    Run the published launcher (recommended):

    npx poke-browser@latest

    Or add it as a dependency with npm install poke-browser (npm install -g poke-browser for a global install). From a local checkout, run npm install, npm run build, and npm start inside mcp-server/. Both launch paths accept the same flags: -h/--help, -v/--version, -y/--yes, and -n/--name <label>. See TESTING.md for ports, environment variables, and the inspector.

  2. Chrome extension
    Open chrome://extensions, enable Developer mode, click Load unpacked, and select this repository’s root folder: the directory that contains manifest.json. The extension assets live alongside the manifest, not in a separate extension/ directory.

  3. Connect
    Start the MCP server, load the extension, and make sure the WebSocket port (and the optional auth token) entered in the extension popup match the server’s POKE_BROWSER_WS_PORT / POKE_BROWSER_TOKEN values.
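
A mismatched port or token is the most likely reason the popup fails to connect, so it can help to check the values before starting the server. The sketch below is a standalone pre-flight check, not the server's own parsing logic: readConnectionSettings and ConnectionSettings are hypothetical names, and it is assumed that both settings are plain environment variables. No default port is assumed; see TESTING.md for the real defaults.

```typescript
interface ConnectionSettings {
  port?: number;  // undefined when POKE_BROWSER_WS_PORT is unset or empty
  token?: string; // undefined when POKE_BROWSER_TOKEN is unset or empty
}

// Reads the two variables from an env-like object (e.g. process.env in Node)
// and rejects values that cannot be a TCP port.
function readConnectionSettings(env: Record<string, string | undefined>): ConnectionSettings {
  const settings: ConnectionSettings = {};

  const rawPort = env["POKE_BROWSER_WS_PORT"];
  if (rawPort !== undefined && rawPort !== "") {
    const port = Number(rawPort);
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      throw new Error(`POKE_BROWSER_WS_PORT is not a valid TCP port: ${rawPort}`);
    }
    settings.port = port;
  }

  const token = env["POKE_BROWSER_TOKEN"];
  if (token) {
    settings.token = token;
  }
  return settings;
}
```

Print the result next to the values shown in the popup; if they differ, fix one side before debugging anything else.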

License

MIT — see mcp-server/package.json.
