Tools#

gptme’s tools enable AI agents to execute code, edit files, browse the web, process images, and interact with your computer.

Overview#

📁 File System#

  • Read - Read files in any format

  • Save - Create and overwrite files

  • Patch - Apply precise changes to existing files

💻 Code & Development#

  • Python - Execute Python code interactively with full library access

  • Shell - Run shell commands and manage system processes

🌐 Web & Research#

  • Browser - Browse websites, take screenshots, and read web content

  • RAG - Index and search through documentation and codebases

  • Chats - Search past conversations for context and references

👁️ Visual & Interactive#

  • Vision - Analyze images, diagrams, and visual content

  • Screenshot - Capture your screen for visual context

  • Computer - Control desktop applications through visual interface

⚡ Advanced Workflows#

  • Tmux - Manage long-running processes in terminal sessions

  • Subagent - Delegate subtasks to specialized agent instances

Combinations#

The real power emerges when tools work together:

  • Web Research + Code: Browser + Python - Browse documentation and implement solutions

  • Visual Development: Vision + Patch - Analyze UI mockups and update code accordingly

  • System Automation: Shell + Python - Combine system commands with data processing

  • Interactive Debugging: Screenshot + Computer - Visual debugging and interface automation

  • Knowledge-Driven Development: RAG + Chats - Learn from documentation and past conversations

Shell#

The assistant can execute shell commands with bash by outputting code blocks with shell as the language.

Configuration:
GPTME_SHELL_TIMEOUT: Environment variable to configure command timeout (set before starting gptme)
  • Set to a number (e.g., 30) for timeout in seconds

  • Set to 0 to disable timeout

  • Unset or invalid values default to 1200 seconds (20 minutes)

Instructions

The given command will be executed in a stateful bash shell.
The shell tool will respond with the output of the execution.

These programs are available, among others:
- apt-get
- docker
- git
- hyperfine
- pandoc

### Background Jobs

For long-running commands (dev servers, builds, etc.), use background jobs:
- `bg <command>` - Start command in background, returns job ID
- `jobs` - List all background jobs with status
- `output <id>` - Show accumulated output from a job
- `kill <id>` - Terminate a background job

This prevents blocking on commands like `npm run dev` that run indefinitely.

Examples

User
list the current directory
Assistant
To list the files in the current directory, use `ls`:
shell
ls
System
Ran command: `ls`
```stdout
file1.txt
file2.txt
```

The assistant can learn context by exploring the filesystem

User
learn about the project
Assistant
Let's start by checking the files
shell
git ls-files
System
```stdout
README.md
main.py
```
Assistant
Now let's check the README
shell
cat README.md
System
```stdout
(contents of README.md)
```
Assistant
Now we check main.py
shell
cat main.py
System
```stdout
(contents of main.py)
```
Assistant
The project is...

Create vue project

User
Create a new vue project with typescript and pinia named fancy-project
Assistant
Sure! Let's create a new vue project with TypeScript and Pinia named fancy-project:
shell
npm init vue@latest fancy-project --yes -- --typescript --pinia
System
```stdout
> npx
> create-vue

Vue.js - The Progressive JavaScript Framework

Scaffolding project in ./fancy-project...
```

Proper quoting for complex content

User
add a comment with backticks and special characters
Assistant
When passing complex content with special characters, use single quotes to prevent shell interpretation:
shell
echo 'Content with `backticks` and $variables that should not be interpreted' > example.txt

Background jobs for long-running commands

User
start a dev server
Assistant
I'll start the dev server as a background job so it doesn't block:
shell
bg npm run dev
System
Started background job **#1**: `npm run dev`
>
> Use these commands to manage it:
> - `jobs` - List all background jobs
> - `output 1` - Show output from job #1
> - `kill 1` - Terminate job #1
User
check the server output
Assistant
Let me check the output from the background job:
shell
output 1
System
**Job #1** - Running (45.2s)
Command: `npm run dev`
>
> ```stdout
> > [email protected] dev
> > vite
>
> VITE v5.0.0  ready in 300 ms
>
> ➜  Local:   http://localhost:5173/
> ```
User
stop the server
Assistant
I'll terminate the background job:
shell
kill 1
System
Terminated job #1: `npm run dev`
gptme.tools.shell.close_conversation_shell(conversation_id: str) None

Close and remove the shell session for a conversation.

Called by the SESSION_END hook to clean up shell file descriptors when a conversation’s last session is removed.

gptme.tools.shell.execute_shell(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Executes a shell command and returns the output.

gptme.tools.shell.execute_shell_impl(cmd: str, logdir: Path | None, timeout: float | None = None) Generator[Message, None, None]

Execute shell command and format output.

gptme.tools.shell.get_shell() ShellSession

Get the shell session for the current context, creating it if necessary.

Uses ContextVar to provide context-local state, allowing each conversation to have its own shell session with independent working directory.

In server contexts (where current_conversation_id is set), also registers the shell in a conversation-level registry for cleanup via SESSION_END hooks.

gptme.tools.shell.get_shell_command(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) str

Get the shell command from code/args/kwargs.

gptme.tools.shell.get_workspace_cwd() str | None

Get the workspace directory for the current context, if set.

gptme.tools.shell.preview_shell(cmd: str, _: Path | None) str

Prepare preview for shell command.

gptme.tools.shell.set_shell(shell: ShellSession) None

Set the shell session for the current context (for testing).

gptme.tools.shell.set_workspace_cwd(cwd: str) None

Set the workspace directory for the current context (thread-safe).

Call this before any shell creation to ensure the shell subprocess starts in the correct directory, even with concurrent sessions. This is the thread-safe replacement for os.chdir() in server contexts.

gptme.tools.shell.strip_ansi_codes(text: str) str

Strip ANSI escape sequences from text.
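A minimal sketch of what ANSI stripping involves (`strip_ansi` and its regex are illustrative; the real function may handle additional escape sequence types):

```python
import re

# Matches CSI sequences like "\x1b[31m" (colors, cursor movement, etc.)
ANSI_CSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove ANSI CSI escape sequences, keeping the plain text."""
    return ANSI_CSI_RE.sub("", text)
```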

Python#

The assistant can execute Python code blocks.

It uses IPython to do so, and persists the IPython instance between calls to give a REPL-like experience.

Instructions

Use this tool to execute Python code in an interactive IPython session.
It will respond with the output and result of the execution.

Examples

Result of the last expression will be returned

User
What is 2 + 2?
Assistant
ipython
2 + 2
System
Executed code block.
```result
4
```

Write a function and call it

User
compute fib 10
Assistant
To compute the 10th Fibonacci number, we can run the following code:
ipython
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)
fib(10)
System
Executed code block.
```result
55
```
class gptme.tools.python.TeeIO
__init__(original_stream)
write(s)

Write string to file.

Returns the number of characters written, which is always equal to the length of the string.
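A minimal tee-stream sketch consistent with the description above (this version captures into a `StringIO`; the real class's internals may differ):

```python
import io


class TeeIO:
    """Write to an original stream while keeping a captured copy."""

    def __init__(self, original_stream):
        self.original = original_stream
        self.captured = io.StringIO()

    def write(self, s: str) -> int:
        """Write s to both streams; returns the length of the string."""
        self.original.write(s)
        self.captured.write(s)
        return len(s)

    def getvalue(self) -> str:
        return self.captured.getvalue()
```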

gptme.tools.python.execute_python(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Executes a python codeblock and returns the output.

gptme.tools.python.get_installed_python_libraries() list[str]

Check if a select list of Python libraries are installed.

gptme.tools.python.register_function(func: T) T

Decorator to register a function to be available in the IPython instance.

Tmux#

You can use the tmux tool to run long-lived and/or interactive applications in a tmux session. Requires tmux to be installed.

This tool is suitable to run long-running commands or interactive applications that require user input. Examples of such commands: npm run dev, python3 server.py, python3 train.py, etc. It allows for inspecting pane contents and sending input.

Instructions

You can use the tmux tool to run long-lived and/or interactive applications in a tmux session.

This tool is suitable to run long-running commands or interactive applications that require user input.
Examples of such commands are: `npm run dev`, `npm create vue@latest`, `python3 server.py`, `python3 train.py`, etc.

Available commands:
- new-session <command>: Start a new tmux session with the given command
- send-keys <session_id> <keys> [<keys>]: Send keys to the specified session
- inspect-pane <session_id>: Show the current content of the specified pane
- wait <session_id> [timeout] [stable_time]: Wait for output to stabilize (default: 60s timeout, 3s stable)
- kill-session <session_id>: Terminate the specified tmux session
- list-sessions: Show all active tmux sessions

Examples

Running subagents

User
start subagent to fix lints in parallel
Assistant
Let's start a subagent in a new tmux session:
tmux
new-session gptme --non-interactive "fix lint 1"
new-session gptme --non-interactive "fix lint 2"

Running specific agent

User
Ask Bob about his latest work
Assistant
Sure! Let's start a tmux session running Bob (~/bob/):
tmux
new-session cd ~/bob && gptme --non-interactive 'What is your latest work?'

Managing a dev server

User
Start the dev server
Assistant
Certainly! To start the dev server we should use tmux:
tmux
new-session npm run dev
System
Running `npm run dev` in session gptme_1
User
Can you show me the current content of the pane?
Assistant
Of course! Let's inspect the pane content:
tmux
inspect-pane gptme_1
System
```output
Server is running on localhost:5600
```
User
Stop the dev server
Assistant
I'll send 'Ctrl+C' to the pane to stop the server:
tmux
send-keys gptme_1 C-c
System
Sent 'C-c' to session gptme_1

Get info from ncurses applications

User
start top and give me a summary
Assistant
Sure! Let's start the top command in a tmux session:
tmux
new-session top
System
Running `top` in session gptme_1.
```output
(output from top shown here)
```
Assistant
The load is...

Send keys to a session

User
start ipython
Assistant
Let's start an ipython session:
tmux
new-session ipython
System
Running `ipython` in session 2.
```output
(output from ipython shown here)
```
User
Run 'print("Hello, world!")' in the ipython session
tmux
send-keys 2 'print("Hello, world!")' Enter

Listing active sessions

User
List all active tmux sessions
tmux
list-sessions
System
Active tmux sessions ['0', 'gptme_1']
gptme.tools.tmux.execute_tmux(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Executes a command in tmux and returns the output.

gptme.tools.tmux.inspect_pane(pane_id: str, logdir: Path | None = None) Message

Inspect the content of a tmux pane.

Parameters:
  • pane_id – The tmux pane ID to inspect

  • logdir – Optional directory to save full output if truncated

Returns:

Message with pane content (truncated if too long)

gptme.tools.tmux.wait_for_output(session_id: str, timeout: int = 60, stable_time: int = 3, logdir: Path | None = None) Message

Wait for command output to stabilize in a tmux session.

Monitors the pane output and waits until it remains unchanged for stable_time seconds, or until timeout is reached.

Parameters:
  • session_id – The tmux session ID to monitor

  • timeout – Maximum time to wait in seconds (default: 60)

  • stable_time – Seconds of unchanged output to consider stable (default: 3)

Returns:

Message with the final output and status
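The stabilization logic described above might look roughly like this (`wait_for_stable` is an illustrative sketch with an injectable `capture` callback, not the actual tmux-backed implementation):

```python
import time


def wait_for_stable(capture, timeout: float = 60, stable_time: float = 3,
                    poll_interval: float = 0.5,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll capture() until output is unchanged for stable_time seconds
    or timeout is reached. Returns (output, stabilized)."""
    start = clock()
    last = capture()
    last_change = start
    while clock() - start < timeout:
        if clock() - last_change >= stable_time:
            return last, True  # output stabilized
        sleep(poll_interval)
        current = capture()
        if current != last:
            last, last_change = current, clock()
    return last, False  # timed out before stabilizing
```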

Subagent#

Subagent tool — spawn, monitor, and coordinate child agents.

Extracted from a single 1100-line module into a package for maintainability.

Package structure:
  • types.py - Data classes and module-level state (Subagent, ReturnType, etc.)
  • hooks.py - Completion notification system (LOOP_CONTINUE hook)
  • api.py - Public API (subagent, subagent_status, subagent_wait, etc.)
  • batch.py - Batch execution (BatchJob, subagent_batch)
  • execution.py - Execution backends (thread, subprocess, process monitoring)

Instructions

You can create, check status, wait for, and read logs from subagents.

Subagents support a "fire-and-forget-then-get-alerted" pattern:
- Call subagent() to start an async task (returns immediately)
- Continue with other work
- Receive completion messages via the LOOP_CONTINUE hook
- Optionally use subagent_wait() for explicit synchronization

Key features:
- Agent profiles: Use profile names as agent_id for automatic profile detection
- model="provider/model": Override parent's model (route cheap tasks to faster models)
- use_subprocess=True: Run subagent in subprocess for output isolation
- use_acp=True: Run subagent via ACP protocol (supports any ACP-compatible agent)
- acp_command="claude-code-acp": Use a different ACP agent (default: gptme-acp)
- isolated=True: Run subagent in a git worktree for filesystem isolation
- subagent_batch(): Start multiple subagents in parallel
- Hook-based notifications: Completions delivered as system messages

## Agent Profiles for Subagents

Use profiles to create specialized subagents with appropriate capabilities.
When agent_id matches a profile name, the profile is auto-applied:
- explorer: Read-only analysis (tools: read)
- researcher: Web research without file modification (tools: browser, read)
- developer: Full development capabilities (all tools)
- isolated: Restricted processing for untrusted content (tools: read, ipython)
- computer-use: Visual UI testing specialist (tools: computer, vision, ipython, shell)
- browser-use: Web interaction and testing specialist (tools: browser, screenshot, vision, shell) — supports interactive browsing (open_page, click, fill, scroll) and one-shot reads

Example: `subagent("explorer", "Explore codebase")`
With model override: `subagent("researcher", "Find docs", model="openai/gpt-4o-mini")`
Computer-use example: `subagent("computer-use", "Click the Submit button, wait for the modal, and screenshot the result")`
Browser-use example: `subagent("browser-use", "Open localhost:5173, fill the chat input, click send, and report the result")`

Use subagent_read_log() to inspect a subagent's conversation log for debugging.

## Structured Delegation Template

For complex delegations, use this 7-section template for clear task handoff:

TASK: [What the subagent should do]
EXPECTED OUTCOME: [Specific deliverable - format, structure, quality bars]
REQUIRED SKILLS: [What capabilities the subagent needs]
REQUIRED TOOLS: [Specific tools the subagent should use]
MUST DO: [Non-negotiable requirements]
MUST NOT DO: [Explicit constraints and forbidden actions]
CONTEXT: [Background info, dependencies, related work]

Example prompt using the template:
'''
TASK: Implement the user authentication feature
EXPECTED OUTCOME: auth.py with login/logout endpoints, passing tests
REQUIRED SKILLS: Python, FastAPI, JWT tokens
REQUIRED TOOLS: save, shell (for pytest)
MUST DO: Use bcrypt for password hashing, return proper HTTP status codes
MUST NOT DO: Store plaintext passwords, skip input validation
CONTEXT: This is for the gptme server API, see existing endpoints in server.py
'''

Examples

Executor Mode (single task)

User
compute fib 13 using a subagent
Assistant
Starting a subagent to compute the 13th Fibonacci number.
ipython
subagent("fib-13", "compute the 13th Fibonacci number")
System
Subagent started successfully.
Assistant
Now we need to wait for the subagent to finish the task.
ipython
subagent_wait("fib-13")
System
{"status": "success", "result": "The 13th Fibonacci number is 233"}.

Planner Mode (multi-task delegation)

User
implement feature X with tests
Assistant
I'll use planner mode to delegate implementation and testing to separate subagents.
ipython
subtasks = [
    {{"id": "implement", "description": "Write implementation for feature X"}},
    {{"id": "test", "description": "Write comprehensive tests"}},
]
subagent("feature-planner", "Feature X adds new functionality", mode="planner", subtasks=subtasks)
System
Planner spawned 2 executor subagents.
Assistant
Now I'll wait for both subtasks to complete.
ipython
subagent_wait("feature-planner-implement")
System
{"status": "success", "result": "Implementation complete in feature_x.py"}.
ipython
subagent_wait("feature-planner-test")
System
{"status": "success", "result": "Tests complete in test_feature_x.py, all passing"}.

### Context Modes

Full Context (default)

User
analyze this codebase
Assistant
I'll use full context mode for comprehensive analysis.
ipython
subagent("analyze", "Analyze code quality and suggest improvements", context_mode="full")

Selective Context (choose specific components)

User
write tests using pytest
Assistant
I'll use selective mode to share only project files, not context_cmd output.
ipython
subagent("tests", "Write pytest tests for the calculate function", context_mode="selective", context_include=["files"])

Subprocess Mode (output isolation)

User
run a subagent without output mixing with parent
Assistant
I'll use subprocess mode for better output isolation.
ipython
subagent("isolated", "Compute complex calculation", use_subprocess=True)
System
Subagent started in subprocess mode.

ACP Mode (multi-harness support)

User
delegate this task to a Claude Code agent
Assistant
I'll use ACP mode to run this via a different agent harness.
ipython
subagent("claude-task", "Analyze and refactor the auth module", use_acp=True, acp_command="claude-code-acp")
System
Started subagent "claude-task" in ACP mode.

Batch Execution (parallel tasks)

User
implement, test, and document a feature in parallel
Assistant
I'll use subagent_batch for parallel execution with fire-and-gather pattern.
ipython
job = subagent_batch([
    ("impl", "Implement the user authentication feature"),
    ("test", "Write tests for authentication"),
    ("docs", "Document the authentication API"),
])

# Do other work while subagents run...

results = job.wait_all(timeout=300)
for agent_id, result in results.items():
    print(f"{{agent_id}}: {{result['status']}}")
System
Started batch of 3 subagents: ['impl', 'test', 'docs']
impl: success
test: success
docs: success

Fire-and-Forget with Hook Notifications

User
start a subagent and continue working
Assistant
I'll spawn a subagent. Completion will be delivered via the LOOP_CONTINUE hook.
ipython
subagent("compute-demo", "Compute pi to 100 digits")

# I can continue with other work now

# When the subagent completes, I'll receive a system message like:

# "✅ Subagent 'compute-demo' completed: pi = 3.14159..."
System
Started subagent "compute-demo"
System
✅ Subagent 'compute-demo' completed: pi = 3.14159265358979...

Profile-Based Subagents (auto-detected from agent_id)

User
explore this codebase and summarize the architecture
Assistant
I'll use the explorer profile for a read-only analysis.
ipython
subagent("explorer", "Analyze the codebase architecture and summarize key patterns")
System
Subagent started successfully.

Profile with Model Override

User
research best practices for error handling
Assistant
I'll spawn a researcher subagent with a faster model for web research.
ipython
subagent("researcher", "Research error handling best practices in Python", model="openai/gpt-4o-mini")
System
Subagent started successfully.

Structured Delegation Template

User
implement a robust auth feature
Assistant
I'll use the structured delegation template for clear task handoff.
ipython
subagent("auth-impl", "TASK: Implement JWT auth | OUTCOME: auth.py with tests | MUST: bcrypt, validation | MUST NOT: plaintext passwords")
System
Subagent started successfully.

Isolated Subagent (Worktree)

User
implement a feature without affecting my working directory
Assistant
I'll run the subagent in an isolated git worktree so it won't modify your files.
ipython
subagent("feature-impl", "Implement the new caching layer in cache.py", isolated=True)
System
Subagent started successfully.
class gptme.tools.subagent.BatchJob

Manages a batch of subagents for parallel execution.

Note: With the hook-based notification system, the orchestrator will receive completion messages automatically via the LOOP_CONTINUE hook. This class provides additional utilities for explicit synchronization when needed.

__init__(agent_ids: list[str], results: dict[str, ReturnType] = <factory>, _lock: Lock = <factory>) None
get_completed() dict[str, dict]

Get results of completed subagents so far.

is_complete() bool

Check if all subagents have completed.

wait_all(timeout: int = 300) dict[str, dict]

Wait for all subagents to complete.

Parameters:

timeout – Maximum seconds to wait for all subagents

Returns:

Dict mapping agent_id to status dict

class gptme.tools.subagent.ReturnType

ReturnType(status: Literal['running', 'success', 'failure'], result: str | None = None)

__init__(status: Literal['running', 'success', 'failure'], result: str | None = None) None
class gptme.tools.subagent.Subagent

Represents a running or completed subagent.

Supports both thread-based (default) and subprocess-based execution modes. Subprocess mode provides better output isolation.

Communication Model (Phase 1):
  • One-way: Parent sends prompt, child executes independently

  • No runtime updates from child to parent

  • Results retrieved after completion via status()/subagent_wait()

Future (Phase 2/3):
  • Support for progress notifications from child → parent

  • Clarification requests when child encounters ambiguity

  • See module docstring for full design intent

__init__(agent_id: str, prompt: str, thread: Thread | None, logdir: Path, model: str | None, output_schema: type | None = None, process: Popen | None = None, execution_mode: Literal['thread', 'subprocess', 'acp'] = 'thread', acp_command: str | None = None, isolated: bool = False, worktree_path: Path | None = None, repo_path: Path | None = None) None
is_running() bool

Check if the subagent is still running.

class gptme.tools.subagent.SubtaskDef

Definition of a subtask for planner mode.

gptme.tools.subagent.notify_completion(agent_id: str, status: Literal['running', 'success', 'failure'], summary: str) None

Add a subagent completion to the notification queue.

Called by the monitor thread when a subagent finishes. The queued notification will be delivered via the subagent_completion hook during the next LOOP_CONTINUE cycle.

Parameters:
  • agent_id – The subagent’s identifier

  • status – "success" or "failure"

  • summary – Brief summary of the result

gptme.tools.subagent.subagent(agent_id: str, prompt: str, mode: Literal['executor', 'planner'] = 'executor', subtasks: list[SubtaskDef] | None = None, execution_mode: Literal['parallel', 'sequential'] = 'parallel', context_mode: Literal['full', 'selective'] = 'full', context_include: list[str] | None = None, output_schema: type | None = None, use_subprocess: bool = False, use_acp: bool = False, acp_command: str = 'gptme-acp', profile: str | None = None, model: str | None = None, isolated: bool = False)

Starts an asynchronous subagent. Returns None immediately.

Subagent completions are delivered via the LOOP_CONTINUE hook, enabling a “fire-and-forget-then-get-alerted” pattern where the orchestrator can continue working and get notified when subagents finish.

Profile auto-detection: If agent_id matches a known profile name (e.g. "explorer", "researcher", "developer") or a common role alias ("explore" → "explorer", "research" → "researcher", "impl"/"dev" → "developer"), the profile is applied automatically; there is no need to pass profile separately.

Parameters:
  • agent_id – Unique identifier for the subagent. If it matches a known profile name (or a common alias like impl/dev), that profile is auto-applied (unless profile is explicitly set to something else).

  • prompt – Task prompt for the subagent (used as context for planner mode)

  • mode – “executor” for single task, “planner” for delegating to multiple executors

  • subtasks – List of subtask definitions for planner mode (required when mode="planner")

  • execution_mode – "parallel" (default) runs all subtasks concurrently, "sequential" runs subtasks one after another. Only applies to planner mode.

  • context_mode – Controls what context is shared with the subagent: "full" (default) shares the complete context (agent identity, tools, workspace); "selective" shares only the specified context components (requires context_include)

  • context_include – For selective mode, list of context components to include: "files" (project config files from the gptme.toml files list), "cmd" (dynamic context_cmd output), or "all" (both). Note: Tools and agent identity are always included by the CLI.

  • use_subprocess – If True, run subagent in subprocess for output isolation. Subprocess mode captures stdout/stderr separately from the parent.

  • use_acp – If True, run subagent via ACP (Agent Client Protocol). This enables multi-harness support: the subagent can be any ACP-compatible agent (gptme, Claude Code, Cursor, etc.). Requires the acp package: pip install 'gptme[acp]'.

  • acp_command – ACP agent command to invoke (default: "gptme-acp"). Only used when use_acp=True. Can be any ACP-compatible CLI.

  • profile – Agent profile name to apply. Profiles provide: system prompt customization (behavioral hints), tool access restrictions (which tools the subagent can use), and behavior rules (read-only, no-network, etc.). Use 'gptme-util profile list' to see available profiles. Built-in profiles: default, explorer, researcher, developer, isolated, computer-use, browser-use. If not set, auto-detected from agent_id when it matches a profile name.

  • model – Model to use for the subagent. Overrides parent’s model. Useful for routing cheap tasks to faster/cheaper models.

  • isolated – If True, run the subagent in a git worktree for filesystem isolation. The subagent gets its own copy of the repository and can modify files without affecting the parent. The worktree is automatically cleaned up after the subagent completes. Falls back to a temporary directory if not in a git repo.

Returns:

Starts asynchronous execution.

In executor mode, starts a single task execution. In planner mode, starts execution of all subtasks using the specified execution_mode.

Executors use the complete tool to signal completion with a summary. The full conversation log is available at the logdir path.

Return type:

None

gptme.tools.subagent.subagent_batch(tasks: list[tuple[str, str]], use_subprocess: bool = False, use_acp: bool = False, acp_command: str = 'gptme-acp') BatchJob

Start multiple subagents in parallel and return a BatchJob to manage them.

This is a convenience function for fire-and-gather patterns where you want to run multiple independent tasks concurrently.

With the hook-based notification system, completion messages are delivered automatically via the LOOP_CONTINUE hook. The BatchJob provides additional utilities for explicit synchronization when needed.

Parameters:
  • tasks – List of (agent_id, prompt) tuples

  • use_subprocess – If True, run subagents in subprocesses for output isolation

  • use_acp – If True, run subagents via ACP protocol

  • acp_command – ACP agent command (default: "gptme-acp")

Returns:

A BatchJob instance for managing the parallel subagents. The BatchJob provides wait_all(timeout) to wait for completion, is_complete() to check status, and get_completed() for partial results.

Example:

job = subagent_batch([
    ("impl", "Implement feature X"),
    ("test", "Write tests for feature X"),
    ("docs", "Document feature X"),
])
# Orchestrator continues with other work...
# Completion messages delivered via LOOP_CONTINUE hook:
#   "✅ Subagent 'impl' completed: Feature implemented"
#   "✅ Subagent 'test' completed: 5 tests added"
#
# Or explicitly wait for all if needed:
results = job.wait_all(timeout=300)
gptme.tools.subagent.subagent_read_log(agent_id: str, max_messages: int = 50, include_system: bool = False, message_filter: str | None = None) str

Read the conversation log of a subagent.

Parameters:
  • agent_id – The subagent to read logs from

  • max_messages – Maximum number of messages to return

  • include_system – Whether to include system messages

  • message_filter – Filter messages by role (user/assistant/system) or None for all

Returns:

Formatted log output showing the conversation

gptme.tools.subagent.subagent_status(agent_id: str) dict

Returns the status of a subagent.

gptme.tools.subagent.subagent_wait(agent_id: str, timeout: int = 60) dict

Waits for a subagent to finish.

Parameters:
  • agent_id – The subagent to wait for

  • timeout – Maximum seconds to wait (default 60)

Returns:

Status dict with 'status' and 'result' keys
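The polling behavior subagent_wait describes can be sketched as follows (`wait_for` is a hypothetical stand-in that takes a status callback; the real implementation consults the subagent registry):

```python
import time


def wait_for(status_fn, timeout: int = 60, poll: float = 0.1,
             clock=time.monotonic, sleep=time.sleep) -> dict:
    """Poll status_fn() until the status leaves 'running' or timeout
    seconds pass; return the last status dict seen."""
    deadline = clock() + timeout
    status = status_fn()
    while status.get("status") == "running" and clock() < deadline:
        sleep(poll)
        status = status_fn()
    return status
```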

Read#

Read the contents of a file or list the contents of a directory.

Provides a sandboxed file reading capability that works without shell access. Useful for restricted tool sets (e.g., --tools read,patch,save).

Instructions

Read the content of a file or list the contents of a directory.
The path can be relative or absolute.
For files, output includes line numbers for easy reference.
For directories, output shows a flat listing of immediate files and subdirectories.

Examples

User
read hello.py
Assistant
read hello.py
System
```hello.py
   1       print("Hello world")
   2       print("Goodbye world")
```
gptme.tools.read.execute_read(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Read a file and return its contents with line numbers.

Save#

Gives the assistant the ability to save whole files, or append to them.

Instructions

Create or overwrite a file with the given content.

The path can be relative to the current directory, or absolute.
If the current directory changes, the path will be relative to the new directory.

Examples

User
write a hello world script to hello.py
Assistant
save hello.py
print("Hello world")
System
Saved to `hello.py`
User
make it all-caps
Assistant
save hello.py
print("HELLO WORLD")
System
Saved to `hello.py`

Instructions

Append the given content to a file.

Examples

User
append a print "Hello world" to hello.py
Assistant
append hello.py
print("Hello world")
System
Appended to `hello.py`
gptme.tools.save.check_for_placeholders(content: str) bool

Check if content contains placeholder lines.
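A rough sketch of what a placeholder check might involve (`has_placeholders` and its regex are illustrative; gptme's actual heuristics differ):

```python
import re

# Lines like "# ..." or "// rest of file unchanged" signal an elided body.
PLACEHOLDER_RE = re.compile(
    r"^\s*(#|//)\s*(\.\.\.|…|rest of .*|unchanged)", re.IGNORECASE
)


def has_placeholders(content: str) -> bool:
    """Return True if any line looks like an elided-content placeholder."""
    return any(PLACEHOLDER_RE.search(line) for line in content.splitlines())
```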

gptme.tools.save.execute_append(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Append code to a file.

gptme.tools.save.execute_append_impl(content: str, path: Path | None) Generator[Message, None, None]

Actual append implementation.

gptme.tools.save.execute_save(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Save code to a file.

gptme.tools.save.execute_save_impl(content: str, path: Path | None) Generator[Message, None, None]

Actual save implementation.

gptme.tools.save.preview_append(content: str, path: Path | None) str | None

Prepare preview content for append operation.

gptme.tools.save.preview_save(content: str, path: Path | None) str | None

Prepare preview content for save operation.

Patch#

Gives the LLM agent the ability to patch text files, using an adapted version of git conflict markers.

Environment Variables:
GPTME_PATCH_RECOVERY: If set to "true" or "1", returns the file content in error messages when patches don't match. This helps the assistant recover faster by seeing the actual file contents.

Instructions

To patch/modify files, we use an adapted version of git conflict markers.

This can be used to edit files, without having to rewrite the whole file.
Multiple ORIGINAL/UPDATED blocks can be included in a single patch to make several changes at once.
Try to keep each patch as small as possible. Avoid placeholders, as they may make the patch fail.

To keep patches small, try to scope each change to imports/function/class.
If the total patch is large, consider using the save tool to rewrite the whole file.

Note: When patching markdown files, avoid replacing partial codeblocks (e.g., just the opening
or closing backticks). The patch content is parsed as nested markdown, which requires complete
codeblocks. For simple codeblock boundary changes (like modifying a language tag), use shell
commands like `sed` or `perl` instead.

Examples

User
patch `src/hello.py` to ask for the name of the user
src/hello.py
def hello():
    print("Hello world")

if __name__ == "__main__":
    hello()
Assistant
patch src/hello.py
<<<<<<< ORIGINAL
    print("Hello world")
=======
    name = input("What is your name? ")
    print(f"Hello {name}")
>>>>>>> UPDATED
System
Patch applied
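Conceptually, applying such a patch amounts to locating the ORIGINAL block in the file and substituting the UPDATED block. The sketch below shows the idea with plain string replacement; the real implementation (see `gptme.tools.patch.apply` below) handles multiple blocks and produces detailed error messages:

```python
def apply_patch(patch: str, content: str) -> str:
    """Apply one ORIGINAL/UPDATED block via string replacement (sketch)."""
    original, _, updated = patch.partition("\n=======\n")
    original = original.removeprefix("<<<<<<< ORIGINAL\n")
    updated = updated.removesuffix("\n>>>>>>> UPDATED")
    if original not in content:
        raise ValueError("patch does not match file content")
    # Replace only the first occurrence, mirroring a targeted edit
    return content.replace(original, updated, 1)

content = 'def hello():\n    print("Hello world")\n'
patch = (
    "<<<<<<< ORIGINAL\n"
    '    print("Hello world")\n'
    "=======\n"
    '    name = input("What is your name? ")\n'
    '    print(f"Hello {name}")\n'
    ">>>>>>> UPDATED"
)
print(apply_patch(patch, content))
```

This is also why small, uniquely-matching ORIGINAL blocks are important: an ORIGINAL section that does not match the file verbatim makes the patch fail.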
class gptme.tools.patch.Patch

Patch(original: str, updated: str)

__init__(original: str, updated: str) None
diff_minimal(strip_context=False) str

Show a minimal diff of the patch. Note that a minimal diff isn’t necessarily a unique diff.

gptme.tools.patch.apply(codeblock: str, content: str) str

Applies multiple patches in codeblock to content. Provides detailed error messages when patches fail.

gptme.tools.patch.execute_patch(code: str | None, args: list[str] | None, kwargs: dict[str, str] | None) Generator[Message, None, None]

Applies the patch.

gptme.tools.patch.execute_patch_impl(content: str, path: Path | None) Generator[Message, None, None]

Actual patch implementation.

gptme.tools.patch.preview_patch(content: str, path: Path | None) str | None

Prepare preview content for patch operation.

Vision#

Tools for viewing images, giving the assistant vision.

Requires a model which supports vision, such as GPT-4o, Anthropic’s Claude, or Llama 3.2.

gptme.tools.vision.view_image(image_path: Path | str | Image) Message

View an image. Large images (>1MB) will be automatically scaled down.

Screenshot#

A simple screenshot tool, using screencapture on macOS and scrot or gnome-screenshot on Linux.

gptme.tools.screenshot.screenshot(path: Path | None = None) Path

Take a screenshot and save it to a file.
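Under the hood, a cross-platform screenshot helper mostly amounts to picking an available CLI tool for the current OS. A minimal sketch of that selection logic (the actual behaviour in gptme may differ):

```python
import platform
import shutil

def screenshot_command(path: str) -> list[str]:
    """Pick a screenshot CLI for the current platform (sketch)."""
    if platform.system() == "Darwin":
        # macOS ships screencapture; -x suppresses the shutter sound
        return ["screencapture", "-x", path]
    # On Linux, prefer scrot if installed, else fall back to gnome-screenshot
    if shutil.which("scrot"):
        return ["scrot", path]
    return ["gnome-screenshot", "-f", path]

cmd = screenshot_command("/tmp/shot.png")
print(cmd)
```

The returned command would then be run with `subprocess.run(cmd)` and the saved image handed to the vision tool.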

Browser#

Tools to let the assistant control a browser, including:
  • loading pages

  • reading their contents

  • searching the web

  • taking screenshots (Playwright only)

  • getting ARIA accessibility snapshots (Playwright only)

  • interactive browsing: click, fill forms, scroll (Playwright only)

  • reading PDFs (with page limits and vision fallback hints)

  • converting PDFs to images (using pdftoppm, ImageMagick, or vips)

Two backends are available:

Playwright backend:
  • Full browser automation with screenshots

  • Installation:

    pipx install 'gptme[browser]'
    # We need to use the same version of Playwright as the one installed by gptme
    # when downloading the browser binaries. gptme will attempt this automatically
    PW_VERSION=$(pipx runpip gptme show playwright | grep Version | cut -d' ' -f2)
    pipx run playwright==$PW_VERSION install chromium-headless-shell
    
Lynx backend:
  • Text-only browser for basic page reading and searching

  • No screenshot support

  • Installation:

    # On Ubuntu
    sudo apt install lynx
    # On macOS
    brew install lynx
    # or any other way that gets you the `lynx` command
    
Provider Native Search:
  • When using Anthropic Claude models, native web search can be enabled

  • This uses Anthropic’s built-in web search instead of web scraping

  • More reliable than Google/DuckDuckGo scraping (which is often blocked by bot detection)

  • Configuration:

    export GPTME_ANTHROPIC_WEB_SEARCH=true
    export GPTME_ANTHROPIC_WEB_SEARCH_MAX_USES=5  # Optional, default is 5
    

Note

This is an experimental feature. It needs some work to be more robust and useful.

Examples

Reading docs

User
how does gptme work?
Assistant
Let's read the docs.
ipython
read_url('https://gptme.org/docs/')

Answer question from URL with browsing

User
find out which is the latest ActivityWatch version from superuserlabs.org
Assistant
Let's browse the site.
ipython
read_url('https://superuserlabs.org/')
System
`https://superuserlabs.org/
... [ActivityWatch](https://activitywatch.net/) ...
`
Assistant
Couldn't find the answer on the page. Following link to the ActivityWatch website.
ipython
read_url('https://activitywatch.net/')
System
`https://activitywatch.net/
... Download latest version v0.12.2 ...
`
Assistant
The latest version of ActivityWatch is v0.12.2

Searching

User
who is the founder of ActivityWatch?
Assistant
Let's search for that.
ipython
search('ActivityWatch founder')
System
`result
ActivityWatch was founded by Erik Bjäreholt in 2016...
`
Assistant
The founder of ActivityWatch is Erik Bjäreholt.

Searching for latest information

User
what are the latest developments in AI?
Assistant
Let me search for that.
ipython
search('latest developments in AI')
System
`result
Based on recent developments, AI has seen significant advances...
`
Assistant
Based on the search results, here are the latest AI developments...

Take screenshot of page

User
take a screenshot of the ActivityWatch website
Assistant
Certainly! I'll use the browser tool to screenshot the ActivityWatch website.
ipython
screenshot_url('https://activitywatch.net')
System
`result
Screenshot saved to screenshot.png
`

Get ARIA snapshot to see interactive elements

User
what interactive elements are on example.com?
Assistant
Let me get the accessibility snapshot of the page.
ipython
snapshot_url('https://example.com')
System
`result
Page: Example Domain
URL: https://example.com/

- WebArea "Example Domain":
  - heading "Example Domain" [level=1]
  - text "This domain is for use in illustrative examples..."
  - link "More information..."
`
Assistant
The page has a heading "Example Domain", a paragraph with description text, and a link "More information...".

Interactive browsing: open page, click, fill

User
search for gptme on example.com's search form
Assistant
I'll open the page, fill the search form, and click submit.
ipython
open_page('https://example.com')
System
`result
Page: Example
URL: https://example.com/

- WebArea "Example":
  - textbox "Search" [name="q"]
  - button "Go"
`
Assistant
I can see a search box and button. Let me fill in the search and click Go.
ipython
fill_element('input[name="q"]', 'gptme')
System
`result
Page: Example
URL: https://example.com/

- WebArea "Example":
  - textbox "Search" [name="q"]: gptme
  - button "Go"
`
ipython
click_element('text=Go')
System
`result
Page: Search Results
URL: https://example.com/search?q=gptme

- WebArea "Search Results":
  - heading "Results for: gptme"
  - link "gptme on GitHub"
`
Assistant
The search was submitted and the page now shows results for "gptme".

Read full text content of interactive page

User
what does the article say?
Assistant
Let me read the full text content of the current page.
ipython
read_page_text()
System
`result
# Article Title

The article discusses...
`
Assistant
The article covers [summary of content].

Read URL and check browser logs

User
read this page and check if there are any console errors
Assistant
I'll read the page first and then check the browser logs.
ipython
read_url('https://example.com')
System
`https://example.com
This domain is for use in illustrative examples...
`
Assistant
Now let me check the browser console logs:
ipython
read_logs()
System
`result
No logs or errors captured.
`

Read PDF document

User
read this research paper from arxiv
Assistant
I'll read the PDF and extract its text content.
ipython
read_url('https://arxiv.org/pdf/2410.12361v2')
System
`result
--- Page 1 ---
[PDF text content...]

--- Page 2 ---
[More content...]

---
**Note**: This PDF has 42 pages. Showing first 10 pages.
To read more pages, use: `read_url('...', max_pages=N)` where N is the desired count, or 0 for all pages.

**Tip**: If this text extraction seems incomplete or garbled (common with scanned documents, complex layouts, or image-heavy PDFs), try vision-based reading: convert pages to images using a PDF-to-image tool, then use the vision tool to analyze them.
`
Assistant
I've extracted the text from the PDF. The paper discusses [summary of key points]...
gptme.tools.browser.click_element(selector: str) str

Click an element on the current page and return updated ARIA snapshot.

Requires open_page() to be called first.

Parameters:

selector – Playwright selector to find the element. Supports:
  • CSS: “#submit-btn”, “.nav-link”, “button”

  • Text: “text=Submit”, “text=Log in”

  • Role: “role=button[name=’Submit’]”

  • Chained: “form >> text=Submit”

gptme.tools.browser.close_page() str

Close the current interactive browsing page.

Frees browser resources. A new page can be opened with open_page().

gptme.tools.browser.fill_element(selector: str, value: str) str

Fill a form field on the current page and return updated ARIA snapshot.

Requires open_page() to be called first. Clears any existing value before filling.

Parameters:
  • selector – Playwright selector for the input/textarea element.

  • value – Text to fill into the field.

gptme.tools.browser.has_lynx() bool

Check if lynx is available.

gptme.tools.browser.has_playwright() bool

Check if playwright is available.

gptme.tools.browser.open_page(url: str) str

Open a page for interactive browsing. Returns ARIA accessibility snapshot.

Use this instead of read_url() when you need to interact with the page (click buttons, fill forms, scroll). The page stays open for subsequent click_element(), fill_element(), and scroll_page() calls.

The output includes a metadata header with the page title and current URL.

gptme.tools.browser.pdf_to_images(url_or_path: str, output_dir: str | Path | None = None, pages: tuple[int, int] | None = None, dpi: int = 150) list[Path]

Convert PDF pages to images using auto-detected CLI tools.

Auto-detects and uses the first available tool: pdftoppm, ImageMagick convert, or vips.

Parameters:
  • url_or_path – URL or local path to PDF file

  • output_dir – Directory to save images (default: creates temp directory)

  • pages – Optional tuple of (first_page, last_page) to convert (1-indexed). If None, converts all pages.

  • dpi – Resolution for output images (default: 150)

Returns:

List of paths to generated PNG images

Example

>>> images = pdf_to_images("https://example.com/doc.pdf")
>>> for img in images:
...     view_image(img)  # Analyze with vision tool
gptme.tools.browser.read_logs() str

Read browser console logs from the last read URL.

gptme.tools.browser.read_page_text() str

Read the full text content of the current interactive page as Markdown.

Requires open_page() to be called first. Returns the page body converted to Markdown, preserving text formatting. Useful for reading article text, documentation, or other content after navigating to a page.

Unlike read_url(), this reads from the current interactive session — so it reflects the page state after any clicks, form fills, or navigation.

gptme.tools.browser.read_url(url: str, max_pages: int | None = None) str

Read a webpage or PDF in a text format.

Parameters:
  • url – URL to read

  • max_pages – For PDFs only - maximum pages to read (default: 10). Set to 0 to read all pages. Ignored for web pages.

gptme.tools.browser.screenshot_url(url: str, path: Path | str | None = None) Path

Take a screenshot of a webpage.

gptme.tools.browser.scroll_page(direction: str = 'down', amount: int = 500) str

Scroll the current page and return updated ARIA snapshot.

Requires open_page() to be called first.

Parameters:
  • direction – “up” or “down” (default: “down”)

  • amount – Pixels to scroll (default: 500)

gptme.tools.browser.search(query: str, engine: Literal['google', 'duckduckgo', 'perplexity'] | None = None) str

Search for a query on a search engine.

If no engine is specified, automatically chooses the best available backend and falls back to the next usable backend on failure.
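The fallback behaviour can be pictured as trying each backend in order until one succeeds. A minimal sketch with hypothetical backend functions (the real backend selection in gptme may differ):

```python
def search_with_fallback(query: str, backends) -> str:
    """Try each (name, fn) backend in order; return the first success (sketch)."""
    errors = []
    for name, fn in backends:
        try:
            return fn(query)
        except Exception as e:
            errors.append(f"{name}: {e}")
    raise RuntimeError("All search backends failed: " + "; ".join(errors))

# Hypothetical backends: the first is blocked, the second succeeds
def google(q):
    raise RuntimeError("blocked by bot detection")

def duckduckgo(q):
    return f"results for {q!r}"

print(search_with_fallback("gptme", [("google", google), ("duckduckgo", duckduckgo)]))
# → results for 'gptme'
```

Collecting the per-backend errors means the final exception explains why every backend was unusable, not just the last one.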

gptme.tools.browser.search_playwright(query: str, engine: Literal['google', 'duckduckgo', 'perplexity'] = 'google') str

Search for a query on a search engine using Playwright.

gptme.tools.browser.snapshot_url(url: str) str

Get the ARIA accessibility snapshot of a webpage.

Returns a structured text representation of the page’s accessibility tree, showing interactive elements (buttons, links, inputs) with their roles and names. Useful for understanding page structure and finding elements to interact with.

The output includes a metadata header with the page title and current URL (which may differ from the requested URL after redirects).

Chats#

List, search, and summarize past conversation logs.

Examples

Search for a specific topic in past conversations

User
Can you find any mentions of "python" in our past conversations?
Assistant
Certainly! I'll search our past conversations for mentions of "python" using the search_chats function.
chats
search_chats('python')
gptme.tools.chats.conversation_stats(since: str | None = None, as_json: bool = False) None

Show statistics about conversation history.

Parameters:
  • since – Only include conversations since this date (YYYY-MM-DD or Nd).

  • as_json – Output as JSON instead of formatted text.

gptme.tools.chats.find_empty_conversations(max_messages: int = 1, include_test: bool = False) list[dict]

Find conversations with few or no messages.

Scans all conversations and returns those with at most max_messages messages. Useful for cleaning up abandoned or empty conversation logs.

Parameters:
  • max_messages – Maximum message count to consider “empty” (default: 1, system-only).

  • include_test – Whether to include test/eval conversations.

Returns:

List of dicts with conversation metadata and disk size.

gptme.tools.chats.list_chats(max_results: int = 5, metadata=False, include_summary: bool = False) None

List recent chat conversations and optionally summarize them using an LLM.

Parameters:
  • max_results (int) – Maximum number of conversations to display.

  • include_summary (bool) – Whether to include a summary of each conversation. If True, uses an LLM to generate a comprehensive summary. If False, uses a simple strategy showing snippets of the first and last messages.

gptme.tools.chats.read_chat(id: str, max_results: int = 5, incl_system=False) None

Read a specific conversation log.

Parameters:
  • id (str) – The id of the conversation to read.

  • max_results (int) – Maximum number of messages to display.

  • incl_system (bool) – Whether to include system messages.

gptme.tools.chats.search_chats(query: str, max_results: int = 5, system=False, sort: Literal['date', 'count'] = 'date') None

Search past conversation logs for the given query and print a summary of the results.

Parameters:
  • query (str) – The search query.

  • max_results (int) – Maximum number of conversations to display.

  • system (bool) – Whether to include system messages in the search.

Computer#

Warning

The computer use interface is experimental and has serious security implications. Please use with caution and see Anthropic’s documentation on computer use for additional guidance.

Tool for computer interaction for X11 or macOS environments, including screen capture, keyboard, and mouse control.

The computer tool provides direct interaction with the desktop environment. Similar to Anthropic’s computer use demo, but integrated with gptme’s architecture.

Features

  • Keyboard input simulation

  • Mouse control (movement, clicks, dragging)

  • Screen capture with automatic scaling

  • Cursor position tracking

Installation

On Linux, requires X11 and xdotool:

# On Debian/Ubuntu
sudo apt install xdotool

# On Arch Linux
sudo pacman -S xdotool

On macOS, uses the native screencapture utility and the external tool cliclick:

brew install cliclick

You need to give your terminal both screen recording and accessibility permissions in System Preferences.

Configuration

The tool uses these environment variables:

  • DISPLAY: X11 display to use (default: “:1”, Linux only)

  • WIDTH: Screen width (default: 1024)

  • HEIGHT: Screen height (default: 768)

Usage

The tool supports these actions:

Keyboard:
  • key: Send key sequence (e.g., “Return”, “Control_L+c”)

  • type: Type text with realistic delays

Mouse:
  • mouse_move: Move mouse to coordinates

  • left_click: Click left mouse button

  • right_click: Click right mouse button

  • middle_click: Click middle mouse button

  • double_click: Double click left mouse button

  • left_click_drag: Click and drag to coordinates

Screen:
  • screenshot: Take and view a screenshot

  • cursor_position: Get current mouse position

The tool automatically handles screen resolution scaling to ensure optimal performance with LLM vision capabilities.
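Coordinate scaling between the model-facing view and the physical screen is simple proportional mapping. A sketch assuming the 1024×768 defaults above and a hypothetical 2560×1440 physical display:

```python
def scale_coords(x: int, y: int, src=(1024, 768), dst=(2560, 1440)) -> tuple[int, int]:
    """Map a coordinate from the scaled screenshot to real screen pixels (sketch).

    src is the resolution the model sees (WIDTH x HEIGHT above);
    dst is an assumed physical resolution for illustration.
    """
    return round(x * dst[0] / src[0]), round(y * dst[1] / src[1])

print(scale_coords(512, 384))  # centre of the 1024x768 view → (1280, 720)
```

The inverse mapping (dst to src) is used when scaling screenshots down before sending them to the model.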

Tips for Complex Operations

For complex operations involving multiple keypresses, you can use semicolon-separated sequences with key:

Examples

  • Filling a login form: t:username;kp:tab;t:password;kp:return

  • Switching applications: cmd+tab on macOS, alt+Tab on Linux

  • (macOS) Opening Spotlight and searching: cmd+space;t:firefox;return

Using a single sequence for complex operations ensures proper timing and recognition of keyboard shortcuts.

Instructions

You can interact with the computer through the `computer` Python function.
Works on both Linux (X11) and macOS.

The key input syntax works consistently across platforms.

Available actions:
- key: Send key sequence using a unified syntax:
  - Type text: "t:Hello World"
  - Press key: "return", "esc", "tab"
  - Key combination: "ctrl+c", "cmd+space"
  - Chain commands: "cmd+space;t:firefox;return"
- type: Type text with realistic delays (legacy method)
- mouse_move: Move mouse to coordinates
- left_click, right_click, middle_click, double_click: Mouse clicks
- left_click_drag: Click and drag to coordinates
- screenshot: Take and view a screenshot
- cursor_position: Get current mouse position

Note: Key names are automatically mapped between platforms.
Common modifiers (ctrl, alt, cmd/super, shift) work consistently across platforms.
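The chained syntax above can be interpreted with a simple split on semicolons. This sketch shows one plausible way to parse a sequence into operations; the actual parser in gptme may differ:

```python
def parse_key_sequence(seq: str) -> list[tuple[str, str]]:
    """Split a chained input sequence into (action, argument) ops (sketch)."""
    ops = []
    for part in seq.split(";"):
        if part.startswith("t:"):
            ops.append(("type", part[2:]))      # type literal text
        elif part.startswith("kp:"):
            ops.append(("keypress", part[3:]))  # explicit keypress
        elif "+" in part:
            ops.append(("combo", part))         # modifier combination
        else:
            ops.append(("keypress", part))      # bare key name
    return ops

print(parse_key_sequence("cmd+space;t:firefox;return"))
# → [('combo', 'cmd+space'), ('type', 'firefox'), ('keypress', 'return')]
```

Executing the whole parsed list in one go is what gives the proper timing for shortcuts like opening Spotlight and typing a query.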

Examples

User
Take a screenshot of the desktop
Assistant
I'll capture the screen using the screenshot tool.
ipython
computer("screenshot")
System
Viewing image...
User
Type "Hello, World!" into the active window
Assistant
I'll type the text with realistic delays.
ipython
computer("type", text="Hello, World!")
System
Typed text: Hello, World!
User
Move the mouse to coordinates (100, 200) and click
Assistant
I'll move the mouse and perform a left click.
ipython
computer("mouse_move", coordinate=(100, 200))
System
Moved mouse to 100,200
ipython
computer("left_click")
System
Performed left_click
User
Get the current mouse position
Assistant
I'll get the cursor position.
ipython
computer("cursor_position")
System
Cursor position: X=512,Y=384
User
Double-click at current position
Assistant
I'll perform a double-click.
ipython
computer("double_click")
System
Performed double_click
User
Open a new browser tab
Assistant
I'll open a new browser tab.
ipython
computer("key", text="ctrl+t")
System
Sent key sequence: ctrl+t
class gptme.tools.computer.ComboOperation
class gptme.tools.computer.KeyOperation
class gptme.tools.computer.TextOperation
gptme.tools.computer.computer(action: Literal['key', 'type', 'mouse_move', 'left_click', 'left_click_drag', 'right_click', 'middle_click', 'double_click', 'screenshot', 'cursor_position'], text: str | None = None, coordinate: tuple[int, int] | None = None) Message | None

Perform computer interactions in X11 or macOS environments.

Parameters:
  • action – The type of action to perform

  • text – Text to type or key sequence to send

  • coordinate – X,Y coordinates for mouse actions

RAG#

RAG (Retrieval-Augmented Generation) tool for context-aware assistance.

The RAG tool provides context-aware assistance by indexing and semantically searching text files.

Installation

The RAG tool requires the gptme-rag CLI to be installed:

pipx install gptme-rag

Configuration

Configure RAG in your gptme.toml:

[rag]
enabled = true
post_process = false # Whether to post-process the context with an LLM to extract the most relevant information
post_process_model = "openai/gpt-4o-mini" # Which model to use for post-processing
post_process_prompt = "" # Optional prompt to use for post-processing (overrides default prompt)
workspace_only = true # Whether to only search in the workspace directory, or the whole RAG index
paths = [] # List of paths to include in the RAG index. Has no effect if workspace_only is true.

Features

  1. Manual Search and Indexing

    • Index project documentation with rag_index

    • Search indexed documents with rag_search

    • Check index status with rag_status

  2. Automatic Context Enhancement

    • Retrieves semantically similar documents

    • Preserves conversation flow with hidden context messages

Instructions

Use RAG to index and semantically search through text files such as documentation and code.

Examples

User
Index the current directory
Assistant
Let me index the current directory with RAG.
ipython
rag_index()
System
Indexed 1 paths
User
Search for documentation about functions
Assistant
I'll search for function-related documentation.
ipython
rag_search("function documentation")
System
### docs/api.md
Functions are documented using docstrings...
User
Show index status
Assistant
I'll check the current status of the RAG index.
ipython
rag_status()
System
Index contains 42 documents
gptme.tools.rag.get_rag_context(query: str, rag_config: RagConfig, workspace: Path | None = None) Message

Get relevant context chunks from RAG for the user query.

gptme.tools.rag.init() ToolSpec

Initialize the RAG tool.

gptme.tools.rag.rag_index(*paths: str, glob: str | None = None) str

Index documents in specified paths.

gptme.tools.rag.rag_search(query: str, return_full: bool = False) str

Search indexed documents.

gptme.tools.rag.rag_status() str

Show index status.

MCP#

The Model Context Protocol (MCP) allows you to extend gptme with custom tools through external servers. See MCP for configuration and usage details.