feat: expose use_annotation parameter in Snapshot tool #95

Merged
Jeomon merged 1 commit into CursorTouch:main from
yasuhirofujii-medley:feat/expose-use-annotation-param
Mar 9, 2026

@yasuhirofujii-medley

Summary

Expose the existing use_annotation parameter from Desktop.get_state() through the MCP Snapshot tool, allowing clients to control whether bounding box annotations are drawn on screenshots.

Problem

When use_vision=True, the Snapshot tool always returns screenshots with colored bounding boxes drawn around detected UI elements (via get_annotated_screenshot()). The Desktop.get_state() method already supports a use_annotation parameter to control this behavior, but it was not exposed through the MCP tool interface.

This is particularly problematic for AI agents using Computer Use (e.g., Claude, GPT-4V), where the colored rectangles can:

  • Obscure text that the agent needs to read
  • Cover buttons and interactive elements
  • Interfere with the agent's ability to accurately identify UI components
  • Degrade overall vision-based automation performance

Solution

Add use_annotation: bool | str = True as an optional parameter to the Snapshot tool (state_tool), and pass it through to desktop.get_state().

Changes (4 lines added, 1 modified):

  1. Tool description: Added documentation for the new parameter
  2. Function signature: Added use_annotation: bool | str = True parameter
  3. Boolean parsing: Added standard bool/str parsing (consistent with use_vision and use_dom)
  4. get_state() call: Pass use_annotation through to desktop.get_state()
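
The four changes above can be sketched roughly as follows. This is a minimal illustration, not the actual diff: the to_bool helper, the FakeDesktop stand-in, and the exact parsing rule (a case-insensitive match on "true") are assumptions made for the example. Only the parameter name, its bool | str type, its default of True, and the pass-through to get_state() come from this PR.

```python
from __future__ import annotations


def to_bool(value: bool | str) -> bool:
    """Hypothetical helper: accept a real bool or a string such as "true"."""
    if isinstance(value, bool):
        return value
    # Assumption: only the word "true" (any case) counts as truthy.
    return value.strip().lower() == "true"


class FakeDesktop:
    """Hypothetical stand-in for Desktop; it only records what it was asked."""

    def __init__(self) -> None:
        self.calls: list[dict[str, bool]] = []

    def get_state(self, use_vision: bool = False, use_dom: bool = False,
                  use_annotation: bool = True) -> dict[str, bool]:
        call = {"use_vision": use_vision, "use_dom": use_dom,
                "use_annotation": use_annotation}
        self.calls.append(call)
        return call


def state_tool(desktop: FakeDesktop,
               use_vision: bool | str = False,
               use_dom: bool | str = False,
               use_annotation: bool | str = True) -> dict[str, bool]:
    """Sketch of the Snapshot tool: parse each flag, then pass it through."""
    return desktop.get_state(
        use_vision=to_bool(use_vision),
        use_dom=to_bool(use_dom),
        use_annotation=to_bool(use_annotation),  # new pass-through
    )
```

Accepting both bool and str lets clients that serialize every argument as a string still toggle the flag, for example state_tool(desktop, use_vision="true", use_annotation="false").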

Behavior:

  • use_annotation=True (default): Draws colored bounding boxes around UI elements — no change from current behavior
  • use_annotation=False: Returns a clean screenshot without overlays
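
To make the two behaviors concrete, here is a small sketch of the branch that the flag selects. The SketchDesktop class and its string return values are placeholders invented for illustration; the real Desktop methods capture actual images and may take additional arguments. Only the method names get_screenshot() and get_annotated_screenshot(), and the rule for choosing between them, are taken from this PR.

```python
from __future__ import annotations


class SketchDesktop:
    """Hypothetical stand-in for Desktop with placeholder screenshots."""

    def get_screenshot(self) -> str:
        # Placeholder for a clean capture without overlays.
        return "clean"

    def get_annotated_screenshot(self) -> str:
        # Placeholder for a capture with bounding boxes drawn on it.
        return "annotated"

    def get_state(self, use_vision: bool = False,
                  use_annotation: bool = True) -> str | None:
        if not use_vision:
            # No screenshot is produced unless vision is requested.
            return None
        if use_annotation:
            return self.get_annotated_screenshot()
        return self.get_screenshot()


desktop = SketchDesktop()
default_shot = desktop.get_state(use_vision=True)
clean_shot = desktop.get_state(use_vision=True, use_annotation=False)
```

Under these assumptions, use_annotation has an effect only when use_vision is enabled, which matches the description of the problem above.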

Backward Compatibility

Default value is True, preserving existing behavior. No breaking changes for current users.

Testing

Tested with a real-world AI automation system running Claude Computer Use across multiple Windows machines. Setting use_annotation=False significantly improved the agent's ability to read UI elements and interact with the desktop accurately.

Add use_annotation parameter to the Snapshot (state_tool) function,
allowing MCP clients to control whether bounding box annotations
are drawn on screenshots.

When use_vision=True, the desktop.get_state() method supports a
use_annotation parameter that controls screenshot rendering:
- use_annotation=True (default): draws colored bounding boxes around
  detected UI elements via get_annotated_screenshot()
- use_annotation=False: returns a clean screenshot via get_screenshot()

Previously, this parameter was not exposed through the MCP Snapshot tool,
forcing all vision-enabled screenshots to include bounding box overlays.
This can be problematic for AI agents using Computer Use, as the colored
rectangles can obscure text and UI elements the agent needs to read.

This change adds use_annotation as an optional parameter (default: True)
to maintain backward compatibility while giving clients the ability to
request clean screenshots when needed.