feat: add fast screenshot-only path for Snapshot #98

Merged
Jeomon merged 1 commit into CursorTouch:main from yasuhirofujii-medley:feat/fast-snapshot-no-tree on Mar 11, 2026

@yasuhirofujii-medley

Summary

This PR adds a use_ui_tree parameter to the Snapshot tool so clients can request a fast screenshot-only path when they do not need interactive or scrollable element extraction.

Why this is needed

In PR #95, I exposed the existing use_annotation parameter so callers could disable bounding box overlays.
That fixed the visual-obstruction problem, but it did not fully solve the performance problem.

The remaining issue is that Snapshot still performs the full UI tree capture unconditionally inside Desktop.get_state(), even when callers use:

  • use_vision=True
  • use_annotation=False

In practice, this means callers can get a clean screenshot, but they still pay the full cost of accessibility tree traversal, window-by-window retries, and related UIA work before the image is returned.
That makes screenshot capture unnecessarily slow for vision-only agent workflows.

Root cause

Desktop.get_state() always calls:

other_windows_handles = list(controls_handles - windows_handles)
tree_state = self.tree.get_state(
    active_window_handle, other_windows_handles, use_dom=use_dom
)

That happens before screenshot generation and is independent of use_annotation.
So use_annotation=False only disables rectangle drawing, not the expensive tree capture that dominates latency.

What this PR changes

  • Adds use_ui_tree: bool | str = True to the Snapshot tool
  • Threads use_ui_tree through to Desktop.get_state()
  • Skips self.tree.get_state(...) entirely when use_ui_tree=False
  • Returns an empty TreeState shell in that mode so the response format remains stable
  • Rejects use_dom=True together with use_ui_tree=False, because DOM extraction depends on the UI tree path
  • Updates Snapshot tool description to document the new fast screenshot-only mode
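
The gating described in the list above can be sketched roughly as follows. This is a minimal illustration, not the merged code: `FakeTree`, the `TreeState` field names, the window handle values, and the exact error type are assumptions made so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class TreeState:
    """Stand-in for the real TreeState; field names are illustrative only."""
    interactive_nodes: list = field(default_factory=list)
    scrollable_nodes: list = field(default_factory=list)


class FakeTree:
    """Hypothetical tree service that counts calls so the skip is observable."""

    def __init__(self):
        self.calls = 0

    def get_state(self, active_window_handle, other_windows_handles, use_dom=False):
        self.calls += 1
        return TreeState(interactive_nodes=["button"], scrollable_nodes=["list"])


class Desktop:
    """Simplified Desktop that only models the use_ui_tree decision."""

    def __init__(self, tree):
        self.tree = tree

    def get_state(self, use_dom=False, use_ui_tree=True):
        # DOM extraction depends on the tree path, so this combination is rejected.
        if use_dom and not use_ui_tree:
            raise ValueError("use_dom=True requires use_ui_tree=True")
        if use_ui_tree:
            # Placeholder handles; the real method derives them from open windows.
            return self.tree.get_state(1, [], use_dom=use_dom)
        # Fast path: no traversal, but the response shape stays the same.
        return TreeState()
```

Returning an empty `TreeState` rather than `None` means callers that iterate over the element lists keep working without a special case.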

Behavior

Default behavior

No breaking changes.

  • use_ui_tree=True remains the default
  • Existing clients keep the current Snapshot behavior

Fast screenshot-only behavior

Clients that only need the image can now call:

{
  "use_vision": true,
  "use_annotation": false,
  "use_ui_tree": false
}

This avoids the expensive UI tree traversal and returns a clean screenshot faster.
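
Because the parameter is typed `bool | str`, a client may end up sending `"false"` as a string instead of a JSON boolean. The sketch below shows one way such a value could be normalized before use. The `as_bool` helper and its accepted spellings are assumptions for illustration and are not taken from this PR.

```python
from __future__ import annotations


def as_bool(value: bool | str) -> bool:
    """Hypothetical helper: treat the string "true" (any case) as True."""
    if isinstance(value, bool):
        return value
    return value.strip().lower() == "true"


def fast_snapshot_args() -> dict:
    """Build the screenshot-only arguments shown in the JSON example."""
    return {"use_vision": True, "use_annotation": False, "use_ui_tree": False}
```

Under this sketch, any string other than "true" is read as false, so a stricter variant might raise on unrecognized input instead.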

Testing

Added regression coverage for:

  • skipping tree capture when use_ui_tree=False
  • rejecting use_dom=True with use_ui_tree=False

Focused test run:

python -m pytest -q tests/test_snapshot_display_filter.py

Result:

  • 11 passed

Notes

PR #95 was necessary, but it did not fully resolve the end-to-end screenshot performance issue.
It exposed overlay control; this PR addresses the remaining root cause by making UI tree capture optional for screenshot-only workflows.
