feat: add fast screenshot-only path for Snapshot #98
Merged
Jeomon merged 1 commit into CursorTouch:main on Mar 11, 2026
Summary
This PR adds a `use_ui_tree` parameter to the `Snapshot` tool so clients can request a fast screenshot-only path when they do not need interactive or scrollable element extraction.
Why this is needed
In PR #95, I exposed the existing `use_annotation` parameter so callers could disable bounding box overlays.
That fixed the visual-obstruction problem, but it did not fully solve the performance problem.
The remaining issue is that `Snapshot` still performs the full UI tree capture unconditionally inside `Desktop.get_state()`, even when callers use:
- `use_vision=True`
- `use_annotation=False`
In practice, this means callers can get a clean screenshot, but they still pay the full cost of accessibility tree traversal, window-by-window retries, and related UIA work before the image is returned.
That makes screenshot capture unnecessarily slow for vision-only agent workflows.
Root cause
`Desktop.get_state()` always calls `self.tree.get_state(...)`.
That happens before screenshot generation and is independent of `use_annotation`.
So `use_annotation=False` only disables rectangle drawing, not the expensive tree capture that dominates latency.
What this PR changes
- Add `use_ui_tree: bool | str = True` to the `Snapshot` tool
- Pass `use_ui_tree` through to `Desktop.get_state()`
- Skip `self.tree.get_state(...)` entirely when `use_ui_tree=False`
- Return a `TreeState` shell in that mode so the response format remains stable
- Reject `use_dom=True` together with `use_ui_tree=False`, because DOM extraction depends on the UI tree path
Behavior
Default behavior
No breaking changes.
`use_ui_tree=True` remains the default.
Fast screenshot-only behavior
Clients that only need the image can now call:
    { "use_vision": true, "use_annotation": false, "use_ui_tree": false }
This avoids the expensive UI tree traversal and returns a clean screenshot faster.
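The control flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual diff: `FakeTree`, `as_bool`, the `TreeState` fields, the returned dictionary shape, and the use of `ValueError` are all assumptions made for the example. Only the names `Desktop.get_state()`, `self.tree.get_state(...)`, `TreeState`, and the four parameters come from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class TreeState:
    """Hypothetical stand-in for the project's TreeState (fields are assumed)."""
    interactive_nodes: list = field(default_factory=list)
    scrollable_nodes: list = field(default_factory=list)


class FakeTree:
    """Stand-in for the UI tree service; counts how often the costly capture runs."""

    def __init__(self):
        self.calls = 0

    def get_state(self, use_dom=False):
        self.calls += 1
        return TreeState(interactive_nodes=["button"], scrollable_nodes=["list"])


def as_bool(value):
    """Coerce a bool or a string such as "false" to a bool (assumed handling of bool | str)."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("true", "1", "yes")


class Desktop:
    def __init__(self, tree):
        self.tree = tree

    def get_state(self, use_vision=False, use_annotation=True,
                  use_dom=False, use_ui_tree=True):
        use_ui_tree = as_bool(use_ui_tree)
        use_dom = as_bool(use_dom)

        # DOM extraction depends on the UI tree path, so this combination is rejected.
        if use_dom and not use_ui_tree:
            raise ValueError("use_dom=True requires use_ui_tree=True")

        if use_ui_tree:
            # Default path: full (expensive) tree capture, as before.
            tree_state = self.tree.get_state(use_dom=use_dom)
        else:
            # Fast path: skip the capture and return an empty shell
            # so the response keeps the same shape.
            tree_state = TreeState()

        # Placeholder for screenshot capture; use_annotation would only
        # control overlay drawing and is not modelled here.
        screenshot = b"<image bytes>" if use_vision else None
        return {"tree_state": tree_state, "screenshot": screenshot}
```

Under these assumptions, calling `Desktop(FakeTree()).get_state(use_vision=True, use_annotation=False, use_ui_tree=False)` never touches the tree, while the default call still performs one capture.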
Testing
Added regression coverage for:
- `use_ui_tree=False`
- `use_dom=True` with `use_ui_tree=False`
Focused test run:
Result:
Notes
PR #95 was necessary, but incomplete for the end-to-end screenshot performance issue.
It exposed overlay control, while this PR addresses the remaining root cause by making UI tree capture optional for screenshot-only workflows.