improve.md

Improve Workflow

improve is a post-processing step for recorded YAML tests.

Usage

Default (Interactive)

ui-test improve e2e/login.yaml

This prompts you to confirm before applying improvements:

? Write improved copy to login.improved.yaml? (Y/n)

Accept (default) to write improvements to e2e/login.improved.yaml and keep e2e/login.yaml unchanged. Decline for a report-only run.

Apply Without Prompting (CI)

ui-test improve e2e/login.yaml --apply

--apply writes both improved selectors and high-confidence assertion candidates to e2e/login.improved.yaml without prompting.

If improve needs an existing signed-in session, add --load-storage <path> to apply a Playwright storage state JSON file to improve browser contexts.

To overwrite the input file instead:

ui-test improve e2e/login.yaml --apply --in-place

To write to a custom destination:

ui-test improve e2e/login.yaml --apply --output e2e/login.latest.yaml
ui-test improve e2e/login.yaml --apply --load-storage .auth/state.json

Report Only (CI)

ui-test improve e2e/login.yaml --no-apply

--no-apply writes a JSON report and does not modify YAML. Useful in CI pipelines where interactive prompts are not available.

Generate A Review Plan

ui-test improve e2e/login.yaml --plan

--plan computes full apply-mode recommendations (selectors + assertions + runtime failure handling) but does not write YAML. It writes:

improve report (*.improve-report.json)
improve plan (*.improve-plan.json)

Apply a reviewed plan explicitly:

ui-test improve e2e/login.yaml --apply-plan e2e/login.improve-plan.json

By default, --apply-plan writes e2e/login.improved.yaml and preserves e2e/login.yaml. Use --in-place to overwrite the input file or --output <path> to choose a custom destination.

In report-only runs (--no-apply), assertion candidates keep applyStatus: not_requested, including candidates that would be policy-capped or dynamic-filtered in apply mode.

Before/after example — a CSS selector upgraded to a semantic locator:

# Before
target:
  value: "#submit-btn"
  kind: css
  source: codegen
# After
target:
  value: "getByRole('button', { name: 'Submit' })"
  kind: locatorExpression
  source: manual

Selectors Only (No Assertions)

ui-test improve e2e/login.yaml --assertions none --apply

Assertion Sources

Source	Description
`snapshot-native` (default)	Uses Playwright's `locator.ariaSnapshot()` to capture page state changes during replay. No external tool needed.
`deterministic`	Conservative form-state-only assertions (`assertValue`/`assertChecked`). No browser needed beyond replay.

ui-test improve e2e/login.yaml --apply --assertion-source snapshot-native
ui-test improve e2e/login.yaml --apply --assertion-source deterministic

Assertion Policy

--assertion-policy controls assertion strictness in apply mode:

Policy	Behavior
`reliable`	Most conservative: stable-structural snapshot `assertVisible` only, tighter dynamic filters, 1 applied assertion per step.
`balanced` (default)	Runtime-validated snapshot `assertVisible` allowed, moderate dynamic filters, up to 2 applied assertions per step.
`aggressive`	Highest coverage: runtime-validated snapshot `assertVisible`, light dynamic filtering, up to 3 applied assertions per step.

Exact policy matrix:

Policy	Applied per-step cap	Snapshot volume cap (`navigate`/other)	Snapshot `assertVisible`	Snapshot `assertText` min score	Volatility hard-filter flags
`reliable`	`1`	`1/2`	`stable_structural_only`	`0.82`	`contains_numeric_fragment`, `contains_date_or_time_fragment`, `contains_weather_or_news_fragment`, `long_text`, `contains_headline_like_text`, `contains_pipe_separator`
`balanced`	`2`	`2/3`	`runtime_validated`	`0.78`	`contains_headline_like_text`, `contains_pipe_separator`
`aggressive`	`3`	`3/4`	`runtime_validated`	`0.72`	`contains_headline_like_text`

ui-test improve e2e/login.yaml --apply --assertion-policy balanced
ui-test improve e2e/login.yaml --apply --assertion-policy reliable
ui-test improve e2e/login.yaml --apply --assertion-policy aggressive

Assertions Mode

candidates (default): generate and optionally apply assertion candidates.
none: skip assertion generation entirely.

ui-test improve e2e/login.yaml --assertions candidates
ui-test improve e2e/login.yaml --assertions none

Assertion Rules

These rules govern how assertions are inserted:

Per-step apply cap is policy-driven: reliable=1, balanced=2, aggressive=3; extra candidates are marked skipped_policy.
Snapshot assertVisible handling is policy-driven: reliable only allows stable-structural candidates, while balanced/aggressive allow runtime-validated visibility candidates.
Snapshot assertText min apply score is policy-driven (reliable=0.82, balanced=0.78, aggressive=0.72).
Volatility hard-filtering is policy-driven and only applied in apply mode.
Runtime-failing candidates are never force-applied (skipped_runtime_failure).
Deterministic coverage fallbacks (click/press/hover -> assertVisible) remain eligible in apply mode as backup candidates when stronger assertions fail runtime validation.
In snapshot-native mode, improve performs gap-only runtime locator inventory harvesting from post-step aria snapshots and adds inventory fallback candidates only for uncovered interaction steps.
Existing adjacent assertions are preserved (no automatic cleanup).
Applied assertions are inserted as required steps (no optional field).
In apply mode, runtime-failing interaction steps are classified with confidence/safety metadata: only high-confidence safe transient dismissal/control interactions are auto-removed; low-confidence or unsafe removals are retained as required steps.
Runtime-derived auto-apply is guarded by determinism checks. If the test has no baseUrl, replay targets a different host, or replay drifts cross-origin, runtime-derived selector repairs, snapshot-native assertion insertions, and runtime-failure removals stay report-only and are called out in report diagnostics/determinism.

Auto-Improve After Recording

After recording, ui-test record automatically runs improve to upgrade selectors, add assertion candidates, and classify runtime-failing interactions (aggressively remove transient dismissal/control click/press failures, retain non-transient and safeguarded content/business interactions as required steps). Use --no-improve to skip this:

ui-test record --name login --url https://example.com --no-improve

Reference

Deterministic Source Mapping

With --assertion-source deterministic, auto-apply uses a conservative mapping:

fill / select → assertValue
check / uncheck → assertChecked
click / press / hover → low-priority coverage fallback assertVisible candidates (coverageFallback: true, confidence 0.76)

Coverage fallback candidates are always generated, but they remain low priority:

Non-fallback candidates are prioritized first by scoring/action policy.
Fallbacks still run through normal policy/runtime validation and can apply when higher-priority assertions fail at runtime.
Once a non-fallback assertion is applied for a step, remaining fallback candidates for that step are skipped as backup-only.
When both are fallback candidates, deterministic fallback is preferred over inventory fallback.

Snapshot-Native Inventory Harvesting

When --assertion-source snapshot-native is active, improve reuses per-step post-action ariaSnapshot() state to harvest additional locator/assertion evidence for under-covered interaction steps.

Runs only for assertion generation mode (--assertions candidates).
Gap-only by default: steps already covered by non-fallback assertions do not get extra inventory candidates.
Uses no extra replay pass; it reuses snapshots already captured during runtime analysis.
Inventory candidates are marked as fallback (coverageFallback: true) and still pass through normal policy filtering and runtime validation.

Assertion Source Fallback

If the snapshot source is unavailable or fails, improve falls back to deterministic candidates. Diagnostics include fallback reason codes in the JSON report.

Aria-Based Selector Improvement

When a browser is available, improve uses Playwright's ariaSnapshot() API to inspect each element's accessibility role and name. This generates semantic locator candidates:

getByRole(role, { name }) — for any element with an accessible role and name
getByLabel(name) — for form controls (textbox, combobox, listbox, searchbox, spinbutton)
getByPlaceholder(text) — for form controls with a placeholder attribute
getByText(text) — for text-bearing roles (headings, links, alerts, status elements)
getByTestId(id) — when runtime attributes expose data-testid / data-test-id
locator('tag[name="..."]') / locator('[id="..."]') — for unlabeled controls with stable runtime attributes
getByRole('row', { name: ... }).getByRole('textbox') — as a lower-confidence fallback for repeated unlabeled controls that only become distinguishable through row context

These candidates are scored alongside syntactic candidates. Auto-apply is intentionally bounded: candidates must score significantly higher than the current selector (delta >= 0.15), meet the high-confidence threshold, and resolve uniquely at runtime before YAML is mutated. Lower-confidence recommendations remain in the report for review instead of being applied silently.

Playwright Runtime Selector Regeneration

For dynamic-flagged/brittle targets (for example long exact headline link names), improve also attempts a runtime selector regeneration pass:

Requires a unique runtime match (matchCount === 1) before generating a repair candidate.
Runs as a dedicated runtime-repair stage after baseline and heuristic locator-repair candidate generation.
Converts supported internal / selector-engine targets into locator expressions using deterministic public selector parsing.
Falls back safely to existing repair heuristics when conversion is unavailable or the selector shape is unsupported.
Set UI_TEST_DISABLE_PLAYWRIGHT_RUNTIME_REGEN=1 to disable runtime regeneration/conversion and use heuristic repairs only.

If replay later encounters a broken locator and no high-confidence unique repair is available, play stops and points you back to ui-test improve <file> --apply rather than continuing with ambiguous selector guesses.

Runtime regeneration diagnostics:

selector_repair_generated_via_playwright_runtime
selector_repair_playwright_runtime_unavailable
selector_repair_playwright_runtime_non_unique
selector_repair_playwright_runtime_conversion_failed
selector_repair_playwright_runtime_disabled

Report Contents

The report includes step-level old/recommended targets, confidence scores, assertion candidates, and diagnostics.

Diagnostics may include decision metadata:

decisionConfidence
mutationType
mutationSafety
evidenceRefs
appliedBy

The summary includes:

selectorRepairCandidates
selectorRepairsApplied
selectorRepairsGeneratedByPlaywrightRuntime
selectorRepairsAppliedFromPlaywrightRuntime
runtimeFailingStepsRetained
runtimeFailingStepsRemoved
assertionCandidatesFilteredDynamic
assertionCoverageStepsTotal
assertionCoverageStepsWithCandidates
assertionCoverageStepsWithApplied
assertionCoverageCandidateRate
assertionCoverageAppliedRate
assertionFallbackApplied
assertionFallbackAppliedOnlySteps
assertionFallbackAppliedWithNonFallbackSteps
assertionInventoryStepsEvaluated
assertionInventoryCandidatesAdded
assertionInventoryGapStepsFilled

Each assertion candidate has an applyStatus:

Status	Meaning
`applied`	Written to YAML
`skipped_low_confidence`	Below confidence threshold
`skipped_runtime_failure`	Failed runtime validation
`skipped_policy`	Apply-mode policy skip (for example visibility rules, dynamic hard-filter, or profile cap reached)
`skipped_existing`	Step already has an assertion
`not_requested`	Report-only run (`--no-apply`): candidate was generated but not considered for apply/validation

Runtime-failing step diagnostics use runtime_failing_step_retained.

Default report path: <test-file>.improve-report.json

Custom path:

ui-test improve e2e/login.yaml --report ./reports/login.improve.json

Runtime Safety Notes

Runtime analysis may replay actions; use a safe test environment.
improve requires Chromium availability in CLI runs.
If Chromium is missing, provision it with ui-test setup or npx playwright install chromium.
Validation timing mirrors play post-step readiness checks: navigation waits happen automatically, and networkidle is opt-in.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve Workflow

Usage

Default (Interactive)

Apply Without Prompting (CI)

Report Only (CI)

Generate A Review Plan

Selectors Only (No Assertions)

Assertion Sources

Assertion Policy

Assertions Mode

Assertion Rules

Auto-Improve After Recording

Reference

Deterministic Source Mapping

Snapshot-Native Inventory Harvesting

Assertion Source Fallback

Aria-Based Selector Improvement

Playwright Runtime Selector Regeneration

Report Contents

Runtime Safety Notes

FilesExpand file tree

improve.md

Latest commit

History

improve.md

File metadata and controls

Improve Workflow

Usage

Default (Interactive)

Apply Without Prompting (CI)

Report Only (CI)

Generate A Review Plan

Selectors Only (No Assertions)

Assertion Sources

Assertion Policy

Assertions Mode

Assertion Rules

Auto-Improve After Recording

Reference

Deterministic Source Mapping

Snapshot-Native Inventory Harvesting

Assertion Source Fallback

Aria-Based Selector Improvement

Playwright Runtime Selector Regeneration

Report Contents

Runtime Safety Notes