`improve` is a post-processing step for recorded YAML tests.

```bash
ui-test improve e2e/login.yaml
```

This prompts you to confirm before applying improvements:

```
? Write improved copy to login.improved.yaml? (Y/n)
```

Accept (the default) to write improvements to `e2e/login.improved.yaml` and keep `e2e/login.yaml` unchanged. Decline for a report-only run.
```bash
ui-test improve e2e/login.yaml --apply
```

`--apply` writes both improved selectors and high-confidence assertion candidates to `e2e/login.improved.yaml` without prompting.

To overwrite the input file instead:

```bash
ui-test improve e2e/login.yaml --apply --in-place
```

To write to a custom destination:

```bash
ui-test improve e2e/login.yaml --apply --output e2e/login.latest.yaml
```

If `improve` needs an existing signed-in session, add `--load-storage <path>` to load a Playwright storage state JSON file into the browser contexts that `improve` creates:

```bash
ui-test improve e2e/login.yaml --apply --load-storage .auth/state.json
```

For a report-only run without a prompt:

```bash
ui-test improve e2e/login.yaml --no-apply
```

`--no-apply` writes a JSON report and does not modify YAML. This is useful in CI pipelines where interactive prompts are not available.
```bash
ui-test improve e2e/login.yaml --plan
```

`--plan` computes full apply-mode recommendations (selectors, assertions, and runtime-failure handling) but does not write YAML. It writes:

- an improve report (`*.improve-report.json`)
- an improve plan (`*.improve-plan.json`)
Apply a reviewed plan explicitly:

```bash
ui-test improve e2e/login.yaml --apply-plan e2e/login.improve-plan.json
```

By default, `--apply-plan` writes `e2e/login.improved.yaml` and preserves `e2e/login.yaml`. Use `--in-place` to overwrite the input file or `--output <path>` to choose a custom destination.
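The destination rules shared by `--apply` and `--apply-plan` fit in a few lines. The Python below is a hypothetical model of the documented behavior, not ui-test's source; the function name `resolve_destination` and the decision to reject `--in-place` together with `--output` are assumptions, since the docs do not say how the two flags interact.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_destination(input_path: str, in_place: bool = False,
                        output: Optional[str] = None) -> str:
    """Hypothetical model of where --apply / --apply-plan write YAML."""
    if in_place and output is not None:
        # Assumption: the docs do not define this combination; this sketch rejects it.
        raise ValueError("choose either --in-place or --output, not both")
    if in_place:
        return input_path  # --in-place: overwrite the input file
    if output is not None:
        return output  # --output <path>: custom destination
    # Default: sibling '<stem>.improved<suffix>' file, input left untouched.
    src = PurePosixPath(input_path)
    return str(src.with_name(f"{src.stem}.improved{src.suffix}"))
```

For `e2e/login.yaml` with no extra flags, this yields `e2e/login.improved.yaml`, matching the default described above.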
In report-only runs (`--no-apply`), assertion candidates keep `applyStatus: not_requested`, including candidates that would be policy-capped or dynamic-filtered in apply mode.
Before/after example: a CSS selector upgraded to a semantic locator:

```yaml
# Before
target:
  value: "#submit-btn"
  kind: css
  source: codegen
```

```yaml
# After
target:
  value: "getByRole('button', { name: 'Submit' })"
  kind: locatorExpression
  source: manual
```

To apply selector improvements without generating assertion candidates:

```bash
ui-test improve e2e/login.yaml --assertions none --apply
```

`--assertion-source` selects where assertion candidates come from:

| Source | Description |
|---|---|
| `snapshot-native` (default) | Uses Playwright's `locator.ariaSnapshot()` to capture page state changes during replay. No external tool needed. |
| `deterministic` | Conservative form-state-only assertions (`assertValue`/`assertChecked`). No browser needed beyond replay. |

```bash
ui-test improve e2e/login.yaml --apply --assertion-source snapshot-native
ui-test improve e2e/login.yaml --apply --assertion-source deterministic
```

`--assertion-policy` controls assertion strictness in apply mode:
| Policy | Behavior |
|---|---|
| `reliable` | Most conservative: stable-structural snapshot `assertVisible` only, tighter dynamic filters, 1 applied assertion per step. |
| `balanced` (default) | Runtime-validated snapshot `assertVisible` allowed, moderate dynamic filters, up to 2 applied assertions per step. |
| `aggressive` | Highest coverage: runtime-validated snapshot `assertVisible`, light dynamic filtering, up to 3 applied assertions per step. |
Exact policy matrix:

| Policy | Applied per-step cap | Snapshot volume cap (navigate/other) | Snapshot `assertVisible` | Snapshot `assertText` min score | Volatility hard-filter flags |
|---|---|---|---|---|---|
| `reliable` | 1 | 1/2 | `stable_structural_only` | 0.82 | `contains_numeric_fragment`, `contains_date_or_time_fragment`, `contains_weather_or_news_fragment`, `long_text`, `contains_headline_like_text`, `contains_pipe_separator` |
| `balanced` | 2 | 2/3 | `runtime_validated` | 0.78 | `contains_headline_like_text`, `contains_pipe_separator` |
| `aggressive` | 3 | 3/4 | `runtime_validated` | 0.72 | `contains_headline_like_text` |
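To make the matrix concrete, here is a hypothetical Python encoding of it, plus a check for whether a snapshot `assertText` candidate would survive a given policy in apply mode. The class, field names, and `snapshot_assert_text_passes` are illustrative; treating the hard-filter flags as a veto on `assertText` candidates is an assumption drawn from the table, not a description of ui-test internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssertionPolicy:
    per_step_cap: int
    snapshot_cap_navigate: int
    snapshot_cap_other: int
    snapshot_assert_visible: str
    assert_text_min_score: float
    hard_filter_flags: frozenset


# Values copied from the policy matrix.
POLICIES = {
    "reliable": AssertionPolicy(1, 1, 2, "stable_structural_only", 0.82, frozenset({
        "contains_numeric_fragment", "contains_date_or_time_fragment",
        "contains_weather_or_news_fragment", "long_text",
        "contains_headline_like_text", "contains_pipe_separator"})),
    "balanced": AssertionPolicy(2, 2, 3, "runtime_validated", 0.78, frozenset({
        "contains_headline_like_text", "contains_pipe_separator"})),
    "aggressive": AssertionPolicy(3, 3, 4, "runtime_validated", 0.72, frozenset({
        "contains_headline_like_text"})),
}


def snapshot_assert_text_passes(policy_name: str, score: float, flags: set) -> bool:
    """Hypothetical apply-mode check for one snapshot assertText candidate."""
    policy = POLICIES[policy_name]
    if flags & policy.hard_filter_flags:
        return False  # volatile text is hard-filtered under this policy
    return score >= policy.assert_text_min_score
```

For example, `snapshot_assert_text_passes("balanced", 0.80, {"contains_numeric_fragment"})` is `True`, while the same candidate under `reliable` is rejected by the numeric-fragment filter (and would also miss the 0.82 score floor).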
```bash
ui-test improve e2e/login.yaml --apply --assertion-policy balanced
ui-test improve e2e/login.yaml --apply --assertion-policy reliable
ui-test improve e2e/login.yaml --apply --assertion-policy aggressive
```

`--assertions` controls whether assertion candidates are generated:

- `candidates` (default): generate and optionally apply assertion candidates.
- `none`: skip assertion generation entirely.

```bash
ui-test improve e2e/login.yaml --assertions candidates
ui-test improve e2e/login.yaml --assertions none
```

These rules govern how assertions are inserted:
- Per-step apply cap is policy-driven: `reliable=1`, `balanced=2`, `aggressive=3`; extra candidates are marked `skipped_policy`.
- Snapshot `assertVisible` handling is policy-driven: `reliable` only allows stable-structural candidates, while `balanced`/`aggressive` allow runtime-validated visibility candidates.
- Snapshot `assertText` minimum apply score is policy-driven (`reliable=0.82`, `balanced=0.78`, `aggressive=0.72`).
- Volatility hard-filtering is policy-driven and only applied in apply mode.
- Runtime-failing candidates are never force-applied (`skipped_runtime_failure`).
- Deterministic coverage fallbacks (`click`/`press`/`hover` → `assertVisible`) remain eligible in apply mode as backup candidates when stronger assertions fail runtime validation.
- In `snapshot-native` mode, improve performs gap-only runtime locator inventory harvesting from post-step aria snapshots and adds inventory fallback candidates only for uncovered interaction steps.
- Existing adjacent assertions are preserved (no automatic cleanup).
- Applied assertions are inserted as required steps (no `optional` field).
- In apply mode, runtime-failing interaction steps are classified with confidence/safety metadata: only high-confidence, safe transient dismissal/control interactions are auto-removed; low-confidence or unsafe removals are retained as required steps.
- Runtime-derived auto-apply is guarded by determinism checks. If the test has no `baseUrl`, replay targets a different host, or replay drifts cross-origin, runtime-derived selector repairs, snapshot-native assertion insertions, and runtime-failure removals stay report-only and are called out in the report's diagnostics/determinism output.
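The determinism guard in the last rule can be sketched as an origin comparison. This is a hypothetical model, not ui-test's implementation: `runtime_auto_apply_allowed` is an invented name, and treating "different host" and "cross-origin drift" as one scheme/host/port check is an assumption.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def runtime_auto_apply_allowed(base_url: Optional[str],
                               replay_urls: Iterable[str]) -> bool:
    """Hypothetical guard: False means runtime-derived changes stay report-only."""
    if not base_url:
        return False  # no baseUrl: replay is not considered deterministic
    base = urlsplit(base_url)
    for url in replay_urls:
        parts = urlsplit(url)
        # Compare scheme, hostname, and explicit port (roughly, the origin).
        if (parts.scheme, parts.hostname, parts.port) != (base.scheme, base.hostname, base.port):
            return False
    return True
```

A replay that bounces through an SSO host on another domain would make this return `False`, so selector repairs and snapshot assertions from that run would be reported but not written.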
After recording, `ui-test record` automatically runs `improve` to upgrade selectors, add assertion candidates, and classify runtime-failing interactions: transient dismissal/control click/press failures are removed aggressively, while non-transient and safeguarded content/business interactions are retained as required steps. Use `--no-improve` to skip this:

```bash
ui-test record --name login --url https://example.com --no-improve
```

With `--assertion-source deterministic`, auto-apply uses a conservative mapping:

- `fill`/`select` → `assertValue`
- `check`/`uncheck` → `assertChecked`
- `click`/`press`/`hover` → low-priority coverage fallback `assertVisible` candidates (`coverageFallback: true`, confidence `0.76`)
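The mapping above reads naturally as a lookup table. This sketch is hypothetical: `deterministic_candidate` and the `"type"` key are invented for illustration, and only the action names, assertion types, `coverageFallback` flag, and `0.76` confidence come from the docs.

```python
from typing import Optional

# Documented mapping for --assertion-source deterministic.
DETERMINISTIC_MAPPING = {
    "fill": "assertValue",
    "select": "assertValue",
    "check": "assertChecked",
    "uncheck": "assertChecked",
}
FALLBACK_ACTIONS = {"click", "press", "hover"}


def deterministic_candidate(action: str) -> Optional[dict]:
    """Hypothetical: build the candidate the mapping implies for one step."""
    if action in DETERMINISTIC_MAPPING:
        return {"type": DETERMINISTIC_MAPPING[action], "coverageFallback": False}
    if action in FALLBACK_ACTIONS:
        # Low-priority backup candidate with the documented confidence.
        return {"type": "assertVisible", "coverageFallback": True, "confidence": 0.76}
    return None  # e.g. navigate: no deterministic candidate in this sketch
```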
Coverage fallback candidates are always generated, but they remain low priority:
- Non-fallback candidates are prioritized first by scoring/action policy.
- Fallbacks still run through normal policy/runtime validation and can apply when higher-priority assertions fail at runtime.
- Once a non-fallback assertion is applied for a step, remaining fallback candidates for that step are skipped as backup-only.
- When both are fallback candidates, deterministic fallback is preferred over inventory fallback.
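The fallback rules, combined with the per-step cap and runtime validation, can be modeled for a single step as below. This is a hypothetical sketch: `Candidate`, `select_for_step`, and the `"deterministic"`/`"inventory"` labels are invented, and marking backup-only fallbacks as `skipped_policy` is an assumption (the docs only say they are "skipped as backup-only").

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    score: float
    fallback: str = ""  # "", "deterministic", or "inventory" (hypothetical labels)
    passes_runtime: bool = True
    status: str = "pending"


def select_for_step(cands: List[Candidate], cap: int) -> List[Candidate]:
    """Hypothetical apply-mode selection for one step's candidates."""
    fallback_rank = {"": 0, "deterministic": 1, "inventory": 2}
    # Non-fallback first, then deterministic, then inventory; higher score first.
    ordered = sorted(cands, key=lambda c: (fallback_rank[c.fallback], -c.score))
    applied = 0
    non_fallback_applied = False
    for c in ordered:
        if c.fallback and non_fallback_applied:
            c.status = "skipped_policy"  # backup-only: a stronger assertion landed
        elif applied >= cap:
            c.status = "skipped_policy"  # per-step cap reached
        elif not c.passes_runtime:
            c.status = "skipped_runtime_failure"  # never force-applied
        else:
            c.status = "applied"
            applied += 1
            non_fallback_applied = non_fallback_applied or not c.fallback
    return ordered
```

With a cap of 1, a failing `assertText` leaves room for the deterministic fallback, which then applies ahead of the higher-scoring inventory fallback.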
When `--assertion-source snapshot-native` is active, improve reuses each step's post-action `ariaSnapshot()` state to harvest additional locator/assertion evidence for under-covered interaction steps.

- Runs only in assertion generation mode (`--assertions candidates`).
- Gap-only by default: steps already covered by non-fallback assertions do not get extra inventory candidates.
- Uses no extra replay pass; it reuses snapshots already captured during runtime analysis.
- Inventory candidates are marked as fallback (`coverageFallback: true`) and still pass through normal policy filtering and runtime validation.

If the snapshot source is unavailable or fails, improve falls back to deterministic candidates. Diagnostics include fallback reason codes in the JSON report.
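The gap-only rule decides which steps are even considered for inventory harvesting. A hypothetical sketch: the step dictionary keys, the `INTERACTION_ACTIONS` set, and reading "covered" as "already has a non-fallback candidate" are all assumptions made for illustration.

```python
from typing import Dict, List

# Assumption: which actions count as "interaction steps".
INTERACTION_ACTIONS = {"click", "press", "hover", "fill", "select", "check", "uncheck"}


def inventory_gap_steps(steps: List[Dict], assertions_mode: str = "candidates") -> List[int]:
    """Hypothetical: indices of steps eligible for inventory fallback candidates."""
    if assertions_mode != "candidates":
        return []  # harvesting only runs with --assertions candidates
    gaps = []
    for i, step in enumerate(steps):
        if step["action"] not in INTERACTION_ACTIONS:
            continue
        if step.get("has_non_fallback_candidate"):
            continue  # already covered: gap-only adds nothing here
        if step.get("aria_snapshot") is None:
            continue  # no extra replay, so no snapshot means no harvesting
        gaps.append(i)
    return gaps
```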
When a browser is available, improve uses Playwright's `ariaSnapshot()` API to inspect each element's accessibility role and name. This generates semantic locator candidates:

- `getByRole(role, { name })`: for any element with an accessible role and name
- `getByLabel(name)`: for form controls (textbox, combobox, listbox, searchbox, spinbutton)
- `getByPlaceholder(text)`: for form controls with a placeholder attribute
- `getByText(text)`: for text-bearing roles (headings, links, alerts, status elements)
- `getByTestId(id)`: when runtime attributes expose `data-testid`/`data-test-id`
- `locator('tag[name="..."]')` / `locator('[id="..."]')`: for unlabeled controls with stable runtime attributes
- `getByRole('row', { name: ... }).getByRole('textbox')`: as a lower-confidence fallback for repeated unlabeled controls that only become distinguishable through row context
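The role-based rules in that list can be sketched as a small generator of locator-expression strings. It is hypothetical: `semantic_candidates` and `js_str` are invented names, the output order is illustrative (the real tool scores candidates), and the attribute-based `locator(...)` and row-context fallbacks are left out.

```python
from typing import List, Optional

FORM_CONTROL_ROLES = {"textbox", "combobox", "listbox", "searchbox", "spinbutton"}
TEXT_BEARING_ROLES = {"heading", "link", "alert", "status"}


def js_str(s: str) -> str:
    # Single-quoted JS string literal, escaping backslashes then quotes.
    return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"


def semantic_candidates(role: str, name: Optional[str],
                        placeholder: Optional[str] = None,
                        test_id: Optional[str] = None) -> List[str]:
    """Hypothetical: candidate locator expressions for one element."""
    out = []
    if name:
        out.append(f"getByRole({js_str(role)}, {{ name: {js_str(name)} }})")
        if role in FORM_CONTROL_ROLES:
            out.append(f"getByLabel({js_str(name)})")
        if role in TEXT_BEARING_ROLES:
            out.append(f"getByText({js_str(name)})")
    if placeholder and role in FORM_CONTROL_ROLES:
        out.append(f"getByPlaceholder({js_str(placeholder)})")
    if test_id:
        out.append(f"getByTestId({js_str(test_id)})")
    return out
```

For a button named "Submit", this produces `getByRole('button', { name: 'Submit' })`, the same expression as the before/after example.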
These candidates are scored alongside syntactic candidates. Auto-apply is intentionally bounded: candidates must score significantly higher than the current selector (delta >= 0.15), meet the high-confidence threshold, and resolve uniquely at runtime before YAML is mutated. Lower-confidence recommendations remain in the report for review instead of being applied silently.
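The three auto-apply conditions combine as a simple conjunction. In this hypothetical sketch, `min_delta=0.15` comes from the docs, but `high_confidence=0.8` is a placeholder because the actual threshold is not stated here; the function name is also invented.

```python
def should_auto_apply_selector(current_score: float, candidate_score: float,
                               runtime_match_count: int,
                               high_confidence: float = 0.8,
                               min_delta: float = 0.15) -> bool:
    """Hypothetical gate: all three conditions must hold before YAML changes."""
    return (
        candidate_score - current_score >= min_delta  # clearly better than current
        and candidate_score >= high_confidence        # placeholder threshold
        and runtime_match_count == 1                  # resolves uniquely at runtime
    )
```

Anything that fails one of these checks stays in the report as a recommendation rather than being written.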
For dynamic-flagged or brittle targets (for example, long exact headline link names), improve also attempts a runtime selector regeneration pass:

- Requires a unique runtime match (`matchCount === 1`) before generating a repair candidate.
- Runs as a dedicated runtime-repair stage after baseline and heuristic locator-repair candidate generation.
- Converts supported internal selector-engine targets into locator expressions using deterministic public selector parsing.
- Falls back safely to existing repair heuristics when conversion is unavailable or the selector shape is unsupported.
- Set `UI_TEST_DISABLE_PLAYWRIGHT_RUNTIME_REGEN=1` to disable runtime regeneration/conversion and use heuristic repairs only.
If replay later encounters a broken locator and no high-confidence unique repair is available, play stops and points you back to ui-test improve <file> --apply rather than continuing with ambiguous selector guesses.
Runtime regeneration diagnostics:

- `selector_repair_generated_via_playwright_runtime`
- `selector_repair_playwright_runtime_unavailable`
- `selector_repair_playwright_runtime_non_unique`
- `selector_repair_playwright_runtime_conversion_failed`
- `selector_repair_playwright_runtime_disabled`
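One way to read these codes is as outcomes of a short decision chain. The sketch below is hypothetical: the diagnostic strings and the environment variable come from the docs, but `runtime_regen_diagnostic`, its parameters, and the order of the checks are assumptions.

```python
import os
from typing import Optional


def runtime_regen_diagnostic(runtime_available: bool, match_count: int,
                             converted: Optional[str]) -> str:
    """Hypothetical mapping of regeneration outcomes to diagnostic codes."""
    if os.environ.get("UI_TEST_DISABLE_PLAYWRIGHT_RUNTIME_REGEN") == "1":
        return "selector_repair_playwright_runtime_disabled"
    if not runtime_available:
        return "selector_repair_playwright_runtime_unavailable"
    if match_count != 1:
        return "selector_repair_playwright_runtime_non_unique"  # needs matchCount === 1
    if converted is None:
        return "selector_repair_playwright_runtime_conversion_failed"
    return "selector_repair_generated_via_playwright_runtime"
```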
The report includes step-level old/recommended targets, confidence scores, assertion candidates, and diagnostics.

Diagnostics may include decision metadata:

- `decisionConfidence`
- `mutationType`
- `mutationSafety`
- `evidenceRefs`
- `appliedBy`
The summary includes:

- `selectorRepairCandidates`
- `selectorRepairsApplied`
- `selectorRepairsGeneratedByPlaywrightRuntime`
- `selectorRepairsAppliedFromPlaywrightRuntime`
- `runtimeFailingStepsRetained`
- `runtimeFailingStepsRemoved`
- `assertionCandidatesFilteredDynamic`
- `assertionCoverageStepsTotal`
- `assertionCoverageStepsWithCandidates`
- `assertionCoverageStepsWithApplied`
- `assertionCoverageCandidateRate`
- `assertionCoverageAppliedRate`
- `assertionFallbackApplied`
- `assertionFallbackAppliedOnlySteps`
- `assertionFallbackAppliedWithNonFallbackSteps`
- `assertionInventoryStepsEvaluated`
- `assertionInventoryCandidatesAdded`
- `assertionInventoryGapStepsFilled`
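The two rate fields presumably relate to the step counts beside them. This sketch assumes each rate is the corresponding step count divided by `assertionCoverageStepsTotal`; that interpretation is not stated in the docs, so check it against a real report before relying on it. `coverage_rates` is a hypothetical helper.

```python
from typing import Dict


def coverage_rates(summary: Dict[str, int]) -> Dict[str, float]:
    """Hypothetical: derive the rate fields from the summary's step counts."""
    total = summary["assertionCoverageStepsTotal"]
    if total == 0:
        # Avoid division by zero for tests with no countable steps.
        return {"assertionCoverageCandidateRate": 0.0, "assertionCoverageAppliedRate": 0.0}
    return {
        "assertionCoverageCandidateRate": summary["assertionCoverageStepsWithCandidates"] / total,
        "assertionCoverageAppliedRate": summary["assertionCoverageStepsWithApplied"] / total,
    }
```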
Each assertion candidate has an `applyStatus`:

| Status | Meaning |
|---|---|
| `applied` | Written to YAML |
| `skipped_low_confidence` | Below confidence threshold |
| `skipped_runtime_failure` | Failed runtime validation |
| `skipped_policy` | Apply-mode policy skip (for example, visibility rules, dynamic hard-filter, or profile cap reached) |
| `skipped_existing` | Step already has an assertion |
| `not_requested` | Report-only run (`--no-apply`): candidate was generated but not considered for apply/validation |
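When reviewing a report, tallying `applyStatus` values is a quick health check. The input shape here is an assumption: a flat list of candidate objects, each with an `applyStatus` key, extracted from the report yourself, since its nesting is not documented here. Both helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_apply_statuses(candidates: Iterable[dict]) -> Counter:
    """Tally applyStatus values; candidates without one count as 'missing'."""
    return Counter(c.get("applyStatus", "missing") for c in candidates)


def is_report_only(counts: Counter) -> bool:
    # Every candidate left as not_requested suggests a --no-apply run.
    return bool(counts) and set(counts) == {"not_requested"}
```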
Runtime-failing step diagnostics use `runtime_failing_step_retained`.

Default report path: `<test-file>.improve-report.json`

Custom path:

```bash
ui-test improve e2e/login.yaml --report ./reports/login.improve.json
```

- Runtime analysis may replay actions; use a safe test environment.
- `improve` requires Chromium availability in CLI runs.
- If Chromium is missing, provision it with `ui-test setup` or `npx playwright install chromium`.
- Validation timing mirrors `play` post-step readiness checks: navigation waits happen automatically, and `networkidle` is opt-in.