Replace the current mix of:
- fg/bg ordering barriers
- numeric filename ordering used as dependency encoding
- marker files like cdp_url.txt, target_id.txt, extensions.json, navigation.json, prenav.json
- sibling file/log polling
with a smaller model:
- hooks subscribe only via on_<EventName>__...
- hooks emit real facts on stdout as JSONL
- abx-dl routes those facts dynamically through abxbus
- abx-dl auto-emits After<EventName> as a synthetic settle barrier
- abx-dl reduces all prior scoped events into a derived key/value context
- abx-dl mirrors crawl-scoped context to CRAWL_DIR/derived.env
- abx-dl mirrors snapshot-scoped context to SNAP_DIR/derived.env by overlaying snapshot state on top of the current crawl context
- hooks receive the full reduced context as env vars
- hooks receive only the triggering event payload as CLI args
Chrome remains the stable namespace for all Chrome-like providers.
Use only:
on_<EventName>__...
Examples:
- on_SnapshotChromeTabReady__21_consolelog.daemon.bg.js
- on_AfterSnapshotChromeTabReady__30_chrome_navigate.js
- on_SnapshotChromeTabNavigated__51_screenshot.js
- on_AfterSnapshot__70_parse_html_urls.py
There is no separate after_* hook syntax.
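A minimal sketch of how the trigger event could be read from a hook filename under this single syntax. The regex and the function name sketch_hook_event are hypothetical and only illustrate the naming rule; they are not the existing parse_hook_filename() implementation.

```python
import re
from typing import Optional

# Hypothetical pattern for on_<EventName>__<rest>. Event names are assumed to
# contain no underscores, so the first "__" ends the event name.
HOOK_RE = re.compile(r"^on_(?P<event>[A-Za-z][A-Za-z0-9]*)__(?P<rest>.+)$")


def sketch_hook_event(filename: str) -> Optional[str]:
    """Return the raw trigger event string, or None if the name is not a hook."""
    match = HOOK_RE.match(filename)
    return match.group("event") if match else None
```

No whitelist is consulted: synthetic names such as AfterSnapshot come back as plain strings, the same as real event names.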
Emitted by hooks when they create new meaning or new payload.
Examples:
- CrawlChromeBrowserReady
- CrawlChromeExtensionsReady
- SnapshotChromeBrowserReady
- SnapshotChromeExtensionsReady
- SnapshotChromeTabReady
- SnapshotChromeTabNavigated
- SnapshotChromeTabNavigationFailed
- SnapshotChromeMainResponseSaved
- SnapshotChromeStaticFileHandled
- SnapshotPdfOcrComplete
- UrlDiscovered
Emitted only by abx-dl.
Examples:
- AfterSnapshot
- AfterSnapshotChromeTabReady
- AfterSnapshotChromeTabNavigated
- AfterSnapshotPdfOcrComplete
Meaning:
- After<Event> = the full subtree rooted at <Event> has settled
After<Event> is for ordering only. It does not introduce new semantics.
Use a real event when:
- a hook creates a meaningful state transition
- a hook creates new payload another hook may consume
- a hook wants to update the shared derived context
Use After<Event> when:
- a later hook only needs barrier/ordering semantics
- no new domain payload is needed
Examples:
- chrome_navigate should run on AfterSnapshotChromeTabReady once pre-navigation bg settle semantics are defined precisely
- screenshot should run on SnapshotChromeTabNavigated, not AfterSnapshotChromeTabReady
- late parsers/indexers should run on AfterSnapshot
- OCR should emit a real event like SnapshotPdfOcrComplete
- late OCR consumers can run on AfterSnapshotPdfOcrComplete
abx-dl must not know Chrome-specific meanings.
It only needs to know:
- how to route events by string name
- how to launch hooks for on_<Event>
- how to synthesize After<Event>
- how to route both real events and synthetic After<Event> barriers through the same generic dispatch path
- how to reduce scoped prior events into current key/value context
- how to mirror reduced context into derived.env
- how to pass env vars and event payload args into hooks
- how to wait for the full root Snapshot tree before cleanup
No plugin-specific event classes or Chrome-specific scheduling logic should be added to abx-dl.
Hooks emit JSONL records with:
- required top-level type
- optional normal event payload fields
- optional reserved env patch for durable aggregate state
Example:
{
  "type": "SnapshotChromeTabReady",
  "url": "https://example.com",
  "env": {
    "CDP_URL": "ws://127.0.0.1:9222/devtools/browser/...",
    "TARGET_ID": "ABC123",
    "EXTENSIONS": [
      {"id": "ublock", "name": "uBlock Origin"}
    ]
  }
}

env is the only generic state-update lane.
Rules:
- scalar values: last write wins
- null: unset/remove key
- arrays: append
- objects: deep-merge
- type changes: latest value replaces incompatible earlier type
Implications:
- multiple hooks can contribute to EXTENSIONS
- later hooks can correct stale TARGET_ID / CDP_URL
- later hooks launched by abx-dl will automatically see corrected values
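The merge rules above can be sketched as a small pure reducer. This is an illustration only: the function name apply_env_patch is hypothetical, and applying the same rules recursively inside nested objects is an assumption the rules do not spell out.

```python
from copy import deepcopy
from typing import Any, Dict


def apply_env_patch(context: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new context with one env patch applied; inputs are not mutated."""
    result = deepcopy(context)
    for key, value in patch.items():
        current = result.get(key)
        if value is None:
            # null: unset/remove key
            result.pop(key, None)
        elif isinstance(value, list) and isinstance(current, list):
            # arrays: append
            result[key] = current + deepcopy(value)
        elif isinstance(value, dict) and isinstance(current, dict):
            # objects: deep-merge (assumption: nested keys follow the same rules)
            result[key] = apply_env_patch(current, value)
        else:
            # scalars: last write wins; incompatible types: latest value replaces
            result[key] = deepcopy(value)
    return result
```

Folding every prior scoped event's env patch through this function, in event-log order, yields the reduced context that a later hook would see.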
For derived.env mirroring:
- scalars are written as plain env values
- arrays/objects are serialized as JSON strings
Examples:
- CDP_URL=ws://...
- TARGET_ID=ABC123
- EXTENSIONS=[{"id":"ublock","name":"uBlock Origin"}]
Only abx-dl writes derived.env.
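A sketch of the mirroring step, assuming the reduced context is a plain dict. The function name render_derived_env, the sorted key order, the lowercase rendering of booleans, and the absence of any quoting or escaping are all assumptions made for illustration.

```python
import json
from typing import Any, Dict


def render_derived_env(context: Dict[str, Any]) -> str:
    """Render a reduced context as KEY=value lines for derived.env."""
    lines = []
    for key in sorted(context):  # assumption: stable, sorted output
        value = context[key]
        if isinstance(value, (list, dict)):
            # arrays/objects are serialized as compact JSON strings
            text = json.dumps(value, separators=(",", ":"))
        elif isinstance(value, bool):
            text = "true" if value else "false"  # assumption
        else:
            # scalars are written as plain env values
            text = str(value)
        lines.append(f"{key}={text}")
    return "\n".join(lines) + "\n"
```

Because the file is regenerated from the reduced context on every write, it stays a derived view and can be rebuilt from the event log at any time.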
The event log is canonical.
abx-dl should maintain scope directionality explicitly:
- crawl events reduce into one crawl context mirrored to CRAWL_DIR/derived.env
- each snapshot starts from the current crawl context
- snapshot events then overlay snapshot-local state on top of that base
- the resulting per-snapshot view is mirrored to SNAP_DIR/derived.env
- snapshot-local reduction must never write back into CRAWL_DIR/derived.env
derived.env is:
- generated
- for observability and crash inspection
- useful for manual standalone hook invocation
- not the canonical source of truth
This one-way flow is required because multiple snapshots may run in parallel within a single crawl.
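The one-way flow can be sketched with a crawl base plus per-snapshot overlays. The class name ScopedContext is hypothetical, the merge is simplified to last-write-wins, and recomputing each snapshot view from the latest crawl context on every read is an assumption about what "current crawl context" means.

```python
from copy import deepcopy
from typing import Any, Dict


class ScopedContext:
    """Crawl context is upstream-only; snapshot state is a local overlay."""

    def __init__(self) -> None:
        self.crawl: Dict[str, Any] = {}
        self.snapshots: Dict[str, Dict[str, Any]] = {}

    def patch_crawl(self, patch: Dict[str, Any]) -> None:
        self.crawl.update(deepcopy(patch))

    def patch_snapshot(self, snapshot_id: str, patch: Dict[str, Any]) -> None:
        # Writes land only in the snapshot's own overlay, never in self.crawl.
        self.snapshots.setdefault(snapshot_id, {}).update(deepcopy(patch))

    def snapshot_view(self, snapshot_id: str) -> Dict[str, Any]:
        # Crawl context is the base; the snapshot overlay wins on conflicts.
        view = deepcopy(self.crawl)
        view.update(deepcopy(self.snapshots.get(snapshot_id, {})))
        return view
```

Two snapshots running in parallel can each correct TARGET_ID, or even CDP_URL, without the other snapshot or the crawl context observing the change.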
Reduced context should hold only durable/current-state values that later hooks may need fresh:
- URL
- CRAWL_ID
- SNAPSHOT_ID
- CRAWL_DIR
- SNAP_DIR
- CDP_URL
- TARGET_ID
- FINAL_URL
- HTTP_STATUS
- EXTENSIONS
- MAIN_RESPONSE_PATH
- MAIN_RESPONSE_MIME
- STATICFILE_HANDLED
- STATICFILE_PATH
Avoid stuffing transient/debug-only values into reduced context.
Every hook gets the full reduced context as env vars.
Examples:
- URL
- CRAWL_ID
- SNAPSHOT_ID
- SNAP_DIR
- CDP_URL
- TARGET_ID
- FINAL_URL
- HTTP_STATUS
- EXTENSIONS
Every hook also gets the triggering event payload as --kebab-case=value CLI args.
Important split:
- reduced context is passed via env vars
- current event payload is passed via CLI args
- only fields on the triggering event become CLI args
- reduced context values should not also be sprayed onto the CLI unless they are part of the triggering event payload
Example:
- env:
  - URL=...
  - CDP_URL=...
  - TARGET_ID=...
- CLI:
  - --url=...
  - --final-url=...
  - --http-status=200
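A sketch of turning the triggering event's payload into --kebab-case=value arguments. The function name payload_to_cli_args is hypothetical, and so are two choices it makes: leaving the reserved type and env fields off the CLI, and JSON-encoding non-scalar values.

```python
import json
from typing import Any, Dict, List

# Assumption: these top-level fields are not passed as CLI args.
RESERVED_FIELDS = {"type", "env"}


def payload_to_cli_args(event: Dict[str, Any]) -> List[str]:
    """Convert only the triggering event's payload fields into CLI args."""
    args = []
    for key, value in event.items():
        if key in RESERVED_FIELDS:
            continue
        flag = key.replace("_", "-").lower()
        # Assumption: arrays/objects are passed as JSON strings.
        text = json.dumps(value) if isinstance(value, (list, dict)) else str(value)
        args.append(f"--{flag}={text}")
    return args
```

Reduced context never passes through this function, which keeps the env/CLI split explicit: a value reaches the CLI only if the triggering event carried it as payload.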
EXTRA_CONTEXT stays opaque pass-through metadata only.
- hooks should not depend on it for runtime behavior
- hooks should consume explicit env vars / CLI args instead
- emitted ArchiveResult, Snapshot, and UrlDiscovered records may still merge it forward for lineage
- make root lifecycle event types line up with hook filenames:
  - CrawlSetup
  - Snapshot
- keep parse_hook_filename() as the source of truth for on_<Event>__...
- no whitelist of supported event names
- store the raw trigger event string exactly as parsed from the filename
- keep the existing service
- dispatch all crawl-tree hooks dynamically by event string, including synthetic AfterCrawl* events if/when used
- convert hook stdout records with type beginning with Crawl into BaseEvent(event_type=...)
- reduce prior crawl-scoped events into current crawl context
- mirror reduced crawl context to CRAWL_DIR/derived.env
- pass reduced crawl context as env vars plus current event payload as CLI args
- synthesize After<Event> for crawl events
- keep the existing service
- dispatch all snapshot-tree hooks dynamically by event string, including synthetic AfterSnapshot* events
- convert hook stdout records with type beginning with Snapshot into BaseEvent(event_type=...)
- route non-reserved emitted events like UrlDiscovered as normal bus events under the current snapshot tree instead of hardcoding exact type == "Snapshot"
- start from the current crawl context as a base
- reduce prior snapshot-scoped events into a snapshot-local overlay on top of that base
- mirror reduced snapshot context to SNAP_DIR/derived.env
- pass reduced snapshot context as env vars plus current event payload as CLI args
- auto-emit After<Event> after the routed event subtree settles
- emit SnapshotCleanupEvent only after the full root Snapshot tree settles
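A sketch of the generic stdout handling both services need before records reach the bus. The function names parse_hook_stdout and scope_of are hypothetical, and silently dropping hook-emitted After* records (rather than raising) is an assumption; the real services would wrap each accepted record in BaseEvent(event_type=...).

```python
import json
from typing import Any, Dict, Iterable, List


def parse_hook_stdout(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Keep only JSONL records with a usable, non-reserved type."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("{"):
            continue  # ordinary log output, not an event record
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        event_type = record.get("type")
        if not isinstance(event_type, str) or not event_type:
            continue
        if event_type.startswith("After"):
            continue  # After* is reserved for abx-dl (assumption: drop it)
        records.append(record)
    return records


def scope_of(event_type: str) -> str:
    """Classify by string prefix only; no plugin-specific knowledge."""
    if event_type.startswith("Crawl"):
        return "crawl"
    if event_type.startswith("Snapshot"):
        return "snapshot"
    return "other"  # e.g. UrlDiscovered, routed under the current tree
```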
- expand fallback ArchiveResult synthesis beyond hooks whose names start with on_Snapshot
- on_AfterSnapshot__... hooks must also get the same fallback result behavior
- more generally, fallback result synthesis should apply to hooks whose process is a descendant of a root snapshot lifecycle event, not only to one filename prefix
Rules:
- After<Event> is emitted only by abx-dl
- hooks must never emit After* themselves
- After<Event> is emitted as a child of <Event>, before <Event> is considered complete
- do not auto-generate AfterAfter<Event>
- SnapshotCleanup waits for the full root Snapshot tree, including all nested After<Event> branches
- AfterSnapshot is solid because it is rooted in full snapshot-tree settlement
- AfterSnapshotChromeTabReady must not be treated as implemented until bg pre-navigation hook settle semantics are defined precisely; otherwise it can fire either too early or too late under the current bg process model
That is what makes multi-stage settled pipelines work.
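The barrier rules can be illustrated with a synchronous toy dispatcher. This is a sketch only: run_event and the Handler type are hypothetical, every hook is modeled as a blocking function, and it therefore says nothing about the unresolved settle semantics for bg hooks.

```python
from typing import Callable, Dict, List

# A handler stands in for one hook; it returns the event names it emitted.
Handler = Callable[[str], List[str]]


def run_event(name: str, handlers: Dict[str, List[Handler]], log: List[str]) -> None:
    """Run an event subtree, then synthesize After<name> as its last child."""
    log.append(f"start {name}")
    for handler in handlers.get(name, []):
        for emitted in handler(name):
            if emitted.startswith("After"):
                raise ValueError("hooks must never emit After* themselves")
            run_event(emitted, handlers, log)
    # All real children have settled. Barriers get no barrier of their own,
    # so AfterAfter<Event> is never generated.
    if not name.startswith("After"):
        run_event(f"After{name}", handlers, log)
    # <name> is complete only after its After<name> branch has finished.
    log.append(f"complete {name}")
```

With a Snapshot hook that emits SnapshotChromeTabReady and an AfterSnapshot hook that emits SnapshotPdfOcrComplete, the log shows AfterSnapshot starting only after the tab subtree completes, AfterSnapshotPdfOcrComplete nested inside it, and "complete Snapshot" as the last entry, which is the point at which cleanup could run.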
- CrawlSetup
- Snapshot
UrlDiscovered
UrlDiscovered should replace hooks emitting new Snapshot records for crawl discovery.
Payload should carry all useful metadata in one record, e.g.:
- url
- title
- tags
- bookmarked_at
- parser/feed/bookmark metadata
- CrawlChromeBrowserReady
- CrawlChromeExtensionsReady

- SnapshotChromeBrowserReady
- SnapshotChromeExtensionsReady
- SnapshotChromeTabReady
- SnapshotChromeTabNavigated
- SnapshotChromeTabNavigationFailed
- SnapshotChromeTabClosed
Only add these if they materially simplify orchestration:
- SnapshotChromeMainResponseSaved
- SnapshotChromeStaticFileHandled
- SnapshotChromeHeadersCaptured
- SnapshotChromeRedirectsCaptured
Do not add a trivial event for every implementation detail.
- only emit CrawlChrome* facts during crawl setup
- only emit SnapshotChrome* facts during snapshot execution
- never synthesize snapshot aliases from crawl browser facts
If CHROME_ISOLATION=crawl:
- crawl launch emits CrawlChromeBrowserReady and CrawlChromeExtensionsReady
- snapshot phase starts at SnapshotChromeTabReady
- there is no SnapshotChromeBrowserReady alias
If CHROME_ISOLATION=snapshot:
- snapshot launch emits SnapshotChromeBrowserReady and SnapshotChromeExtensionsReady
- snapshot phase then proceeds to SnapshotChromeTabReady
The first snapshot-scoped Chrome event guaranteed in both modes is:
SnapshotChromeTabReady
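The two isolation modes can be summarized as a lookup of which Chrome facts each phase is expected to emit before navigation. The function name expected_chrome_facts is hypothetical, and the empty crawl list in snapshot isolation is an assumption inferred from the rules above rather than something stated outright.

```python
from typing import Dict, List


def expected_chrome_facts(isolation: str) -> Dict[str, List[str]]:
    """Map a CHROME_ISOLATION value to the facts emitted per phase."""
    if isolation == "crawl":
        return {
            "crawl": ["CrawlChromeBrowserReady", "CrawlChromeExtensionsReady"],
            # No SnapshotChromeBrowserReady alias is synthesized.
            "snapshot": ["SnapshotChromeTabReady"],
        }
    if isolation == "snapshot":
        return {
            "crawl": [],  # assumption: no browser is launched at crawl setup
            "snapshot": [
                "SnapshotChromeBrowserReady",
                "SnapshotChromeExtensionsReady",
                "SnapshotChromeTabReady",
            ],
        }
    raise ValueError(f"unknown CHROME_ISOLATION value: {isolation!r}")
```

A hook that must work in both modes can subscribe to SnapshotChromeTabReady, since it is the one snapshot-scoped fact present in both lists.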
- CrawlSetup
  - on_CrawlSetup__89_chrome_kill_zombies.js
  - on_CrawlSetup__90_chrome_launch.daemon.bg.js in crawl isolation
    - emits CrawlChromeBrowserReady
    - emits CrawlChromeExtensionsReady
    - patches env with browser-level values like CDP_URL, EXTENSIONS
  - crawl-scoped extension config hooks subscribe to CrawlChromeExtensionsReady
    - abx_plugins/plugins/twocaptcha/on_CrawlSetup__95_twocaptcha_config.js
    - abx_plugins/plugins/claudechrome/on_CrawlSetup__96_claudechrome_config.js
- Snapshot
  - root non-Chrome downloaders stay on Snapshot, e.g.:
    - abx_plugins/plugins/ytdlp/on_Snapshot__02_ytdlp.finite.bg.py
    - abx_plugins/plugins/gallerydl/on_Snapshot__03_gallerydl.finite.bg.py
    - abx_plugins/plugins/forumdl/on_Snapshot__04_forumdl.finite.bg.py
    - abx_plugins/plugins/git/on_Snapshot__05_git.finite.bg.py
    - abx_plugins/plugins/wget/on_Snapshot__06_wget.finite.bg.py
    - abx_plugins/plugins/archivedotorg/on_Snapshot__08_archivedotorg.finite.bg.py
    - abx_plugins/plugins/favicon/on_Snapshot__11_favicon.finite.bg.py
    - abx_plugins/plugins/papersdl/on_Snapshot__66_papersdl.finite.bg.py
  - Chrome branch:
    - crawl isolation: chrome_tab subscribes directly to Snapshot
    - snapshot isolation: launch emits SnapshotChromeBrowserReady and SnapshotChromeExtensionsReady, then chrome_tab runs
- chrome_tab emits SnapshotChromeTabReady
  - patches env with TARGET_ID and any corrected tab-level state
- pre-navigation listeners run on SnapshotChromeTabReady
- once the pre-navigation bg settle semantics are defined correctly, chrome_navigate can run on AfterSnapshotChromeTabReady
- chrome_navigate emits SnapshotChromeTabNavigated or SnapshotChromeTabNavigationFailed
  - patches env with corrected FINAL_URL, HTTP_STATUS, TARGET_ID if needed
- post-navigation extractors run on SnapshotChromeTabNavigated
- once the entire root Snapshot tree settles, abx-dl emits AfterSnapshot
- late parsers / OCR / indexers / cleanup / hashes run on AfterSnapshot
- if any of those late hooks emit new real events, abx-dl may synthesize After<ThoseEvents> and the tree continues
- SnapshotCleanup runs only after the full root Snapshot tree finishes
AfterSnapshot solves these generically:
- Chrome-disabled runs still work
- URL parsers can run after all snapshot outputs exist
- later stages like OCR can emit real follow-on events without special scheduler code
Delete entirely:
- abx_plugins/plugins/chrome/on_CrawlSetup__91_chrome_wait.js
- abx_plugins/plugins/chrome/on_Snapshot__11_chrome_wait.js
- emit CrawlChromeBrowserReady
- emit CrawlChromeExtensionsReady
- patch env with browser-level state

- only run in snapshot isolation
- emit SnapshotChromeBrowserReady
- emit SnapshotChromeExtensionsReady
- patch env with browser-level state
- remove current crawl-isolation no-op behavior

- in crawl isolation, subscribe to Snapshot
- in snapshot isolation, subscribe to the relevant snapshot browser-ready event
- emit SnapshotChromeTabReady
- patch env with fresh tab state
- stop being the publisher of runtime marker files used for orchestration
- split or rewrite helper usage so fresh reduced env state can replace the current session-dir marker contract over time

- move to on_AfterSnapshotChromeTabReady__30_chrome_navigate.js once that settle barrier is well-defined for bg pre-navigation hooks
- remove prenav.json polling and marker-based gating
- emit SnapshotChromeTabNavigated or SnapshotChromeTabNavigationFailed
- patch env with latest navigation state
Move to on_SnapshotChromeTabReady__...:
- abx_plugins/plugins/consolelog/on_Snapshot__21_consolelog.daemon.bg.js
- abx_plugins/plugins/dns/on_Snapshot__22_dns.daemon.bg.js
- abx_plugins/plugins/sslcerts/on_Snapshot__23_sslcerts.daemon.bg.js
- abx_plugins/plugins/responses/on_Snapshot__24_responses.daemon.bg.js
- abx_plugins/plugins/redirects/on_Snapshot__25_redirects.daemon.bg.js
- abx_plugins/plugins/staticfile/on_Snapshot__26_staticfile.daemon.bg.js
- abx_plugins/plugins/headers/on_Snapshot__27_headers.daemon.bg.js
- abx_plugins/plugins/ublock/on_Snapshot__12_ublock.daemon.bg.js
- abx_plugins/plugins/istilldontcareaboutcookies/on_Snapshot__13_istilldontcareaboutcookies.daemon.bg.js
- abx_plugins/plugins/twocaptcha/on_Snapshot__14_twocaptcha.daemon.bg.js
- abx_plugins/plugins/modalcloser/on_Snapshot__15_modalcloser.daemon.bg.js
Remove:
- readiness file creation
- numeric “must run before navigate” comments
- waitForNavigationComplete(...) used as control-plane gating
Important note:
- this stage is still the main unresolved runtime detail in the plan
- background pre-navigation hooks currently launch without a precise runtime notion of “settled enough for AfterSnapshotChromeTabReady”
- do not move chrome_navigate until that settle barrier is defined correctly
Move to on_SnapshotChromeTabNavigated__...:
- abx_plugins/plugins/seo/on_Snapshot__38_seo.js
- abx_plugins/plugins/accessibility/on_Snapshot__39_accessibility.js
- abx_plugins/plugins/infiniscroll/on_Snapshot__45_infiniscroll.js
- abx_plugins/plugins/claudechrome/on_Snapshot__47_claudechrome.js
- abx_plugins/plugins/singlefile/on_Snapshot__50_singlefile.py
- abx_plugins/plugins/singlefile/singlefile_extension_save.js
- abx_plugins/plugins/screenshot/on_Snapshot__51_screenshot.js
- abx_plugins/plugins/pdf/on_Snapshot__52_pdf.js
- abx_plugins/plugins/dom/on_Snapshot__53_dom.js
- abx_plugins/plugins/title/on_Snapshot__54_title.js
- abx_plugins/plugins/parse_dom_outlinks/on_Snapshot__75_parse_dom_outlinks.js
Move to on_AfterSnapshot__... if they should run after all snapshot outputs exist and are actually intended to consume generated snapshot artifacts rather than the original input URL directly:
- abx_plugins/plugins/readability/on_Snapshot__56_readability.py
- abx_plugins/plugins/mercury/on_Snapshot__57_mercury.py
- abx_plugins/plugins/defuddle/on_Snapshot__57_defuddle.py
- abx_plugins/plugins/htmltotext/on_Snapshot__58_htmltotext.py
- abx_plugins/plugins/claudecodeextract/on_Snapshot__58_claudecodeextract.py
- abx_plugins/plugins/trafilatura/on_Snapshot__59_trafilatura.py
- abx_plugins/plugins/opendataloader/on_Snapshot__60_opendataloader.py
- abx_plugins/plugins/liteparse/on_Snapshot__61_liteparse.py
- abx_plugins/plugins/parse_html_urls/on_Snapshot__70_parse_html_urls.py
- abx_plugins/plugins/search_backend_sqlite/on_Snapshot__90_index_sqlite.py
- abx_plugins/plugins/search_backend_sonic/on_Snapshot__91_index_sonic.py
- abx_plugins/plugins/claudecodecleanup/on_Snapshot__92_claudecodecleanup.py
- abx_plugins/plugins/hashes/on_Snapshot__93_hashes.py
Review needed for:
- abx_plugins/plugins/parse_txt_urls/on_Snapshot__71_parse_txt_urls.py
- abx_plugins/plugins/parse_rss_urls/on_Snapshot__72_parse_rss_urls.py
- abx_plugins/plugins/parse_netscape_urls/on_Snapshot__73_parse_netscape_urls.py
- abx_plugins/plugins/parse_jsonl_urls/on_Snapshot__74_parse_jsonl_urls.py
These currently parse their direct input source, not sibling generated outputs, so moving them to AfterSnapshot is not just a scheduling change. They must either:
- stay on Snapshot as direct-source parsers
or:
- be rewritten to scan generated snapshot outputs first, then moved to AfterSnapshot
- abx_plugins/plugins/twocaptcha/on_CrawlSetup__95_twocaptcha_config.js
- abx_plugins/plugins/claudechrome/on_CrawlSetup__96_claudechrome_config.js
Use:
- CrawlChromeExtensionsReady in crawl isolation
- SnapshotChromeExtensionsReady in snapshot isolation via thin wrappers if needed
Do not use .configured marker files as orchestration state.
- abx_plugins/plugins/redirects/prenav.json
- CRAWL_DIR/chrome/.twocaptcha_configured
- CRAWL_DIR/chrome/.claudechrome_configured
- SNAP_DIR/chrome/url.txt
These may remain as optional debug/manual-run artifacts, but must not be required for orchestration:
- CRAWL_DIR/chrome/cdp_url.txt
- CRAWL_DIR/chrome/chrome.pid
- CRAWL_DIR/chrome/extensions.json
- SNAP_DIR/chrome/cdp_url.txt
- SNAP_DIR/chrome/target_id.txt
- SNAP_DIR/chrome/navigation.json
- SNAP_DIR/chrome/extensions.json
Replace with event payload fields plus reduced env state on:
- CrawlChromeBrowserReady
- CrawlChromeExtensionsReady
- SnapshotChromeBrowserReady
- SnapshotChromeExtensionsReady
- SnapshotChromeTabReady
- SnapshotChromeTabNavigated
- SnapshotChromeTabNavigationFailed
- abx_plugins/plugins/consolelog/console.jsonl
- abx_plugins/plugins/headers/headers.json
- abx_plugins/plugins/responses/index.jsonl
- abx_plugins/plugins/dns/dns.jsonl
These remain useful outputs, but file existence must not mean “ready”.
Remove uses of sibling stdout.log scraping such as hasStaticFileOutput(...).
Replace with a real fact like:
SnapshotChromeStaticFileHandled
and/or reduced state keys like:
- STATICFILE_HANDLED
- STATICFILE_PATH
Add one small generic helper on both sides:
emit_event_record("EventName", {...})
That helper should support:
- normal payload fields
- optional reserved env patch for durable aggregate state
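A possible shape for the Python side of that helper. Only the name emit_event_record and the call style emit_event_record("EventName", {...}) come from the plan; the env and stream keyword parameters, the ValueError checks, and returning the written line are assumptions for illustration.

```python
import json
import sys
from typing import Any, Dict, Optional, TextIO


def emit_event_record(
    event_type: str,
    payload: Optional[Dict[str, Any]] = None,
    env: Optional[Dict[str, Any]] = None,
    stream: Optional[TextIO] = None,
) -> str:
    """Write one JSONL event record to stdout (or `stream`) and return it."""
    if event_type.startswith("After"):
        raise ValueError("After* events are emitted only by abx-dl")
    record: Dict[str, Any] = {"type": event_type}
    for key, value in (payload or {}).items():
        if key in ("type", "env"):
            raise ValueError(f"{key!r} is reserved; pass it as an argument")
        record[key] = value
    if env:
        record["env"] = env  # reserved patch for durable aggregate state
    line = json.dumps(record)
    out = stream if stream is not None else sys.stdout
    out.write(line + "\n")
    out.flush()  # bg/daemon hooks should not leave records sitting in a buffer
    return line
```

A tab hook might call emit_event_record("SnapshotChromeTabReady", {"url": url}, env={"TARGET_ID": target_id}), where url and target_id are whatever the hook has just learned.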
Shrink or split helpers that currently encode file-based orchestration:
- waitForChromeSessionState(...)
- connectToPage(...)
- waitForNavigationComplete(...)
Helpers should remain useful for direct/manual invocation, but they should stop being the control plane.
abx_plugins/plugins/staticfile/on_Snapshot__26_staticfile.daemon.bg.js should stop polling responses/index.jsonl.
This is likely a deeper refactor than a simple resubscribe:
- today staticfile starts in the pre-navigation stage but later waits for responses output
- in the new model it should either be split into a pre-navigation detector and a later saver, or otherwise rewritten so it consumes a real later event cleanly
If responses needs to fan out to staticfile, use a real event such as:
SnapshotChromeMainResponseSaved
- abx_plugins/plugins/singlefile/tests/test_singlefile.py
- abx_plugins/plugins/claudechrome/tests/test_claudechrome.py
- abx_plugins/plugins/claudecodecleanup/tests/test_claudecodecleanup.py

- abx_plugins/plugins/redirects/tests/test_redirects.py
- abx_plugins/plugins/headers/tests/test_headers.py
- abx_plugins/plugins/consolelog/tests/test_consolelog.py
- abx_plugins/plugins/staticfile/tests/test_staticfile.py
- abx_plugins/plugins/chrome/tests/chrome_test_helpers.py
- abx_plugins/plugins/chrome/tests/test_chrome.py
Prefer asserting:
- emitted events
- reduced derived.env state where useful
- resulting outputs
- final user-visible behavior
abx_plugins/plugins/chrome/README.md
Remove “authoritative marker file” language for:
- cdp_url.txt
- target_id.txt
- navigation.json
Document:
- event-driven control plane
- synthetic After<Event> settle barriers
- reduced context mirrored to derived.env
- files as outputs/debug aids only
- Update abx-dl for dynamic Crawl*/Snapshot* dispatch and synthetic After<Event> emission.
- Add scoped context reduction and derived.env mirroring.
- Pass full reduced context via env vars and current event payload via CLI args.
- Add generic event emit helpers in JS and Python.
- Expand fallback ArchiveResult synthesis so on_AfterSnapshot__... hooks behave like on_Snapshot__... hooks.
- Delete chrome_wait barrier hooks.
- Move Chrome launch/tab onto real event emission.
- Define the bg pre-navigation settle semantics needed before moving chrome_navigate to AfterSnapshotChromeTabReady.
- Move post-navigation extractors to SnapshotChromeTabNavigated.
- Move clearly artifact-consuming late hooks to AfterSnapshot.
- Rewrite or keep direct-source URL parsers based on their actual intended inputs.
- Remove file markers and sibling log polling.
- Rewrite tests and docs around event semantics and derived.env.
The final design is:
- one hook syntax: on_<Event>__...
- hooks emit real facts on stdout
- hooks may include reserved env patches for durable aggregate state
- abx-dl routes facts generically
- abx-dl auto-emits After<Event> when an event subtree settles
- abx-dl reduces prior scoped events into current key/value context
- abx-dl mirrors crawl context to CRAWL_DIR/derived.env and snapshot overlays to SNAP_DIR/derived.env only
- hooks receive reduced context as env vars and current event payload as CLI args
- SnapshotCleanup waits for the full root Snapshot tree
That preserves the self-healing “read fresh current state” behavior from the old file-based system without keeping the fractured marker-file protocol.
- make root hook-driving lifecycle events use stable string names like CrawlSetup and Snapshot
- dispatch crawl-tree and snapshot-tree hooks dynamically by event string
- route hook-emitted facts generically onto abxbus, including synthetic After<Event> barriers
- reduce prior scoped events into current crawl/snapshot context
- mirror crawl context to CRAWL_DIR/derived.env
- mirror per-snapshot overlays to SNAP_DIR/derived.env without writing snapshot state back to crawl scope
- launch hooks with full reduced context as env vars and current event payload as CLI args
- expand fallback ArchiveResult synthesis so on_AfterSnapshot__... hooks work the same as on_Snapshot__... hooks
- keep SnapshotCleanup waiting for the full root Snapshot tree
- add generic emit_event_record(...) helpers in JS and Python with reserved env patch support
- migrate Chrome launch/tab/navigate to emit real events plus durable env state
- move pre-navigation listeners onto SnapshotChromeTabReady
- move post-navigation extractors onto SnapshotChromeTabNavigated
- move clearly artifact-consuming late hooks onto AfterSnapshot
- review direct-source parsers before moving them late
- remove marker-file protocols like prenav.json and .configured
- remove sibling stdout.log scraping and replace it with real facts / reduced env state
- stop using file existence as readiness
- rewrite tests and docs around emitted events and derived.env
- consume UrlDiscovered instead of relying on hooks emitting child Snapshot records directly
- decide crawl policy for materializing new snapshots from UrlDiscovered
- ensure ArchiveBox’s parallel snapshot execution continues to treat crawl context as upstream-only and snapshot context as local overlay
- update any integration tests or docs that assume marker-file-driven Chrome coordination or old Snapshot discovery records