Skip to content

Gateway: harden custom session-store discovery#44176

Merged
gumadeiras merged 6 commits intoopenclaw:mainfrom
gumadeiras:codex/session-store-discovery-hardening
Mar 12, 2026
Merged

Gateway: harden custom session-store discovery#44176
gumadeiras merged 6 commits intoopenclaw:mainfrom
gumadeiras:codex/session-store-discovery-hardening

Conversation

@gumadeiras
Copy link
Member

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: ACP session discovery only scanned the default state tree in some paths, so retired or disk-only agent stores under a custom templated session.store root could be missed.
  • Why it matters: ACP startup reconciliation, session-id/session-label targeting, and run-id fallback could silently fail to find still-existing sessions on disk.
  • What changed: extracted shared session-store target discovery, preserved actual discovered on-disk store paths, reused that discovery in ACP enumeration and gateway combined-store loading, and added regression coverage plus a docs clarification.
  • What did NOT change (scope boundary): /acp sessions still reads the current bound/requester store rather than acting as a global ACP session browser.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #

User-visible / Behavior Changes

  • ACP startup reconciliation now discovers disk-only ACP session stores under custom per-agent session.store roots.
  • ACP commands that target sessions by session-id or session-label now resolve sessions found in those same custom-root disk stores.
  • Gateway run-id to session-key fallback now finds sessions in disk-only retired/custom-root agent stores as well.

Security Impact (required)

  • New permissions/capabilities? (Yes/No): No
  • Secrets/tokens handling changed? (Yes/No): No
  • New/changed network calls? (Yes/No): No
  • Command/tool execution surface changed? (Yes/No): No
  • Data access scope changed? (Yes/No): Yes
  • If any Yes, explain risk + mitigation:
    Discovery now includes disk-only agents/*/sessions/sessions.json stores under the configured templated session root and the default state root. The scope is still bounded to existing local session-store locations and does not add new write paths or broader filesystem traversal.

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 22 + pnpm
  • Model/provider: N/A
  • Integration/channel (if any): ACP / gateway session resolution
  • Relevant config (redacted): session.store: <state>/custom-state/agents/{agentId}/sessions/sessions.json

Steps

  1. Configure a templated custom session.store root outside the default state tree.
  2. Leave a retired or disk-only agent sessions.json on disk under that custom root.
  3. Trigger ACP startup reconciliation, session-id/session-label target resolution, or run-id fallback lookup.

Expected

  • Existing ACP sessions in the custom-root retired store are discovered and resolvable.

Actual

  • Before this patch, some paths only scanned the default state tree or a single resolved store and could miss those sessions.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: custom-root retired ACP store discovery, gateway combined-store loading for disk-only agents, run-id fallback lookup through combined-store discovery, and command-layer target resolution reuse.
  • Edge cases checked: discovered store paths preserve actual on-disk locations even when directory names do not round-trip cleanly through agent-id normalization; request-form ACP keys remain cached as request keys (acp:...) after lookup.
  • What you did not verify: full live ACP backend interaction against a real external runtime.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes/No): Yes
  • Config/env changes? (Yes/No): No
  • Migration needed? (Yes/No): No
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: revert this PR or temporarily move affected ACP sessions back under configured agent stores that are already present in agents.list.
  • Files/config to restore: session-store discovery callers in src/config/sessions/targets.ts, src/gateway/session-utils.ts, and src/gateway/server-session-key.ts.
  • Known bad symptoms reviewers should watch for: session-id/label lookups unexpectedly returning duplicate or missing sessions under custom templated roots.

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk: Gateway combined-store discovery now loads additional disk-only stores under the configured root, which could surface duplicate logical sessions if multiple stores contain the same canonical key.
    • Mitigation: combined-store merge still canonicalizes keys and prefers the freshest updatedAt, with regression coverage on disk-only agent aggregation.

Copilot AI review requested due to automatic review settings March 12, 2026 15:33
@aisle-research-bot
Copy link

aisle-research-bot bot commented Mar 12, 2026

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

# Severity Title
1 🟡 Medium Blocking synchronous filesystem discovery on gateway request path enables DoS via resource exhaustion
2 🟡 Medium TOCTOU + hardlink bypass in session-store discovery validation (possible arbitrary file read)

1. 🟡 Blocking synchronous filesystem discovery on gateway request path enables DoS via resource exhaustion

Property Value
Severity Medium
CWE CWE-400
Location src/gateway/session-utils.ts:589-605

Description

loadCombinedSessionStoreForGateway() now performs synchronous, unbounded session-store discovery across all agent roots on every call.

This becomes a denial-of-service risk because:

  • Multiple gateway endpoints call loadCombinedSessionStoreForGateway() in request handlers (e.g., sessions.list, sessions.resolve, sessions.usage).
  • The function calls resolveAllAgentSessionStoreTargetsSync(cfg), which does synchronous readdirSync + lstatSync + realpathSync work across agent directories and then loads each discovered sessions.json.
  • An attacker who can trigger these endpoints repeatedly (even if authenticated) can force repeated event-loop blocking work.
  • A local attacker (or any actor able to write to the state directory) can amplify cost by creating thousands of agent subdirectories under an agents root, causing the discovery loop to scale linearly with directory count.

Vulnerable code:

const targets = resolveAllAgentSessionStoreTargetsSync(cfg);
for (const target of targets) {
  const store = loadSessionStore(target.storePath);
  for (const [key, entry] of Object.entries(store)) {// merge
  }
}

Impact:

  • Event-loop blocking (sync FS calls + JSON loads) → increased latency / request timeouts.
  • Resource exhaustion proportional to number/size of stores discovered → process becomes unresponsive.

Recommendation

Avoid synchronous discovery/loading on the request path and add caching + bounds.

Recommended mitigation (example): cache discovered targets for a short TTL and refresh asynchronously.

// module-level cache
let cachedTargets: { targets: SessionStoreTarget[]; expiresAt: number } | null = null;
const TARGETS_TTL_MS = 5_000;

async function getTargetsCached(cfg: OpenClawConfig): Promise<SessionStoreTarget[]> {
  const now = Date.now();
  if (cachedTargets && cachedTargets.expiresAt > now) return cachedTargets.targets;
  const targets = await resolveAllAgentSessionStoreTargets(cfg);
  cachedTargets = { targets, expiresAt: now + TARGETS_TTL_MS };
  return targets;
}

export async function loadCombinedSessionStoreForGatewayAsync(cfg: OpenClawConfig) {
  const targets = await getTargetsCached(cfg);// load stores (ideally async) and merge
}

Additional hardening options:

  • Enforce a maximum number of discovered agent directories/stores (with warning + truncation).
  • Perform discovery once at startup and refresh in the background.
  • Gate broad discovery behind a config flag when running in server mode.
  • Consider moving store loading off the hot path (e.g., incremental index).

2. 🟡 TOCTOU + hardlink bypass in session-store discovery validation (possible arbitrary file read)

Property Value
Severity Medium
CWE CWE-367
Location src/config/sessions/targets.ts:65-74

Description

Session-store discovery attempts to keep discovered sessions.json within an agents/ root by checking lstat() (reject symlinks) and comparing realpath() results. However:

  • TOCTOU race: validation is performed (lstat/realpath/within-root) and then the validated path string is returned. The file is later opened by path (e.g. fs.readFileSync(storePath) in loadSessionStore) without O_NOFOLLOW/identity checks. A local attacker with write access to the discovered sessions.json directory can swap the validated regular file for a symlink after validation but before read, causing the process to read a different file.
  • Hardlink bypass: the validation rejects symlinks but does not reject hardlinks (nlink > 1). realpath() does not “resolve” hardlinks, so a hardlinked sessions.json can point to content from an arbitrary file on the same filesystem. If the OpenClaw process runs with higher privileges than the attacker, this can become an arbitrary file read / information disclosure primitive (the gateway can serve parsed store content to clients).

Vulnerable code (validation step):

const stat = fsSync.lstatSync(storePath);
if (stat.isSymbolicLink() || !stat.isFile()) {
  return undefined;
}
const realStorePath = fsSync.realpathSync(storePath);
const realAgentsRoot = params.realAgentsRoot ?? fsSync.realpathSync(params.agentsRoot);
return isWithinRoot(realStorePath, realAgentsRoot) ? realStorePath : undefined;

Because the validation does not bind the check to the subsequent read, and does not reject hardlinks, it is possible for a discovered store to reference unexpected content.

Recommendation

Bind validation to the actual file opened, and reject hardlinks.

Option A (recommended): when reading sessions.json, open it safely using the existing safe-open utilities (they already support O_NOFOLLOW, hardlink rejection, and identity checks):

import { openVerifiedFileSync } from "../infra/safe-open-sync.js";

const opened = openVerifiedFileSync({
  filePath: storePath,
  rejectPathSymlink: true,
  rejectHardlinks: true,
  maxBytes: /* reasonable max size */,
  allowedType: "file",
});
if (!opened.ok) return undefined;
try {
  const raw = fs.readFileSync(opened.fd, "utf8");// parse JSON...
} finally {
  fs.closeSync(opened.fd);
}

Option B: if discovery must return a string path, then at minimum re-validate immediately before reading (including nlink checks) and consider using fs.openSync with O_NOFOLLOW where available.

Additionally, consider adding a maximum allowed size for sessions.json reads to reduce DoS risk from discovered roots.


Analyzed PR: #44176 at commit 52ebbf5

Last updated on: 2026-03-12T18:19:39Z

@openclaw-barnacle openclaw-barnacle bot added docs Improvements or additions to documentation gateway Gateway runtime commands Command implementations agents Agent runtime and tooling size: L maintainer Maintainer-authored PR labels Mar 12, 2026
@rnrnstar2
Copy link

初歩確認

CI 通過後にレビュー予定。変更規模 (+517/-171, 14 files) を考慮し、以下を確認予定:

  • session-store discovery の新しいロジックが既存の挙動と後方互換か
  • 追加されたテストが edge case をカバーしているか

Greptile Review / Aisle 結果も参照します。

@gumadeiras
Copy link
Member Author

初歩確認

CI 通過後にレビュー予定。変更規模 (+517/-171, 14 files) を考慮し、以下を確認予定:

  • session-store discovery の新しいロジックが既存の挙動と後方互換か
  • 追加されたテストが edge case をカバーしているか

Greptile Review / Aisle 結果も参照します。

stop running automated checks and spamming comments/PRs; final warning

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens gateway/ACP session-store discovery so that disk-only or retired agent session stores are still found when session.store is configured with a custom templated root (not just under the default state tree). It centralizes session-store target discovery, preserves real on-disk store paths for discovered stores, and reuses that logic across ACP enumeration and gateway store loading, with added regression tests and a docs clarification.

Changes:

  • Add shared session-store target discovery (resolveAllAgentSessionStoreTargets{Sync}) that merges configured targets with discovered on-disk agents/*/sessions/sessions.json stores.
  • Update gateway combined-store loading and run-id→session-key fallback to use the shared discovery (covering disk-only agents under custom roots).
  • Add regression tests for discovery + run-id fallback, and clarify /acp sessions behavior in docs.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/gateway/session-utils.ts Use shared discovery to build the combined gateway session store across configured + discovered agent stores.
src/gateway/session-utils.test.ts Adjust regression test to cover a custom templated session.store root.
src/gateway/server-session-key.ts Switch run-id fallback lookup to the combined store (covers disk-only/retired stores).
src/gateway/server-session-key.test.ts New regression coverage for run-id lookup through disk-only agent stores under custom root.
src/config/sessions/targets.ts New shared discovery/selection helpers for configured + discovered session-store targets.
src/config/sessions/targets.test.ts New tests for discovery behavior across default + custom roots and path preservation.
src/config/sessions/paths.ts Extend resolveStorePath to accept env; add helper to infer agents root from a canonical store path.
src/config/sessions.ts Export the new session target helpers.
src/commands/session-store-targets.ts Refactor to delegate to the shared config helper (removes duplicated logic).
src/commands/session-store-targets.test.ts Update tests to validate delegation instead of duplicating selector logic.
src/agents/session-dirs.ts Extract reusable sync/async helpers for enumerating agent sessions directories from an agents root.
src/acp/runtime/session-meta.ts Use shared discovery for ACP session enumeration (covers custom-root disk stores).
src/acp/runtime/session-meta.test.ts New test validating ACP enumeration uses resolved store targets.
docs/tools/acp-agents.md Clarify /acp sessions vs gateway discovery behavior for session-key/id/label targeting.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR hardens custom session-store discovery across ACP enumeration, gateway combined-store loading, and run-id fallback lookup by extracting a shared resolveAllAgentSessionStoreTargets helper that discovers disk-only and retired agent stores under both the default state tree and any configured custom template root. The refactoring is clean, backward-compatible, and well-tested for the new discovery paths.

Key changes:

  • New src/config/sessions/targets.ts centralizes resolveSessionStoreTargets, resolveAllAgentSessionStoreTargets (async), and resolveAllAgentSessionStoreTargetsSync with proper env propagation.
  • resolveStorePath in paths.ts now threads a caller-supplied env through all home-dir helpers instead of hard-coding process.env.
  • New resolveAgentsDirFromSessionStorePath extracts the agents/ root from a store path so discovery can scan the custom root without re-parsing the template.
  • loadCombinedSessionStoreForGateway and resolveSessionKeyForRun now use the shared discovery, picking up disk-only/retired agents under custom store roots.
  • listAcpSessionEntries (ACP startup reconciliation) similarly delegates to the shared helper.
  • Test gap: src/config/sessions/targets.test.ts covers resolveAllAgentSessionStoreTargets thoroughly, but resolveSessionStoreTargets error/validation cases (unknown agent rejection, conflicting --agent/--all-agents selectors, deduplication of shared-path stores) that previously lived in session-store-targets.test.ts were removed and not replaced anywhere.

Confidence Score: 4/5

  • Safe to merge; the fix is well-scoped and backward-compatible, with the only concern being reduced unit-test coverage for resolveSessionStoreTargets error paths.
  • The core logic is correct: discovery correctly unions configured and disk-scanned targets, deduplicates by path, and preserves actual on-disk paths. All new paths are regression-tested with real filesystem fixtures. The main deduction is the removal of unit tests for resolveSessionStoreTargets error cases (unknown agent, conflicting selectors, deduplication) without adding equivalent tests in the new targets.test.ts.
  • src/commands/session-store-targets.test.ts — error-path tests for resolveSessionStoreTargets were dropped and should be added to src/config/sessions/targets.test.ts.

Comments Outside Diff (1)

  1. src/commands/session-store-targets.test.ts, line 235-272 (link)

    Error-path tests dropped without replacement in targets.test.ts

    The refactor removed four specific test cases that validated important invariants of resolveSessionStoreTargets:

    • "dedupes shared store paths for --all-agents"
    • "rejects unknown agent ids"
    • "rejects conflicting selectors" (two assertions: --agent+--all-agents and --store+--all-agents)
    • "resolves all configured agent stores"

    The replacement test only verifies the delegation call-through. However, src/config/sessions/targets.test.ts exclusively tests resolveAllAgentSessionStoreTargets — it has zero tests for resolveSessionStoreTargets itself. The error-throwing paths (hasAgent && allAgents, opts.store && allAgents/hasAgent, unknown-agent guard) and deduplication logic are now untested. Since those guard clauses are still present in targets.ts, they should be exercised at the new location.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/commands/session-store-targets.test.ts
Line: 235-272

Comment:
**Error-path tests dropped without replacement in `targets.test.ts`**

The refactor removed four specific test cases that validated important invariants of `resolveSessionStoreTargets`:

- "dedupes shared store paths for --all-agents"
- "rejects unknown agent ids"
- "rejects conflicting selectors" (two assertions: `--agent`+`--all-agents` and `--store`+`--all-agents`)
- "resolves all configured agent stores"

The replacement test only verifies the delegation call-through. However, `src/config/sessions/targets.test.ts` exclusively tests `resolveAllAgentSessionStoreTargets` — it has zero tests for `resolveSessionStoreTargets` itself. The error-throwing paths (`hasAgent && allAgents`, `opts.store && allAgents/hasAgent`, unknown-agent guard) and deduplication logic are now untested. Since those guard clauses are still present in `targets.ts`, they should be exercised at the new location.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: e18feb6

@gumadeiras gumadeiras self-assigned this Mar 12, 2026
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 67be0eabd6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@gumadeiras gumadeiras force-pushed the codex/session-store-discovery-hardening branch 3 times, most recently from b99c3a4 to 7d10ab4 Compare March 12, 2026 16:16
@gumadeiras
Copy link
Member Author

Thanks for the review. I checked these concerns against SECURITY.md and also hardened the implementation on top of that.

From a security-triage standpoint, I do not think the reported scenarios are in-scope vulnerabilities under OpenClaw's stated trust model:

  • SECURITY.md explicitly does not treat a shared gateway as a per-user adversarial isolation boundary.
  • Authenticated gateway callers are trusted operators in that model.
  • Reports that depend on writing attacker-controlled files or symlinks under trusted local state such as ~/.openclaw are out of scope.

That applies to the symlinked-store, retired-store visibility, normalized-dir collision, and run-id poisoning scenarios as written in the comment.

That said, the underlying hardening opportunities were still worthwhile, so I pushed defense-in-depth changes in the PR head:

  • discovered sessions.json files are now skipped if they are symlinks
  • discovered stores are validated to remain inside the resolved agent root after realpath
  • discovered directory names that collapse into the implicit default main agent id are skipped unless they actually are main
  • duplicate sessionId run lookups are now deterministic and ambiguity-safe instead of taking the first arbitrary match

This was added in the current PR head commit:
4f2970d6ccaba621240bce3f7f3b427f80e3074b

Targeted verification passed:
pnpm vitest run src/config/sessions/targets.test.ts src/gateway/server-session-key.test.ts src/acp/runtime/session-meta.test.ts src/gateway/session-utils.test.ts

So my conclusion is:

  • the original comment does not establish an in-scope security issue under SECURITY.md
  • the code is now stricter anyway, which removes the main defense-in-depth concerns that were worth addressing

@gumadeiras gumadeiras force-pushed the codex/session-store-discovery-hardening branch from 9bc2184 to 52ebbf5 Compare March 12, 2026 16:44
@gumadeiras gumadeiras merged commit 46f0bfc into openclaw:main Mar 12, 2026
22 of 27 checks passed
@gumadeiras
Copy link
Member Author

Merged via squash.

Thanks @gumadeiras!

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9bc2184f77

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mukhtharcm pushed a commit to teconomix/openclaw that referenced this pull request Mar 12, 2026
Merged via squash.

Prepared head SHA: 52ebbf5
Co-authored-by: gumadeiras <[email protected]>
Co-authored-by: gumadeiras <[email protected]>
Reviewed-by: @gumadeiras
steipete pushed a commit to BruceMacD/openclaw that referenced this pull request Mar 13, 2026
Merged via squash.

Prepared head SHA: 52ebbf5
Co-authored-by: gumadeiras <[email protected]>
Co-authored-by: gumadeiras <[email protected]>
Reviewed-by: @gumadeiras
dvrshil pushed a commit to JosepLee/openclaw that referenced this pull request Mar 14, 2026
Merged via squash.

Prepared head SHA: 52ebbf5
Co-authored-by: gumadeiras <[email protected]>
Co-authored-by: gumadeiras <[email protected]>
Reviewed-by: @gumadeiras
Interstellar-code pushed a commit to Interstellar-code/operator1 that referenced this pull request Mar 16, 2026
Merged via squash.

Prepared head SHA: 52ebbf5
Co-authored-by: gumadeiras <[email protected]>
Co-authored-by: gumadeiras <[email protected]>
Reviewed-by: @gumadeiras

(cherry picked from commit 46f0bfc)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling commands Command implementations docs Improvements or additions to documentation gateway Gateway runtime maintainer Maintainer-authored PR size: XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants