Gateway: harden custom session-store discovery by gumadeiras · Pull Request #44176 · openclaw/openclaw

gumadeiras · 2026-03-12T15:33:45Z

Summary

Describe the problem and fix in 2–5 bullets:

Problem: ACP session discovery only scanned the default state tree in some paths, so retired or disk-only agent stores under a custom templated session.store root could be missed.
Why it matters: ACP startup reconciliation, session-id/session-label targeting, and run-id fallback could silently fail to find still-existing sessions on disk.
What changed: extracted shared session-store target discovery, preserved actual discovered on-disk store paths, reused that discovery in ACP enumeration and gateway combined-store loading, and added regression coverage plus a docs clarification.
What did NOT change (scope boundary): /acp sessions still reads the current bound/requester store rather than acting as a global ACP session browser.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes #
Related #

User-visible / Behavior Changes

ACP startup reconciliation now discovers disk-only ACP session stores under custom per-agent session.store roots.
ACP commands that target sessions by session-id or session-label now resolve sessions found in those same custom-root disk stores.
Gateway run-id to session-key fallback now finds sessions in disk-only retired/custom-root agent stores as well.

Security Impact (required)

New permissions/capabilities? (Yes/No): No
Secrets/tokens handling changed? (Yes/No): No
New/changed network calls? (Yes/No): No
Command/tool execution surface changed? (Yes/No): No
Data access scope changed? (Yes/No): Yes
If any Yes, explain risk + mitigation:
Discovery now includes disk-only agents/*/sessions/sessions.json stores under the configured templated session root and the default state root. The scope is still bounded to existing local session-store locations and does not add new write paths or broader filesystem traversal.

Repro + Verification

Environment

OS: macOS
Runtime/container: Node 22 + pnpm
Model/provider: N/A
Integration/channel (if any): ACP / gateway session resolution
Relevant config (redacted): session.store: <state>/custom-state/agents/{agentId}/sessions/sessions.json

Steps

Configure a templated custom session.store root outside the default state tree.
Leave a retired or disk-only agent sessions.json on disk under that custom root.
Trigger ACP startup reconciliation, session-id/session-label target resolution, or run-id fallback lookup.

Expected

Existing ACP sessions in the custom-root retired store are discovered and resolvable.

Actual

Before this patch, some paths only scanned the default state tree or a single resolved store and could miss those sessions.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios: custom-root retired ACP store discovery, gateway combined-store loading for disk-only agents, run-id fallback lookup through combined-store discovery, and command-layer target resolution reuse.
Edge cases checked: discovered store paths preserve actual on-disk locations even when directory names do not round-trip cleanly through agent-id normalization; request-form ACP keys remain cached as request keys (acp:...) after lookup.
What you did not verify: full live ACP backend interaction against a real external runtime.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

Backward compatible? (Yes/No): Yes
Config/env changes? (Yes/No): No
Migration needed? (Yes/No): No
If yes, exact upgrade steps:

Failure Recovery (if this breaks)

How to disable/revert this change quickly: revert this PR or temporarily move affected ACP sessions back under configured agent stores that are already present in agents.list.
Files/config to restore: session-store discovery callers in src/config/sessions/targets.ts, src/gateway/session-utils.ts, and src/gateway/server-session-key.ts.
Known bad symptoms reviewers should watch for: session-id/label lookups unexpectedly returning duplicate or missing sessions under custom templated roots.

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

Risk: Gateway combined-store discovery now loads additional disk-only stores under the configured root, which could surface duplicate logical sessions if multiple stores contain the same canonical key.
- Mitigation: combined-store merge still canonicalizes keys and prefers the freshest updatedAt, with regression coverage on disk-only agent aggregation.

aisle-research-bot · 2026-03-12T15:33:53Z

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

#	Severity	Title
1	🟡 Medium	Blocking synchronous filesystem discovery on gateway request path enables DoS via resource exhaustion
2	🟡 Medium	TOCTOU + hardlink bypass in session-store discovery validation (possible arbitrary file read)

1. 🟡 Blocking synchronous filesystem discovery on gateway request path enables DoS via resource exhaustion

Property	Value
Severity	Medium
CWE	CWE-400
Location	`src/gateway/session-utils.ts:589-605`

Description

loadCombinedSessionStoreForGateway() now performs synchronous, unbounded session-store discovery across all agent roots on every call.

This becomes a denial-of-service risk because:

Multiple gateway endpoints call loadCombinedSessionStoreForGateway() in request handlers (e.g., sessions.list, sessions.resolve, sessions.usage).
The function calls resolveAllAgentSessionStoreTargetsSync(cfg), which does synchronous readdirSync + lstatSync + realpathSync work across agent directories and then loads each discovered sessions.json.
An attacker who can trigger these endpoints repeatedly (even if authenticated) can force repeated event-loop blocking work.
A local attacker (or any actor able to write to the state directory) can amplify cost by creating thousands of agent subdirectories under an agents root, causing the discovery loop to scale linearly with directory count.

Vulnerable code:

const targets = resolveAllAgentSessionStoreTargetsSync(cfg);
for (const target of targets) {
  const store = loadSessionStore(target.storePath);
  for (const [key, entry] of Object.entries(store)) {
    // merge
  }
}

Impact:

Event-loop blocking (sync FS calls + JSON loads) → increased latency / request timeouts.
Resource exhaustion proportional to number/size of stores discovered → process becomes unresponsive.

Recommendation

Avoid synchronous discovery/loading on the request path and add caching + bounds.

Recommended mitigation (example): cache discovered targets for a short TTL and refresh asynchronously.

// module-level cache
let cachedTargets: { targets: SessionStoreTarget[]; expiresAt: number } | null = null;
const TARGETS_TTL_MS = 5_000;

async function getTargetsCached(cfg: OpenClawConfig): Promise<SessionStoreTarget[]> {
  const now = Date.now();
  if (cachedTargets && cachedTargets.expiresAt > now) return cachedTargets.targets;
  const targets = await resolveAllAgentSessionStoreTargets(cfg);
  cachedTargets = { targets, expiresAt: now + TARGETS_TTL_MS };
  return targets;
}

export async function loadCombinedSessionStoreForGatewayAsync(cfg: OpenClawConfig) {
  const targets = await getTargetsCached(cfg);
  // load stores (ideally async) and merge
}

Additional hardening options:

Enforce a maximum number of discovered agent directories/stores (with warning + truncation).
Perform discovery once at startup and refresh in the background.
Gate broad discovery behind a config flag when running in server mode.
Consider moving store loading off the hot path (e.g., incremental index).

2. 🟡 TOCTOU + hardlink bypass in session-store discovery validation (possible arbitrary file read)

Property	Value
Severity	Medium
CWE	CWE-367
Location	`src/config/sessions/targets.ts:65-74`

Description

Session-store discovery attempts to keep discovered sessions.json within an agents/ root by checking lstat() (reject symlinks) and comparing realpath() results. However:

TOCTOU race: validation is performed (lstat/realpath/within-root) and then the validated path string is returned. The file is later opened by path (e.g. fs.readFileSync(storePath) in loadSessionStore) without O_NOFOLLOW/identity checks. A local attacker with write access to the discovered sessions.json directory can swap the validated regular file for a symlink after validation but before read, causing the process to read a different file.
Hardlink bypass: the validation rejects symlinks but does not reject hardlinks (nlink > 1). realpath() does not “resolve” hardlinks, so a hardlinked sessions.json can point to content from an arbitrary file on the same filesystem. If the OpenClaw process runs with higher privileges than the attacker, this can become an arbitrary file read / information disclosure primitive (the gateway can serve parsed store content to clients).

Vulnerable code (validation step):

const stat = fsSync.lstatSync(storePath);
if (stat.isSymbolicLink() || !stat.isFile()) {
  return undefined;
}
const realStorePath = fsSync.realpathSync(storePath);
const realAgentsRoot = params.realAgentsRoot ?? fsSync.realpathSync(params.agentsRoot);
return isWithinRoot(realStorePath, realAgentsRoot) ? realStorePath : undefined;

Because the validation does not bind the check to the subsequent read, and does not reject hardlinks, it is possible for a discovered store to reference unexpected content.

Recommendation

Bind validation to the actual file opened, and reject hardlinks.

Option A (recommended): when reading sessions.json, open it safely using the existing safe-open utilities (they already support O_NOFOLLOW, hardlink rejection, and identity checks):

import { openVerifiedFileSync } from "../infra/safe-open-sync.js";

const opened = openVerifiedFileSync({
  filePath: storePath,
  rejectPathSymlink: true,
  rejectHardlinks: true,
  maxBytes: /* reasonable max size */,
  allowedType: "file",
});
if (!opened.ok) return undefined;
try {
  const raw = fs.readFileSync(opened.fd, "utf8");
  // parse JSON...
} finally {
  fs.closeSync(opened.fd);
}

Option B: if discovery must return a string path, then at minimum re-validate immediately before reading (including nlink checks) and consider using fs.openSync with O_NOFOLLOW where available.

Additionally, consider adding a maximum allowed size for sessions.json reads to reduce DoS risk from discovered roots.

Analyzed PR: #44176 at commit 52ebbf5

_{Last updated on: 2026-03-12T18:19:39Z}

rnrnstar2 · 2026-03-12T15:36:27Z

初歩確認

CI 通過後にレビュー予定。変更規模 (+517/-171, 14 files) を考慮し、以下を確認予定:

session-store discovery の新しいロジックが既存の挙動と後方互換か
追加されたテストが edge case をカバーしているか

Greptile Review / Aisle 結果も参照します。

gumadeiras · 2026-03-12T15:38:38Z

初歩確認

CI 通過後にレビュー予定。変更規模 (+517/-171, 14 files) を考慮し、以下を確認予定:

session-store discovery の新しいロジックが既存の挙動と後方互換か

追加されたテストが edge case をカバーしているか

Greptile Review / Aisle 結果も参照します。

stop running automated checks and spamming comments/PRs; final warning

Copilot

Pull request overview

This PR hardens gateway/ACP session-store discovery so that disk-only or retired agent session stores are still found when session.store is configured with a custom templated root (not just under the default state tree). It centralizes session-store target discovery, preserves real on-disk store paths for discovered stores, and reuses that logic across ACP enumeration and gateway store loading, with added regression tests and a docs clarification.

Changes:

Add shared session-store target discovery (resolveAllAgentSessionStoreTargets{Sync}) that merges configured targets with discovered on-disk agents/*/sessions/sessions.json stores.
Update gateway combined-store loading and run-id→session-key fallback to use the shared discovery (covering disk-only agents under custom roots).
Add regression tests for discovery + run-id fallback, and clarify /acp sessions behavior in docs.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/gateway/session-utils.ts	Use shared discovery to build the combined gateway session store across configured + discovered agent stores.
src/gateway/session-utils.test.ts	Adjust regression test to cover a custom templated `session.store` root.
src/gateway/server-session-key.ts	Switch run-id fallback lookup to the combined store (covers disk-only/retired stores).
src/gateway/server-session-key.test.ts	New regression coverage for run-id lookup through disk-only agent stores under custom root.
src/config/sessions/targets.ts	New shared discovery/selection helpers for configured + discovered session-store targets.
src/config/sessions/targets.test.ts	New tests for discovery behavior across default + custom roots and path preservation.
src/config/sessions/paths.ts	Extend `resolveStorePath` to accept `env`; add helper to infer `agents` root from a canonical store path.
src/config/sessions.ts	Export the new session target helpers.
src/commands/session-store-targets.ts	Refactor to delegate to the shared config helper (removes duplicated logic).
src/commands/session-store-targets.test.ts	Update tests to validate delegation instead of duplicating selector logic.
src/agents/session-dirs.ts	Extract reusable sync/async helpers for enumerating agent `sessions` directories from an `agents` root.
src/acp/runtime/session-meta.ts	Use shared discovery for ACP session enumeration (covers custom-root disk stores).
src/acp/runtime/session-meta.test.ts	New test validating ACP enumeration uses resolved store targets.
docs/tools/acp-agents.md	Clarify `/acp sessions` vs gateway discovery behavior for session-key/id/label targeting.

src/gateway/server-session-key.ts

src/config/sessions/targets.ts

greptile-apps · 2026-03-12T15:39:10Z

Greptile Summary

This PR hardens custom session-store discovery across ACP enumeration, gateway combined-store loading, and run-id fallback lookup by extracting a shared resolveAllAgentSessionStoreTargets helper that discovers disk-only and retired agent stores under both the default state tree and any configured custom template root. The refactoring is clean, backward-compatible, and well-tested for the new discovery paths.

Key changes:

New src/config/sessions/targets.ts centralizes resolveSessionStoreTargets, resolveAllAgentSessionStoreTargets (async), and resolveAllAgentSessionStoreTargetsSync with proper env propagation.
resolveStorePath in paths.ts now threads a caller-supplied env through all home-dir helpers instead of hard-coding process.env.
New resolveAgentsDirFromSessionStorePath extracts the agents/ root from a store path so discovery can scan the custom root without re-parsing the template.
loadCombinedSessionStoreForGateway and resolveSessionKeyForRun now use the shared discovery, picking up disk-only/retired agents under custom store roots.
listAcpSessionEntries (ACP startup reconciliation) similarly delegates to the shared helper.
Test gap: src/config/sessions/targets.test.ts covers resolveAllAgentSessionStoreTargets thoroughly, but resolveSessionStoreTargets error/validation cases (unknown agent rejection, conflicting --agent/--all-agents selectors, deduplication of shared-path stores) that previously lived in session-store-targets.test.ts were removed and not replaced anywhere.

Confidence Score: 4/5

Safe to merge; the fix is well-scoped and backward-compatible, with the only concern being reduced unit-test coverage for resolveSessionStoreTargets error paths.
The core logic is correct: discovery correctly unions configured and disk-scanned targets, deduplicates by path, and preserves actual on-disk paths. All new paths are regression-tested with real filesystem fixtures. The main deduction is the removal of unit tests for resolveSessionStoreTargets error cases (unknown agent, conflicting selectors, deduplication) without adding equivalent tests in the new targets.test.ts.
src/commands/session-store-targets.test.ts — error-path tests for resolveSessionStoreTargets were dropped and should be added to src/config/sessions/targets.test.ts.

Comments Outside Diff (1)

src/commands/session-store-targets.test.ts, line 235-272 (link)

Error-path tests dropped without replacement in targets.test.ts

The refactor removed four specific test cases that validated important invariants of resolveSessionStoreTargets:
- "dedupes shared store paths for --all-agents"
- "rejects unknown agent ids"
- "rejects conflicting selectors" (two assertions: --agent+--all-agents and --store+--all-agents)
- "resolves all configured agent stores"
The replacement test only verifies the delegation call-through. However, src/config/sessions/targets.test.ts exclusively tests resolveAllAgentSessionStoreTargets — it has zero tests for resolveSessionStoreTargets itself. The error-throwing paths (hasAgent && allAgents, opts.store && allAgents/hasAgent, unknown-agent guard) and deduplication logic are now untested. Since those guard clauses are still present in targets.ts, they should be exercised at the new location.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/commands/session-store-targets.test.ts
Line: 235-272

Comment:
**Error-path tests dropped without replacement in `targets.test.ts`**

The refactor removed four specific test cases that validated important invariants of `resolveSessionStoreTargets`:

- "dedupes shared store paths for --all-agents"
- "rejects unknown agent ids"
- "rejects conflicting selectors" (two assertions: `--agent`+`--all-agents` and `--store`+`--all-agents`)
- "resolves all configured agent stores"

The replacement test only verifies the delegation call-through. However, `src/config/sessions/targets.test.ts` exclusively tests `resolveAllAgentSessionStoreTargets` — it has zero tests for `resolveSessionStoreTargets` itself. The error-throwing paths (`hasAgent && allAgents`, `opts.store && allAgents/hasAgent`, unknown-agent guard) and deduplication logic are now untested. Since those guard clauses are still present in `targets.ts`, they should be exercised at the new location.

How can I resolve this? If you propose a fix, please make it concise.

_{Last reviewed commit: e18feb6}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 67be0eabd6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

src/agents/session-dirs.ts

src/gateway/server-session-key.ts

gumadeiras · 2026-03-12T16:28:58Z

Thanks for the review. I checked these concerns against SECURITY.md and also hardened the implementation on top of that.

From a security-triage standpoint, I do not think the reported scenarios are in-scope vulnerabilities under OpenClaw's stated trust model:

SECURITY.md explicitly does not treat a shared gateway as a per-user adversarial isolation boundary.
Authenticated gateway callers are trusted operators in that model.
Reports that depend on writing attacker-controlled files or symlinks under trusted local state such as ~/.openclaw are out of scope.

That applies to the symlinked-store, retired-store visibility, normalized-dir collision, and run-id poisoning scenarios as written in the comment.

That said, the underlying hardening opportunities were still worthwhile, so I pushed defense-in-depth changes in the PR head:

discovered sessions.json files are now skipped if they are symlinks
discovered stores are validated to remain inside the resolved agent root after realpath
discovered directory names that collapse into the implicit default main agent id are skipped unless they actually are main
duplicate sessionId run lookups are now deterministic and ambiguity-safe instead of taking the first arbitrary match

This was added in the current PR head commit:
4f2970d6ccaba621240bce3f7f3b427f80e3074b

Targeted verification passed:
pnpm vitest run src/config/sessions/targets.test.ts src/gateway/server-session-key.test.ts src/acp/runtime/session-meta.test.ts src/gateway/session-utils.test.ts

So my conclusion is:

the original comment does not establish an in-scope security issue under SECURITY.md
the code is now stricter anyway, which removes the main defense-in-depth concerns that were worth addressing

gumadeiras · 2026-03-12T16:44:56Z

Merged via squash.

Prepared head SHA: 52ebbf5188b47386f2a78ac4715993bc082e911b
Merge commit: 46f0bfc55b587eaa276a59bbdb3fbceef445f0bf

Thanks @gumadeiras!

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9bc2184f77

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

src/config/sessions/targets.ts

@gumadeiras

Merged via squash. Prepared head SHA: 52ebbf5 Co-authored-by: gumadeiras <[email protected]> Co-authored-by: gumadeiras <[email protected]> Reviewed-by: @gumadeiras

@gumadeiras

Merged via squash. Prepared head SHA: 52ebbf5 Co-authored-by: gumadeiras <[email protected]> Co-authored-by: gumadeiras <[email protected]> Reviewed-by: @gumadeiras

@gumadeiras

Merged via squash. Prepared head SHA: 52ebbf5 Co-authored-by: gumadeiras <[email protected]> Co-authored-by: gumadeiras <[email protected]> Reviewed-by: @gumadeiras

@gumadeiras

Merged via squash. Prepared head SHA: 52ebbf5 Co-authored-by: gumadeiras <[email protected]> Co-authored-by: gumadeiras <[email protected]> Reviewed-by: @gumadeiras (cherry picked from commit 46f0bfc)

Copilot AI review requested due to automatic review settings March 12, 2026 15:33

openclaw-barnacle bot added docs Improvements or additions to documentation gateway Gateway runtime commands Command implementations agents Agent runtime and tooling size: L maintainer Maintainer-authored PR labels Mar 12, 2026

Copilot started reviewing on behalf of gumadeiras March 12, 2026 15:34 View session

Copilot AI reviewed Mar 12, 2026

View reviewed changes

src/gateway/server-session-key.ts Outdated Show resolved Hide resolved

src/config/sessions/targets.ts Show resolved Hide resolved

gumadeiras self-assigned this Mar 12, 2026

chatgpt-codex-connector bot reviewed Mar 12, 2026

View reviewed changes

src/agents/session-dirs.ts Show resolved Hide resolved

src/gateway/server-session-key.ts Show resolved Hide resolved

gumadeiras force-pushed the codex/session-store-discovery-hardening branch 3 times, most recently from b99c3a4 to 7d10ab4 Compare March 12, 2026 16:16

openclaw-barnacle bot added size: XL and removed size: L labels Mar 12, 2026

gumadeiras added 6 commits March 12, 2026 16:43

Gateway: harden custom session-store discovery

0d17bab

Tests: restore session target coverage

7a3fdff

Gateway: harden session discovery fallbacks

d0307ac

docs: add changelog entry for session discovery fix

2f119e0

fix: harden discovered session store loading

7c0d8c9

refactor: share session id resolution logic

52ebbf5

gumadeiras force-pushed the codex/session-store-discovery-hardening branch from 9bc2184 to 52ebbf5 Compare March 12, 2026 16:44

gumadeiras merged commit 46f0bfc into openclaw:main Mar 12, 2026
22 of 27 checks passed

chatgpt-codex-connector bot reviewed Mar 12, 2026

View reviewed changes

src/config/sessions/targets.ts Show resolved Hide resolved

github-actions bot mentioned this pull request Mar 12, 2026

📡 Upstream Digest — 2026-03-12 16:52 UTC curtismercier/openclaw-mods#243

Open

github-actions bot mentioned this pull request Mar 13, 2026

上游更新: v2026.3.12 — 14 P0 + 33 P1 待合并 jiulingyun/openclaw-cn#506

Open

Uh oh!

Conversation

gumadeiras commented Mar 12, 2026

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

User-visible / Behavior Changes

Security Impact (required)

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Human Verification (required)

Review Conversations

Compatibility / Migration

Failure Recovery (if this breaks)

Risks and Mitigations

Uh oh!

aisle-research-bot bot commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔒 Aisle Security Analysis

1. 🟡 Blocking synchronous filesystem discovery on gateway request path enables DoS via resource exhaustion

Description

Recommendation

2. 🟡 TOCTOU + hardlink bypass in session-store discovery validation (possible arbitrary file read)

Description

Recommendation

Uh oh!

rnrnstar2 commented Mar 12, 2026

初歩確認

Uh oh!

gumadeiras commented Mar 12, 2026

初歩確認

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

greptile-apps bot commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Comments Outside Diff (1)

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

gumadeiras commented Mar 12, 2026

Uh oh!

Uh oh!

gumadeiras commented Mar 12, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

aisle-research-bot bot commented Mar 12, 2026 •

edited

Loading

greptile-apps bot commented Mar 12, 2026 •

edited

Loading