Skip to content

fix: handle Discord gateway metadata fetch failures#44397

Merged
jalehman merged 3 commits intoopenclaw:mainfrom
jalehman:fix/discord-gateway-metadata-failures
Mar 13, 2026
Merged

fix: handle Discord gateway metadata fetch failures#44397
jalehman merged 3 commits intoopenclaw:mainfrom
jalehman:fix/discord-gateway-metadata-failures

Conversation

@jalehman
Copy link
Contributor

What

This PR hardens Discord gateway startup so /gateway/bot failures no longer crash the entire launchd-managed gateway when Discord returns a plain-text upstream 5xx response instead of JSON.

Why

A real production restart loop was traced to GatewayPlugin.registerClient trying to parse a plain-text Discord upstream failure body as JSON, which surfaced as an unhandled rejection and exited the whole process. Discord should retry its own startup path without taking down the full gateway.

Fixes #37375.

Changes

  • Safely parse Discord gateway metadata
  • Treat upstream 5xx bodies as transient
  • Cover proxied and direct startup paths
  • Add regression tests for plain-text 503

Testing

  • pnpm test -- src/discord/monitor/provider.proxy.test.ts src/infra/unhandled-rejections.test.ts src/infra/unhandled-rejections.fatal-detection.test.ts
  • Restarted local launchd-managed gateway built from this repo
  • Verified no post-restart fatal Discord metadata crash in logs

@openclaw-barnacle openclaw-barnacle bot added channel: discord Channel integration: discord size: M maintainer Maintainer-authored PR labels Mar 12, 2026
@aisle-research-bot
Copy link

aisle-research-bot bot commented Mar 12, 2026

🔒 Aisle Security Analysis

We found 1 potential security issue(s) in this PR:

# Severity Title
1 🔵 Low Terminal/log injection via unsanitized Discord /gateway/bot response body snippets

1. 🔵 Terminal/log injection via unsanitized Discord /gateway/bot response body snippets

Property Value
Severity Low
CWE CWE-117
Location src/discord/monitor/gateway-plugin.ts:109-116

Description

The Discord gateway metadata fetch now includes a trimmed snippet of the upstream HTTP response body in thrown error messages. Because the snippet is derived from untrusted input (especially when a user-configured proxy is enabled) and is only whitespace-normalized, it can still contain ANSI escape sequences / control characters.

This can lead to log/terminal injection when these errors are surfaced:

  • Input: body = await response.text() reads arbitrary text from the network/proxy.
  • Transformation: summarizeGatewayResponseBody() collapses \s+ but does not strip non-whitespace control bytes like ESC (\x1b) or other non-printables.
  • Sink: the snippet is interpolated into Error.message and may be printed via console-based runtime logging and global error handlers (e.g., unhandled rejection / uncaught exception handlers print formatted error text to stderr).

Vulnerable code (untrusted body included in Error.message):

const summary = summarizeGatewayResponseBody(body);

if (!response.ok) {
  throw createGatewayMetadataError({
    detail: `Discord API /gateway/bot failed (${response.status}): ${summary}`,
    transient,
  });
}

Even with replace(/\s+/g, " "), an attacker-controlled upstream/proxy can include sequences like \x1b[2J (clear screen), OSC-8 hyperlinks, etc., which are not whitespace and would survive into logs/terminal output.

Recommendation

Avoid embedding raw upstream bodies in exception messages that may be logged to a terminal.

Options (prefer 1 + 2):

  1. Strip ANSI escapes and control chars from the snippet before including it in any log/error message.
  2. Keep body snippets out of Error.message; instead attach sanitized details as structured metadata only when verbose/debug logging is enabled.

Example sanitization:

function sanitizeForTerminal(text: string): string {// Remove C0/C1 control chars including ESC (0x1b)
  return text.replace(/[\x00-\x1F\x7F-\x9F]/g, "");
}

function summarizeGatewayResponseBody(body: string): string {
  const normalized = sanitizeForTerminal(body).trim().replace(/\s+/g, " ");
  return normalized ? normalized.slice(0, 240) : "<empty>";
}

If you need the raw body for troubleshooting, log it only under explicit debug/verbose mode and ensure it is escaped/sanitized before printing.


Analyzed PR: #44397 at commit edd17c0

Last updated on: 2026-03-13T05:08:07Z

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR hardens Discord gateway startup by replacing the raw response.json() call with a text-first approach that classifies 5xx and known upstream-proxy error bodies as transient before attempting JSON parsing, and refactors the duplicated ProxyGatewayPlugin class into a shared createGatewayPlugin factory.

Key changes:

  • fetchDiscordGatewayInfo reads the response body as text, checks isTransientDiscordGatewayResponse (5xx or known upstream-connect patterns) before attempting JSON.parse, and wraps all failure paths in createGatewayMetadataError with a consistent "fetch failed" message for transient errors
  • createGatewayPlugin replaces the duplicated ProxyGatewayPlugin/GatewayPlugin instantiation across three branches with a single SafeGatewayPlugin subclass
  • "upstream connect error" and "disconnect/reset before headers" added to TRANSIENT_NETWORK_MESSAGE_SNIPPETS in unhandled-rejections.ts as belt-and-suspenders coverage
  • Tests updated for the new ok/status/text() response shape, with two new cases covering the no-proxy path and the plain-text 503 regression

Issues found:

  • response.text() on line 95 of gateway-plugin.ts is not inside a try-catch. A connection drop during body streaming would throw a raw error that bypasses both the outer fetch-call wrapper and the inner JSON.parse wrapper, potentially producing an error that does not match isWrappedFetchFailedMessage and could cause a process exit.
  • The ...params.fetchInit spread on line 85 comes after the headers literal, so any headers key in fetchInit would silently overwrite the Authorization: Bot … entry. Currently safe (only dispatcher is passed), but fragile for future callers.

Confidence Score: 4/5

  • Safe to merge after addressing the unguarded response.text() body-read path; all other logic is correct and well-tested.
  • The fix correctly solves the reported production restart loop and the refactoring is clean with good test coverage. One logic concern — the unguarded response.text() call could allow body-read network errors to escape the wrapping logic — and one fragile header-spread ordering are the only items that lower the score from 5.
  • src/discord/monitor/gateway-plugin.ts — specifically the response.text() call and the fetchInit spread ordering in fetchDiscordGatewayInfo.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/discord/monitor/gateway-plugin.ts
Line: 95

Comment:
**`response.text()` not guarded against body-read failures**

`response.text()` is called outside any try-catch block. If the connection drops while the response body is being streamed, this will throw a raw network error (e.g., `ECONNRESET`) that bypasses both the outer fetch catch (lines 80–93, which only wraps the initial request) and the inner catch (lines 115–121, which only wraps `JSON.parse`). The raw error will propagate up through `registerClient` without being wrapped in a `createGatewayMetadataError`, so it won't carry the `"fetch failed"` message that `isWrappedFetchFailedMessage` relies on. While many such errors would still be caught via their error codes in `TRANSIENT_NETWORK_CODES`, others (e.g., body-length or encoding errors) could slip through and trigger a process exit.

```suggestion
  let body: string;
  try {
    body = await response.text();
  } catch (error) {
    throw createGatewayMetadataError({
      detail: error instanceof Error ? error.message : String(error),
      transient: true,
      cause: error,
    });
  }
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/discord/monitor/gateway-plugin.ts
Line: 81-86

Comment:
**`fetchInit` spread silently overwrites `Authorization` header**

Because `...params.fetchInit` is spread after `headers`, any `headers` key in `fetchInit` would silently replace the `Authorization: Bot …` entry, causing a `401` on the next gateway metadata fetch. The current callers only ever pass `{ dispatcher: fetchAgent }` so there is no immediate breakage, but the ordering is fragile.

Putting the per-call headers last (or deep-merging them) would make this safe regardless of what `fetchInit` contains:

```suggestion
    response = await params.fetchImpl(DISCORD_GATEWAY_BOT_URL, {
      ...params.fetchInit,
      headers: {
        ...(params.fetchInit?.headers as Record<string, string> | undefined),
        Authorization: `Bot ${params.token}`,
      },
    });
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: bcc50f7

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 406d2183f4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +92 to +95
throw createGatewayMetadataError({
detail: error instanceof Error ? error.message : String(error),
transient: true,
cause: error,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve non-network fetch errors from metadata setup

The params.fetchImpl failure path always sets transient: true, which rewrites all startup fetch exceptions to "...: fetch failed"; isTransientNetworkError treats that suffix as non-fatal, so permanent setup errors (for example malformed Discord tokens that make the Authorization header invalid) get suppressed instead of terminating with a configuration/runtime error. In that scenario the gateway can stay up while Discord startup keeps failing and retrying indefinitely.

Useful? React with 👍 / 👎.

@jalehman jalehman force-pushed the fix/discord-gateway-metadata-failures branch from 406d218 to 5e226d4 Compare March 13, 2026 01:46
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e226d4ee5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +92 to +95
throw createGatewayMetadataError({
detail: error instanceof Error ? error.message : String(error),
transient: true,
cause: error,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve non-network fetch exceptions in metadata fetch

fetchDiscordGatewayInfo currently marks every fetchImpl exception as transient, which rewrites deterministic setup failures (for example, an invalid Authorization header value from a malformed token) into "...: fetch failed"; isTransientNetworkError treats that wrapped suffix as non-fatal, so Discord startup can loop indefinitely instead of surfacing a real configuration/runtime error. Fresh evidence: this commit now sends both non-proxy and invalid-proxy startup through this same catch path, so the masking applies across all Discord gateway startup modes.

Useful? React with 👍 / 👎.

Normalize Discord /gateway/bot startup failures so plain-text upstream 5xx responses do not crash the launchd-managed gateway. Use the same safe fetch path for proxied and non-proxied Discord startup, and expand transient unhandled-rejection detection for the exact upstream-connect error message family.

Regeneration-Prompt: |
  Investigate a launchd-managed OpenClaw gateway restart loop tied to Discord startup. Logs show GatewayPlugin.registerClient calling Discord /gateway/bot and crashing when Discord or an upstream proxy returns a plain-text 503 body like "upstream connect error or disconnect/reset before headers", which then bubbles into an unhandled rejection and exits the whole gateway.
  Keep the change additive and tightly scoped to Discord gateway startup and rejection classification. Preserve existing proxy support, but stop assuming the gateway metadata response is always JSON or HTTP 200. Treat transport and upstream 5xx failures as transient so Discord can retry without killing the entire gateway process. Add regression tests covering both normal metadata fetches and the plain-text 503 body that previously triggered the restart loop.
Address PR review feedback by preserving Authorization when fetch init fields are merged into the Discord gateway metadata request and by treating response body read failures as transient wrapped fetch failures. Add a regression test for body stream errors during metadata fetch.

Regeneration-Prompt: |
  Follow up on a Discord gateway startup hardening PR after bot review comments. The code already normalizes plain-text Discord /gateway/bot failures, but two edge cases remained: spreading fetchInit after headers could let a future caller clobber the Bot Authorization header, and response.text() could throw while streaming the body and bypass the wrapper that marks startup failures as transient.
  Keep the fix narrow to the Discord gateway metadata helper and its tests. Preserve existing behavior for successful fetches and existing proxy support, but make sure both request construction and body-read failures stay inside the same wrapped error path so launchd-managed gateways do not exit on these edge cases.
@jalehman jalehman force-pushed the fix/discord-gateway-metadata-failures branch from 65bae3e to edd17c0 Compare March 13, 2026 04:45
@jalehman jalehman merged commit 6d0939d into openclaw:main Mar 13, 2026
28 of 29 checks passed
@jalehman
Copy link
Contributor Author

Merged via squash.

Thanks @jalehman!

frankekn pushed a commit to jmcte/openclaw that referenced this pull request Mar 13, 2026
Merged via squash.

Prepared head SHA: edd17c0
Co-authored-by: jalehman <[email protected]>
Co-authored-by: jalehman <[email protected]>
Reviewed-by: @jalehman
dvrshil pushed a commit to JosepLee/openclaw that referenced this pull request Mar 14, 2026
Merged via squash.

Prepared head SHA: edd17c0
Co-authored-by: jalehman <[email protected]>
Co-authored-by: jalehman <[email protected]>
Reviewed-by: @jalehman
sbezludny pushed a commit to sbezludny/openclaw that referenced this pull request Mar 14, 2026
Merged via squash.

Prepared head SHA: edd17c0
Co-authored-by: jalehman <[email protected]>
Co-authored-by: jalehman <[email protected]>
Reviewed-by: @jalehman
ecochran76 pushed a commit to ecochran76/openclaw that referenced this pull request Mar 14, 2026
Merged via squash.

Prepared head SHA: edd17c0
Co-authored-by: jalehman <[email protected]>
Co-authored-by: jalehman <[email protected]>
Reviewed-by: @jalehman
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: discord Channel integration: discord maintainer Maintainer-authored PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Discord channel fetch failure crashes gateway (unhandled rejection in GatewayPlugin.registerClient)

1 participant