fix: AST fallback escapes, diacritical mark bypass, sh v3.13.0 upgrade #80

Merged
cyyever merged 9 commits into main from more_fix4
Mar 12, 2026

cyyever commented Mar 12, 2026

Summary

  • Fix backslash escape handling in the AST fallback path (used when nodeHasUnsafe triggers)
  • Strip diacritical marks (combining marks) to prevent regex character class bypass
  • Upgrade mvdan.cc/sh/v3 v3.12.0 → v3.13.0 to fix printf goroutine panic at root cause
  • Add comprehensive SSE buffer and MCP HTTP handler test coverage
  • Fix CI fuzz corpus upload for pre-compiled test binaries

Changes

Dependency upgrade

  • mvdan.cc/sh/v3 v3.12.0 → v3.13.0: fixes an expand.formatInto index-out-of-range panic on malformed printf format strings (e.g. printf '%\'). The panic occurs in pipe goroutines, where our defer/recover cannot catch it
  • Apply API rename syntax.ClbOut → syntax.RdrClob and add new redirect operators (AppClob, RdrAllClob, AppAllClob) to exhaustive switches

Unicode normalization fix

  • Add stripDiacritics() to NormalizeUnicode pipeline: NFD decompose → strip combining marks (category Mn) → NFC recompose
  • Prevents E + U+0301 (combining acute) from composing into É which bypasses ASCII regex classes like [er]
  • Root cause fix for FuzzCommandRegexBypass failure: CRONTAB -É was not blocked
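
The middle step of that pipeline can be sketched with the standard library alone. This is a minimal illustration, not the PR's stripDiacritics(): the function name is hypothetical, and it assumes the input has already been NFD-decomposed (the decompose and recompose steps need a normalization package, which is omitted here).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripCombiningMarks drops every rune in Unicode category Mn
// (nonspacing marks). Assumption: s is already in NFD form, so an
// accented letter appears as base letter + combining mark.
func stripCombiningMarks(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	for _, r := range s {
		if unicode.Is(unicode.Mn, r) {
			continue // skip the combining mark, keep the base letter
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	// "E" followed by U+0301 (combining acute accent).
	in := "CRONTAB -E\u0301"
	fmt.Printf("%q\n", stripCombiningMarks(in))
}
```

With the mark removed before recomposition, the flag stays a plain ASCII E and remains visible to ASCII character classes.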

Extractor fixes

  • wordToLiteral() now dispatches to correct unescape function per quote context
  • Add unescapeDblQuotedLit() for correct bash double-quote escape rules
  • Add unescapeDollarSglQuoted() for C-style $'...' escape sequences
  • Replace differential fuzzer with production-path FuzzASTFallbackExtraction
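
As a rough illustration of the double-quote rules, here is a sketch of what such an unescape function might look like. It is not the PR's unescapeDblQuotedLit(); the name unescapeDblQuotedSketch is hypothetical, and it encodes only the bash rule that inside double quotes a backslash is special solely before $, `, ", \ and newline.

```go
package main

import (
	"fmt"
	"strings"
)

// unescapeDblQuotedSketch applies bash double-quote escape rules to the
// raw text between the quotes. Hypothetical helper for illustration.
func unescapeDblQuotedSketch(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c != '\\' || i+1 >= len(s) {
			b.WriteByte(c) // ordinary byte, or a trailing backslash
			continue
		}
		switch next := s[i+1]; next {
		case '$', '`', '"', '\\':
			b.WriteByte(next) // backslash is consumed, next byte kept
			i++
		case '\n':
			i++ // backslash-newline is a line continuation: drop both
		default:
			b.WriteByte(c) // backslash is literal; next byte is handled on the next pass
		}
	}
	return b.String()
}

func main() {
	for _, in := range []string{`\0`, `\$HOME`, `a\\b`} {
		fmt.Printf("%q -> %q\n", in, unescapeDblQuotedSketch(in))
	}
}
```

Scanning bytes rather than runes is safe here because every character the function inspects is ASCII and cannot appear inside a multi-byte UTF-8 sequence.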

Test coverage

  • httpproxy: 61.1% → 81.3% (+20pp)
    • FlushModified remove/replace mode for Anthropic, OpenAI, OpenAI Responses APIs
    • Warning injection, findShellTool, marshalJSON, SSE framing helpers
  • mcpgateway: 66.7% → 83.9% (+17pp)
    • handleGet (SSE notification stream with DLP blocking)
    • handleDelete (session removal)
    • copyMCPHeaders, proxyJSONResponse, proxySSEResponse
  • rules: New FuzzASTFallbackExtraction (17 seeds, production path)
    • TestASTFallbackEscapes (12 e2e escape cases)
    • TestASTFallbackUnsafeTriggers (7 trigger type cases)
    • Fuzz corpus entries for combining mark bypass and printf panic

CI fix

  • Add testdata/fuzz/ to artifact upload paths (pre-compiled binaries write corpus to repo root)

Test plan

  • All 30 packages pass (go test -short ./...)
  • FuzzCommandRegexBypass combining mark corpus entry passes
  • FuzzPipeBypass printf panic corpus entry passes
  • All 3 fuzz corpus failures (regex bypass, pipe bypass, fork bomb) fixed
  • All 13 pre-commit hooks pass

cyyever added 4 commits March 12, 2026 11:03
wordToLiteral() read Lit.Value verbatim from the shell AST, which
includes raw backslash escapes (e.g. \\ for literal \). The interpreter
path processes these escapes, but the AST fallback path (used when
nodeHasUnsafe triggers, e.g. for U+FFFD literals) did not.

Add unescapeShellLit() to process \X → X in Lit values, matching
interpreter behavior. Fixes FuzzPipelineExtraction bypass where
"cat /etc/�\\" extracted path "/etc/�\" instead of "/etc/�".

wordToLiteral incorrectly applied unescapeShellLit to double-quoted
literals, stripping backslashes that bash preserves (e.g. "\0" → "0"
instead of "\0"). Add unescapeDblQuotedLit with correct bash double-
quote escape rules: only \$ \` \" \\ \newline are escapes.

Also add unescapeDollarSglQuoted for $'...' strings which support
C-style escapes (\n \t \\ etc.) that wordToLiteral previously ignored.
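
A sketch of the $'...' case, covering only single-character escapes. This is not the PR's unescapeDollarSglQuoted(); the name and the escape table are assumptions for illustration, and numeric forms such as octal, \x and \u escapes are deliberately left untouched.

```go
package main

import (
	"fmt"
	"strings"
)

// simpleEscapes maps the byte after a backslash to its replacement.
// Only single-character escapes are listed (an intentional subset).
var simpleEscapes = map[byte]byte{
	'n': '\n', 't': '\t', 'r': '\r', 'a': '\a', 'b': '\b',
	'f': '\f', 'v': '\v', '\\': '\\', '\'': '\'', '"': '"',
}

// unescapeDollarSingleSketch decodes the subset above inside the body
// of a $'...' string and leaves unknown escapes verbatim.
func unescapeDollarSingleSketch(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '\\' && i+1 < len(s) {
			if r, ok := simpleEscapes[s[i+1]]; ok {
				b.WriteByte(r)
				i++ // skip the escape character we just decoded
				continue
			}
		}
		b.WriteByte(s[i])
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", unescapeDollarSingleSketch(`/tmp/a\tb`))
}
```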

Add e2e tests for escape handling and unsafe trigger paths, plus
FuzzASTFallbackExtraction targeting the production AST fallback
code path.

Pre-compiled fuzz binaries write failing corpus to testdata/fuzz/
relative to the working directory (repo root), not to
internal/*/testdata/fuzz/. Add the root-level path so the artifact
upload captures these entries on failure.

httpproxy (61.1% → 81.3%):
- FlushModified remove/replace mode for Anthropic, OpenAI, Responses APIs
- findShellTool priority order and fallback
- marshalJSON HTML escape preservation
- writeSSEEvent/writeSSEData/writeRaw framing
- No-tool-use and allowed-tool-call passthrough

mcpgateway (66.7% → 83.9%):
- handleGet SSE stream with DLP blocking
- handleDelete session removal and upstream error
- copyMCPHeaders propagation
- proxyJSONResponse DLP block and invalid JSON passthrough
- proxySSEResponse DLP block on streaming events

cyyever changed the title from "fix(extractor): unescape backslash literals in AST fallback path" to "fix(extractor): AST fallback escape handling + proxy test coverage" on Mar 12, 2026

NFD decompose → strip combining marks (Mn) → NFC recompose before
confusable mapping. Prevents E+U+0301 (combining acute) from composing
into É which bypasses ASCII regex character classes like [er].

cyyever changed the title from "fix(extractor): AST fallback escape handling + proxy test coverage" to "fix: AST fallback escapes, diacritical mark bypass, proxy test coverage" on Mar 12, 2026

cyyever added 2 commits March 12, 2026 16:06
mvdan.cc/sh/v3 expand.formatInto panics on malformed printf format
strings (e.g. printf '%\') inside pipe goroutines where our
defer/recover cannot catch it. Replace printf with "true" in
CallHandler — its output is not useful for path extraction.

Also add fuzz corpus entries for FuzzPipeBypass and
FuzzCommandRegexBypass to prevent regression.

v3.13.0 adds bounds checking in expand.formatInto for trailing
backslash in format strings (e.g. printf '%\'). This was the root
cause of the FuzzPipeBypass goroutine panic — the fix in the upstream
library makes our printf→true workaround unnecessary.

Also applies API renames (ClbOut → RdrClob) and adds new redirect
operators (AppClob, RdrAllClob, AppAllClob) to exhaustive switches.

cyyever changed the title from "fix: AST fallback escapes, diacritical mark bypass, proxy test coverage" to "fix: AST fallback escapes, diacritical mark bypass, sh v3.13.0 upgrade" on Mar 12, 2026

cyyever added 2 commits March 12, 2026 17:00
The shell parser can produce FuncDecl with nil Name for inputs like
"()00". Both astForkBomb and its fuzz oracle accessed fd.Name.Value
without a nil guard, causing a panic.

mvdan.cc/sh/v3 parser panics on edge-case inputs like "export A0=$0("
(slice bounds out of range in declClause). Wrap all parser.Parse calls
with safeShellParse which uses defer/recover to convert panics into
errors. This hardens against any future upstream parser bugs.
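
The defer/recover wrapping described above can be sketched generically. This is not the PR's safeShellParse: the parseFunc type and the deliberately buggy parser are stand-ins so the example runs without the shell library.

```go
package main

import "fmt"

// parseFunc stands in for a parser entry point (hypothetical type).
type parseFunc func(src string) (string, error)

// safeParse calls parse and converts any panic raised on the calling
// goroutine into an ordinary error via a deferred recover.
func safeParse(parse parseFunc, src string) (out string, err error) {
	defer func() {
		if r := recover(); r != nil {
			out = ""
			err = fmt.Errorf("parser panic: %v", r)
		}
	}()
	return parse(src)
}

func main() {
	// A stand-in parser with an out-of-range bug.
	buggy := func(src string) (string, error) {
		var parts []string
		return parts[0], nil // index out of range: panics at run time
	}
	_, err := safeParse(buggy, "export A0=$0(")
	fmt.Println(err != nil) // prints true
}
```

Note that recover only catches panics on the goroutine where the deferred function runs, which is why this approach covers the synchronous parse call but not panics raised inside separately spawned goroutines.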

cyyever merged commit 5f4835e into main on Mar 12, 2026
13 of 14 checks passed
cyyever deleted the more_fix4 branch on March 12, 2026 09:48