feat: Improve API key scanner, add more providers #9

Merged
Pringled merged 17 commits into main from improve-api-keys on Mar 7, 2026
Pringled (Owner) commented Mar 7, 2026

No description provided.

Pringled added 15 commits March 7, 2026 10:12
- Add nameRegexPatterns (35 patterns: 28 provider keywords + 7 generic credential terms)
- Add valuePatterns (10 entries covering OpenAI, HuggingFace, GitHub token formats)
- Implement scanNameRegex(): flags env vars with provider/credential names not in HighRiskEnvKeys
- Implement scanValuePatterns(): flags env vars whose values match known provider key formats
- Wire both methods into Scan() after scanEnvKeys()
- Values are read only for emptiness check (scanNameRegex) or prefix+length match (scanValuePatterns); never stored in findings
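The prefix+length check described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the struct fields, the `finding` type, and the `matchValue` and `fakeToken` helpers are hypothetical, and "prefix + N chars" is assumed to mean N characters after the prefix. The finding type has no field for the value, so the secret cannot leak into a report.

```go
package main

import (
	"fmt"
	"strings"
)

// valuePattern is a hypothetical stand-in for the PR's valuePattern struct.
type valuePattern struct {
	provider string
	prefix   string
	bodyLen  int // characters expected after the prefix (assumed reading of "prefix + N chars")
}

// finding deliberately has no field that could hold the secret value.
type finding struct {
	envName  string
	provider string
}

var valuePatterns = []valuePattern{
	{provider: "OpenAI", prefix: "sk-proj-", bodyLen: 48},
	{provider: "HuggingFace", prefix: "hf_", bodyLen: 34},
	{provider: "GitHub", prefix: "ghp_", bodyLen: 36},
}

// fakeToken builds an obviously fake value of the right shape for demos.
func fakeToken(prefix string, bodyLen int) string {
	return prefix + strings.Repeat("x", bodyLen)
}

// matchValue reports whether value has a known prefix and the exact expected
// length. The value is inspected but never copied into the result.
func matchValue(envName, value string) (finding, bool) {
	for _, p := range valuePatterns {
		if strings.HasPrefix(value, p.prefix) && len(value) == len(p.prefix)+p.bodyLen {
			return finding{envName: envName, provider: p.provider}, true
		}
	}
	return finding{}, false
}

func main() {
	f, ok := matchValue("MY_TOKEN", fakeToken("hf_", 34))
	fmt.Println(f.envName, f.provider, ok)

	// Correct prefix but wrong length: not flagged.
	_, ok = matchValue("MY_TOKEN", fakeToken("hf_", 10))
	fmt.Println(ok)
}
```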
Name-regex tests (TestAPIKeyScanner_NameRegex_*):
- ProviderKeyword: MY_OPENAI_KEY flagged by provider keyword match
- GenericTerm: INTERNAL_API_KEY flagged by generic credential term
- NoDuplicateWithBuiltin: OPENAI_API_KEY produces exactly 1 finding
- EmptyValueNotFlagged: empty-value vars not reported
- ValueNotInFindings: secret value never appears in any finding field

Value-pattern tests (TestAPIKeyScanner_ValuePattern_*):
- OpenAIProject: sk-proj- prefix + 48 chars detected with provider tag
- HuggingFace: hf_ prefix + 34 chars detected with HuggingFace tag
- GitHub_ClassicPAT: ghp_ prefix + 36 chars detected with GitHub tag
- NoMatchWrongLength: correct prefix but wrong length not flagged
- BuiltinSkipped: HighRiskEnvKeys key produces exactly 1 finding
…itive

- Add 9 new value patterns: Anthropic (sk-ant-), Stripe live/test secrets
  (sk_live_/sk_test_), Stripe restricted (rk_live_/rk_test_), GitLab PAT
  (glpat-), npm granular token (npm_), Groq (gsk_), SendGrid (SG.)
- Demote generic sk- (51 chars) to SeverityUncertain — shared by many tools
- Add per-pattern severity field to valuePattern struct; scanValuePatterns
  uses it instead of always emitting SeverityHigh
- Reorder valuePatterns so sk-proj- and sk-admin- match before sk-
- Expand nameRegexPatterns with 17 new entries, including GEMINI, VERTEX,
  PALM, BEDROCK, AZURE_OPENAI, AZURE_COGNITIVE, RESEND, POSTMARK, SPARKPOST,
  LINEAR, NOTION, AIRTABLE, SUPABASE, NEON, PLANETSCALE
- Fix overbroad FLY pattern → \bFLY_ (word boundary) to avoid false
  positives on BUTTERFLY_KEY and similar variable names
- Fix misleading comment on valuePatterns var (was 'compiled at package
  init'; it is a plain struct slice, nothing is compiled)
- Add 10 new tests covering all new patterns and the FLY boundary fix
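To illustrate why the ordering and the per-pattern severity matter, here is a small sketch under simplifying assumptions: matching is by prefix only (the length check is omitted), the first matching pattern wins, and the provider labels and the "HIGH"/"UNCERTAIN" string values are illustrative rather than taken from the code. Putting the generic sk- entry first would swallow the more specific prefixes and report them at the lower severity.

```go
package main

import (
	"fmt"
	"strings"
)

// Severity values are assumptions; the PR only names the constants.
type Severity string

const (
	SeverityHigh      Severity = "HIGH"
	SeverityUncertain Severity = "UNCERTAIN"
)

// valuePattern is a hypothetical, simplified shape (no length field).
type valuePattern struct {
	provider string
	prefix   string
	severity Severity
}

// ordered lists specific prefixes before the generic sk- prefix.
var ordered = []valuePattern{
	{provider: "OpenAI project key", prefix: "sk-proj-", severity: SeverityHigh},
	{provider: "OpenAI admin key", prefix: "sk-admin-", severity: SeverityHigh},
	{provider: "generic sk- key", prefix: "sk-", severity: SeverityUncertain},
}

// misordered puts the generic prefix first to show the failure mode.
var misordered = []valuePattern{ordered[2], ordered[0], ordered[1]}

// classify returns the first pattern whose prefix matches value.
func classify(patterns []valuePattern, value string) (valuePattern, bool) {
	for _, p := range patterns {
		if strings.HasPrefix(value, p.prefix) {
			return p, true
		}
	}
	return valuePattern{}, false
}

func main() {
	good, _ := classify(ordered, "sk-proj-example")
	bad, _ := classify(misordered, "sk-proj-example")
	fmt.Println("specific first:", good.provider, good.severity)
	fmt.Println("generic first: ", bad.provider, bad.severity)
}
```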
…w providers

Cross-pass duplicate findings (critical):
- Introduce shared seenEnvNames map passed into scanNameRegex and
  scanValuePatterns; a variable matching both passes now produces exactly
  one finding (name-regex wins, value-pattern is skipped)
- Add regression test: CUSTOM_STRIPE_KEY=sk_live_... yields 1 finding
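A rough sketch of the shared-map dedup described above, assuming a `map[string]bool` keyed by variable name and a simplified prefix-only value check. The function names follow the commit message, but the signatures, the `finding` type, and the `scan` wrapper are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// finding records which pass flagged a variable; it never holds the value.
type finding struct {
	envName string
	source  string
}

// Illustrative pattern sets, far smaller than the real ones.
var nameRegexPatterns = []*regexp.Regexp{regexp.MustCompile(`STRIPE`)}
var valuePrefixes = []string{"sk_live_"}

// sortedNames gives a deterministic iteration order over env.
func sortedNames(env map[string]string) []string {
	names := make([]string, 0, len(env))
	for n := range env {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// scanNameRegex flags non-empty variables whose name matches a pattern and
// records them in seen so later passes skip them.
func scanNameRegex(env map[string]string, seen map[string]bool) []finding {
	var out []finding
	for _, name := range sortedNames(env) {
		if seen[name] || env[name] == "" {
			continue
		}
		for _, re := range nameRegexPatterns {
			if re.MatchString(name) {
				seen[name] = true
				out = append(out, finding{envName: name, source: "name-regex"})
				break
			}
		}
	}
	return out
}

// scanValuePatterns flags variables by value prefix, skipping names already seen.
func scanValuePatterns(env map[string]string, seen map[string]bool) []finding {
	var out []finding
	for _, name := range sortedNames(env) {
		if seen[name] {
			continue
		}
		for _, prefix := range valuePrefixes {
			if strings.HasPrefix(env[name], prefix) {
				seen[name] = true
				out = append(out, finding{envName: name, source: "value-pattern"})
				break
			}
		}
	}
	return out
}

// scan runs both passes over one shared seen map; the earlier pass wins.
func scan(env map[string]string) []finding {
	seen := make(map[string]bool)
	findings := scanNameRegex(env, seen)
	return append(findings, scanValuePatterns(env, seen)...)
}

func main() {
	env := map[string]string{"CUSTOM_STRIPE_KEY": "sk_live_example"}
	fmt.Println(len(scan(env)), "finding(s)")
}
```

Because the map is created once per scan and passed to every pass, pass order alone decides which finding survives.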

Tighten overbroad name regexes (false-positive fix):
- NEON  → \bNEON_  (avoids ANEMONE_CONFIG, NEONLIGHTS_COLOR)
- LINEAR → \bLINEAR_ (avoids BILINEAR_FILTER)
- PALM  → \bPALM_  (avoids NAPALM_MODE, PALM_BEACH_PROPERTY)
- Add false-positive regression tests for all three

Add missing AI provider name patterns:
- OPENROUTER, FIREWORKS, DEEPSEEK, PERPLEXITY, CEREBRAS, DOPPLER
- Covered by TestAPIKeyScanner_NameRegex_NewAIProviders

Add Twilio SK value pattern:
- SK + 32 hex chars = 34 total → SeverityHigh
- Covered by TestAPIKeyScanner_ValuePattern_TwilioSID

Fix top-level APIKeyScanner struct comment:
- Clarifies that values are read transiently in scanValuePatterns for
  pattern matching only, never emitted or stored
…EAR_ regex

- Restore 14 value-pattern and name-regex tests accidentally deleted in 124de66
  (recovered from db20f03 and merged with tests added in HEAD)
- Downgrade Twilio SK prefix from SeverityHigh to SeverityUncertain: the bare
  'SK' prefix is too broad (no hex charset validation), so false positives are
  likely; test updated to assert UNCERTAIN
- Fix LINEAR_ name-regex: replace \bLINEAR_ with (^|_)LINEAR_ so that
  MY_LINEAR_TOKEN matches (underscore is a word char in RE2, so \b fails there)
  while BILINEAR_FILTER still does not match
…/AI21/NVIDIA_NIM

- Fix cross-pass dedup gap: scanEnvKeys now accepts and populates the shared
  seenEnvNames map, so extra_env_keys entries that also match a nameRegex pattern
  produce exactly one finding (scanEnvKeys wins as highest-priority pass).
  Regression test: TestAPIKeyScanner_ExtraEnvKeys_NoDuplicateWithNameRegex.

- Fix FLY_, NEON_, PALM_ regexes: replace \bFLY_ / \bNEON_ / \bPALM_ with
  (^|_)FLY_ etc. In RE2, _ is a word character so \b does not fire between _
  and a letter, meaning MY_FLY_TOKEN, MY_NEON_KEY, MY_PALM_KEY were silently
  missed. Tests updated to assert both the positive and negative cases.

- Add name-regex patterns for XAI, ASSEMBLYAI, AI21, NVIDIA_NIM (reviewer
  suggestion). Tests added in TestAPIKeyScanner_NameRegex_NewAIProviders.
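The \b versus (^|_) behaviour described above can be checked directly with Go's regexp package, which uses RE2 syntax. The sketch below uses the FLY_ pattern as the example; the variable names are illustrative and the real patterns may carry extra flags.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// \b is an ASCII word boundary, and _ counts as a word character, so no
	// boundary exists between "_" and "F" in MY_FLY_TOKEN.
	wordBoundaryFly = regexp.MustCompile(`\bFLY_`)

	// (^|_) requires FLY_ to start the name or follow an underscore.
	segmentFly = regexp.MustCompile(`(^|_)FLY_`)
)

func main() {
	for _, name := range []string{"FLY_API_TOKEN", "MY_FLY_TOKEN", "BUTTERFLY_KEY"} {
		fmt.Println(name,
			"| \\b:", wordBoundaryFly.MatchString(name),
			"| (^|_):", segmentFly.MatchString(name))
	}
}
```

Both patterns reject BUTTERFLY_KEY and accept FLY_API_TOKEN; only the (^|_) form also accepts MY_FLY_TOKEN.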
Pringled changed the title from "feat: Improve API key scanner" to "feat: Improve API key scanner, add more providers" Mar 7, 2026
Pringled added 2 commits March 7, 2026 11:32
These are public identifiers, not secrets. AZURE_CLIENT_SECRET remains.
Reporting non-secret IDs at HIGH severity produces false positives for
any user running az login interactively.

CIRCLE_TOKEN is the only real CircleCI credential var and is already
covered by the exact-match in HighRiskEnvKeys. The CIRCLECI regex
pattern added no meaningful coverage.
Pringled merged commit 15d18dd into main Mar 7, 2026
2 checks passed
Pringled deleted the improve-api-keys branch March 7, 2026 10:34