feat: Improve API key scanner, add more providers by Pringled · Pull Request #9 · Pringled/agentcheck

Pringled · 2026-03-07T09:47:28Z

No description provided.

- Add nameRegexPatterns (35 patterns: 28 provider keywords + 7 generic credential terms)
- Add valuePatterns (10 entries covering OpenAI, HuggingFace, and GitHub token formats)
- Implement scanNameRegex(): flags env vars whose names match a provider or credential keyword and are not already in HighRiskEnvKeys
- Implement scanValuePatterns(): flags env vars whose values match known provider key formats
- Wire both methods into Scan() after scanEnvKeys()
- Values are read only for an emptiness check (scanNameRegex) or a prefix+length match (scanValuePatterns); they are never stored in findings

Name-regex tests (TestAPIKeyScanner_NameRegex_*):
- ProviderKeyword: MY_OPENAI_KEY is flagged by a provider keyword match
- GenericTerm: INTERNAL_API_KEY is flagged by a generic credential term
- NoDuplicateWithBuiltin: OPENAI_API_KEY produces exactly 1 finding
- EmptyValueNotFlagged: vars with empty values are not reported
- ValueNotInFindings: the secret value never appears in any finding field

Value-pattern tests (TestAPIKeyScanner_ValuePattern_*):
- OpenAIProject: sk-proj- prefix + 48 chars is detected with the provider tag
- HuggingFace: hf_ prefix + 34 chars is detected with the HuggingFace tag
- GitHub_ClassicPAT: ghp_ prefix + 36 chars is detected with the GitHub tag
- NoMatchWrongLength: a correct prefix with the wrong length is not flagged
- BuiltinSkipped: a HighRiskEnvKeys key produces exactly 1 finding

…itive
- Add 9 new value patterns: Anthropic (sk-ant-), Stripe live/test secrets (sk_live_/sk_test_), Stripe restricted (rk_live_/rk_test_), GitLab PAT (glpat-), npm granular token (npm_), Groq (gsk_), SendGrid (SG.)
- Demote generic sk- (51 chars) to SeverityUncertain, since that prefix is shared by many tools
- Add a per-pattern severity field to the valuePattern struct; scanValuePatterns uses it instead of always emitting SeverityHigh
- Reorder valuePatterns so sk-proj- and sk-admin- match before sk-
- Expand nameRegexPatterns with 17 new entries: GEMINI, VERTEX, PALM, BEDROCK, AZURE_OPENAI, AZURE_COGNITIVE, RESEND, POSTMARK, SPARKPOST, LINEAR, NOTION, AIRTABLE, SUPABASE, NEON, PLANETSCALE
- Fix overbroad FLY pattern → \bFLY_ (word boundary) to avoid false positives on BUTTERFLY_KEY and similar variable names
- Fix misleading comment on the valuePatterns var: it said 'compiled at package init', but it is a plain struct slice and nothing is compiled
- Add 10 new tests covering all new patterns and the FLY boundary fix

…w providers

Cross-pass duplicate findings (critical):
- Introduce a shared seenEnvNames map passed into scanNameRegex and scanValuePatterns; a variable matching both passes now produces exactly one finding (name-regex wins, value-pattern is skipped)
- Add regression test: CUSTOM_STRIPE_KEY=sk_live_... yields 1 finding

Tighten overbroad name regexes (false-positive fix):
- NEON → \bNEON_ (avoids ANEMONE_CONFIG, NEONLIGHTS_COLOR)
- LINEAR → \bLINEAR_ (avoids BILINEAR_FILTER)
- PALM → \bPALM_ (avoids NAPALM_MODE, PALM_BEACH_PROPERTY)
- Add false-positive regression tests for all three

Add missing AI provider name patterns:
- OPENROUTER, FIREWORKS, DEEPSEEK, PERPLEXITY, CEREBRAS, DOPPLER
- Covered by TestAPIKeyScanner_NameRegex_NewAIProviders

Add Twilio SK value pattern:
- SK + 32 hex chars = 34 total → SeverityHigh
- Covered by TestAPIKeyScanner_ValuePattern_TwilioSID

Fix top-level APIKeyScanner struct comment:
- Clarifies that values are read transiently in scanValuePatterns for pattern matching only, never emitted or stored

…EAR_ regex
- Restore 14 value-pattern and name-regex tests accidentally deleted in 124de66 (recovered from db20f03 and merged with tests added in HEAD)
- Downgrade Twilio SK prefix from SeverityHigh to SeverityUncertain: the bare 'SK' prefix is too broad (there is no hex charset validation), so false positives are likely; the test is updated to assert UNCERTAIN
- Fix LINEAR_ name-regex: replace \bLINEAR_ with (^|_)LINEAR_ so that MY_LINEAR_TOKEN matches (underscore is a word character in RE2, so \b does not fire between _ and L) while BILINEAR_FILTER still does not match
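The RE2 word-boundary behaviour can be checked with Go's regexp package, which uses RE2 syntax: \b needs a transition between a word character and a non-word character (or the start/end of the text), and _ counts as a word character. The variable names below are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// \b version: there is no boundary between "_" and "L", because both
	// are word characters.
	boundaryLinear = regexp.MustCompile(`\bLINEAR_`)
	// Anchored version: LINEAR_ must start the name or follow an underscore.
	anchoredLinear = regexp.MustCompile(`(^|_)LINEAR_`)
)

func main() {
	for _, name := range []string{"LINEAR_API_KEY", "MY_LINEAR_TOKEN", "BILINEAR_FILTER"} {
		fmt.Printf("%s: \\b=%v anchored=%v\n", name,
			boundaryLinear.MatchString(name), anchoredLinear.MatchString(name))
	}
}
```

LINEAR_API_KEY matches both forms, MY_LINEAR_TOKEN matches only the anchored form, and BILINEAR_FILTER matches neither.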

…/AI21/NVIDIA_NIM
- Fix cross-pass dedup gap: scanEnvKeys now accepts and populates the shared seenEnvNames map, so extra_env_keys entries that also match a nameRegex pattern produce exactly one finding (scanEnvKeys wins as the highest-priority pass). Regression test: TestAPIKeyScanner_ExtraEnvKeys_NoDuplicateWithNameRegex.
- Fix FLY_, NEON_, PALM_ regexes: replace \bFLY_ / \bNEON_ / \bPALM_ with (^|_)FLY_ etc. In RE2, _ is a word character, so \b does not fire between _ and a letter, meaning MY_FLY_TOKEN, MY_NEON_KEY, and MY_PALM_KEY were silently missed. Tests updated to assert both the positive and negative cases.
- Add name-regex patterns for XAI, ASSEMBLYAI, AI21, NVIDIA_NIM (reviewer suggestion). Tests added in TestAPIKeyScanner_NameRegex_NewAIProviders.


Pringled added 15 commits March 7, 2026 10:12

Update

2d7f91d

Simplify

f5f446d

Fix tests

602fbdd

Fixes

c7b9c00

fix: narrow XAI pattern to (^|_)XAI_ to avoid false positives on PROXAI_, RELAXAI_ etc

095e3b9

refactor(tests): simplify apikeys_test - remove duplicates, collapse anchored-pattern tests

14a9589

refactor: simplify apikeys - inline helpers, deduplicate env iteration, trim comments

c7d8160

feat: expand provider coverage - 40+ new providers across AI, payments, comms, auth, observability, cloud, and DB

4f2c483

refactor: move HighRiskEnvKeys to apikeys.go, K8SProdPatterns to local.go

3376409

Pringled changed the title ~~feat: Improve API key scanner~~ feat: Improve API key scanner, add more providers Mar 7, 2026

Pringled added 2 commits March 7, 2026 11:32

fix: remove AZURE_CLIENT_ID and AZURE_TENANT_ID from HighRiskEnvKeys

1e5ea56

These are public identifiers, not secrets. AZURE_CLIENT_SECRET remains. Reporting non-secret IDs at HIGH severity produces false positives for any user running az login interactively.

fix: drop CIRCLECI regex from providerNamePatterns

bcb29e6

CIRCLE_TOKEN is the only real CircleCI credential var and is already covered by the exact-match in HighRiskEnvKeys. The CIRCLECI regex pattern added no meaningful coverage.

Pringled merged commit 15d18dd into main Mar 7, 2026
2 checks passed

Pringled deleted the improve-api-keys branch March 7, 2026 10:34

Pringled merged 17 commits into main from improve-api-keys