feat: Improve API key scanner, add more providers #9
Merged
- Add nameRegexPatterns (35 patterns: 28 provider keywords + 7 generic credential terms)
- Add valuePatterns (10 entries covering OpenAI, HuggingFace, GitHub token formats)
- Implement scanNameRegex(): flags env vars with provider/credential names not in HighRiskEnvKeys
- Implement scanValuePatterns(): flags env vars whose values match known provider key formats
- Wire both methods into Scan() after scanEnvKeys()
- Values are read only for emptiness check (scanNameRegex) or prefix+length match (scanValuePatterns); never stored in findings
Name-regex tests (TestAPIKeyScanner_NameRegex_*):
- ProviderKeyword: MY_OPENAI_KEY flagged by provider keyword match
- GenericTerm: INTERNAL_API_KEY flagged by generic credential term
- NoDuplicateWithBuiltin: OPENAI_API_KEY produces exactly 1 finding
- EmptyValueNotFlagged: empty-value vars not reported
- ValueNotInFindings: secret value never appears in any finding field

Value-pattern tests (TestAPIKeyScanner_ValuePattern_*):
- OpenAIProject: sk-proj- prefix + 48 chars detected with provider tag
- HuggingFace: hf_ prefix + 34 chars detected with HuggingFace tag
- GitHub_ClassicPAT: ghp_ prefix + 36 chars detected with GitHub tag
- NoMatchWrongLength: correct prefix but wrong length not flagged
- BuiltinSkipped: HighRiskEnvKeys key produces exactly 1 finding
…itive

- Add 9 new value patterns: Anthropic (sk-ant-), Stripe live/test secrets (sk_live_/sk_test_), Stripe restricted (rk_live_/rk_test_), GitLab PAT (glpat-), npm granular token (npm_), Groq (gsk_), SendGrid (SG.)
- Demote generic sk- (51 chars) to SeverityUncertain, since the prefix is shared by many tools
- Add per-pattern severity field to valuePattern struct; scanValuePatterns uses it instead of always emitting SeverityHigh
- Reorder valuePatterns so sk-proj- and sk-admin- match before sk-
- Expand nameRegexPatterns with 17 new entries: GEMINI, VERTEX, PALM, BEDROCK, AZURE_OPENAI, AZURE_COGNITIVE, RESEND, POSTMARK, SPARKPOST, LINEAR, NOTION, AIRTABLE, SUPABASE, NEON, PLANETSCALE
- Fix overbroad FLY pattern → \bFLY_ (word boundary) to avoid false positives on BUTTERFLY_KEY and similar variable names
- Fix misleading comment on valuePatterns var (was 'compiled at package init'; it is a plain struct slice, nothing is compiled)
- Add 10 new tests covering all new patterns and the FLY boundary fix
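The reordering matters because the first matching entry wins: a generic sk- entry placed first would claim every sk-proj- and sk-admin- value. The sketch below illustrates that under simplifying assumptions. The type and field names are hypothetical, the length checks are omitted for brevity, and treating the more specific prefixes as HIGH is an assumption based on the earlier always-HIGH behavior.

```go
package main

import (
	"fmt"
	"strings"
)

type severity string

const (
	severityHigh      severity = "HIGH"
	severityUncertain severity = "UNCERTAIN"
)

// valuePattern carries its own severity so the scanner does not have to
// emit HIGH for every match. Field names here are illustrative only.
type valuePattern struct {
	provider string
	prefix   string
	sev      severity
}

// Order matters: the more specific prefixes come before the generic sk-.
var valuePatterns = []valuePattern{
	{provider: "OpenAI project key", prefix: "sk-proj-", sev: severityHigh},
	{provider: "OpenAI admin key", prefix: "sk-admin-", sev: severityHigh},
	{provider: "generic sk- key", prefix: "sk-", sev: severityUncertain},
}

// classify returns the first pattern whose prefix matches the value.
func classify(value string) (valuePattern, bool) {
	for _, p := range valuePatterns {
		if strings.HasPrefix(value, p.prefix) {
			return p, true
		}
	}
	return valuePattern{}, false
}

func main() {
	for _, v := range []string{"sk-proj-placeholder", "sk-placeholder"} {
		if p, ok := classify(v); ok {
			fmt.Println(p.provider, p.sev)
		}
	}
}
```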
…w providers

Cross-pass duplicate findings (critical):
- Introduce shared seenEnvNames map passed into scanNameRegex and scanValuePatterns; a variable matching both passes now produces exactly one finding (name-regex wins, value-pattern is skipped)
- Add regression test: CUSTOM_STRIPE_KEY=sk_live_... yields 1 finding

Tighten overbroad name regexes (false-positive fix):
- NEON → \bNEON_ (avoids ANEMONE_CONFIG, NEONLIGHTS_COLOR)
- LINEAR → \bLINEAR_ (avoids BILINEAR_FILTER)
- PALM → \bPALM_ (avoids NAPALM_MODE, PALM_BEACH_PROPERTY)
- Add false-positive regression tests for all three

Add missing AI provider name patterns:
- OPENROUTER, FIREWORKS, DEEPSEEK, PERPLEXITY, CEREBRAS, DOPPLER
- Covered by TestAPIKeyScanner_NameRegex_NewAIProviders

Add Twilio SK value pattern:
- SK + 32 hex chars = 34 total → SeverityHigh
- Covered by TestAPIKeyScanner_ValuePattern_TwilioSID

Fix top-level APIKeyScanner struct comment:
- Clarifies that values are read transiently in scanValuePatterns for pattern matching only, never emitted or stored
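A shared seen-map across passes can be sketched as follows. This is an illustration of the dedup idea only: the function bodies, the single STRIPE regex, the sk_live_ prefix check, and the `finding` type are simplified stand-ins rather than the scanner's actual code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// finding records which pass reported the variable; the value is not kept.
type finding struct {
	envName string
	source  string
}

// A single illustrative name pattern; the real list is much longer.
var stripeName = regexp.MustCompile(`STRIPE`)

// scanNameRegex runs first and marks every variable it reports.
func scanNameRegex(env map[string]string, seen map[string]bool) []finding {
	var out []finding
	for name, value := range env {
		if seen[name] || value == "" || !stripeName.MatchString(name) {
			continue
		}
		seen[name] = true
		out = append(out, finding{envName: name, source: "name-regex"})
	}
	return out
}

// scanValuePatterns skips anything an earlier pass already reported.
func scanValuePatterns(env map[string]string, seen map[string]bool) []finding {
	var out []finding
	for name, value := range env {
		if seen[name] || !strings.HasPrefix(value, "sk_live_") {
			continue
		}
		seen[name] = true
		out = append(out, finding{envName: name, source: "value-pattern"})
	}
	return out
}

// scan shares one map between the passes so each variable is reported once.
func scan(env map[string]string) []finding {
	seen := make(map[string]bool)
	out := scanNameRegex(env, seen)
	return append(out, scanValuePatterns(env, seen)...)
}

func main() {
	// Placeholder value; matches both the name pattern and the value prefix.
	env := map[string]string{"CUSTOM_STRIPE_KEY": "sk_live_placeholder"}
	for _, f := range scan(env) {
		fmt.Println(f.envName, f.source)
	}
}
```

Because the map is filled in pass order, priority between passes falls out of the call order in `scan` rather than needing a separate merge step.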
…EAR_ regex

- Restore 14 value-pattern and name-regex tests accidentally deleted in 124de66 (recovered from db20f03 and merged with tests added in HEAD)
- Downgrade Twilio SK prefix from SeverityHigh to SeverityUncertain: the bare 'SK' prefix is too broad (no hex charset validation), so false positives are likely; test updated to assert UNCERTAIN
- Fix LINEAR_ name-regex: replace \bLINEAR_ with (^|_)LINEAR_ so that MY_LINEAR_TOKEN matches (underscore is a word char in RE2, so \b fails there) while BILINEAR_FILTER still does not match
…/AI21/NVIDIA_NIM

- Fix cross-pass dedup gap: scanEnvKeys now accepts and populates the shared seenEnvNames map, so extra_env_keys entries that also match a nameRegex pattern produce exactly one finding (scanEnvKeys wins as highest-priority pass). Regression test: TestAPIKeyScanner_ExtraEnvKeys_NoDuplicateWithNameRegex.
- Fix FLY_, NEON_, PALM_ regexes: replace \bFLY_ / \bNEON_ / \bPALM_ with (^|_)FLY_ etc. In RE2, _ is a word character so \b does not fire between _ and a letter, meaning MY_FLY_TOKEN, MY_NEON_KEY, MY_PALM_KEY were silently missed. Tests updated to assert both the positive and negative cases.
- Add name-regex patterns for XAI, ASSEMBLYAI, AI21, NVIDIA_NIM (reviewer suggestion). Tests added in TestAPIKeyScanner_NameRegex_NewAIProviders.
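The \b behavior is easy to confirm with Go's standard regexp package, which uses RE2 syntax. The snippet below is a small standalone check, separate from the scanner; the variable names `wordBoundary` and `anchored` are chosen here for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// \b needs a word/non-word transition; '_' counts as a word character,
	// so there is no boundary between "MY_" and "FLY".
	wordBoundary = regexp.MustCompile(`\bFLY_`)

	// (^|_) accepts the start of the name or a preceding underscore.
	anchored = regexp.MustCompile(`(^|_)FLY_`)
)

func main() {
	names := []string{
		"FLY_API_TOKEN", // both patterns match
		"MY_FLY_TOKEN",  // only the anchored pattern matches
		"BUTTERFLY_KEY", // neither pattern matches
	}
	for _, n := range names {
		fmt.Printf("%s wordBoundary=%v anchored=%v\n",
			n, wordBoundary.MatchString(n), anchored.MatchString(n))
	}
}
```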
…AI_, RELAXAI_ etc
…anchored-pattern tests
…s, comms, auth, observability, cloud, and DB
These are public identifiers, not secrets. AZURE_CLIENT_SECRET remains. Reporting non-secret IDs at HIGH severity produces false positives for any user running az login interactively.
CIRCLE_TOKEN is the only real CircleCI credential var and is already covered by the exact-match in HighRiskEnvKeys. The CIRCLECI regex pattern added no meaningful coverage.
No description provided.