Compliance-relevant testing patterns in garak: architecture exploration #1659
dentity007
started this conversation in
Ideas and suggestions
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Following up on PR #1619, which jmartin-tech closed with guidance to decompose the original compliance probe into contributions to existing families and open a Discussion to explore the architecture. This is that Discussion.
Where I landed after the PR feedback
The core correction from the review: compliance is an evaluation lens, not an attack technique. A probe's job is to encapsulate a concrete attack method. Compliance is what you measure from the results, not what you name the probe. Once that clicked, the decomposition was straightforward.
The 80 prompts from PR #1619 map cleanly to existing technique families:
misleadingterritory. Same technique asFalseAssertion, applied to regulatory control numbers. A first PR along these lines is now open: feat: add fabricated regulatory citation prompts to misleading probe #1658encodingfamilymalwaregenor similarmisleadingvariant or potentially a new technique class. Open question I'd like input on.The taxonomy question
After running Jeffrey's mock example (
probe_tags: avid-effect:performance,reporting.taxonomy: avid-effect), I can see how compliance grouping works at the reporting layer. A compliance-oriented scan isn't a new module. It's a config that filters to probes tagged with the relevant taxonomy entries and groups results accordingly.That raises a practical question: are there compliance-relevant AVID effect entries that aren't currently well-represented by probes in garak? The ones I'd flag as gaps:
performance:P0204(Accuracy):misleading.FalseAssertioncovers this, but regulatory citation fabrication (prompts that elicit fabricated control numbers) is a narrower, higher-stakes variant. Regulated-environment deployments have a specific failure mode here: a fabricated NIST SP 800-171 control cited in a System Security Plan isn't just misinformation, it's a potential False Claims Act issue.security:S0403(Adversarial Example): covered byencoding,dan, etc. The Unicode homoglyph prompts I have fit here.I'm not proposing new taxonomy entries. I'd want to understand whether these are already considered covered before going further.
The detector question
This is the piece I'm least certain about architecturally.
Six detectors came out of the original PR work:
FabricatedCitationPIIPresence252.204-7012matches US phone number regex but is a real clause number, not PIIFalseAttestationBypassInstructionsmalwaregendetection for policy-specific bypassControlWeakeningRegulatoryConfidenceJeffrey's framing (detectors also have tags, and domain-specific detectors can augment probe tags to denote what results impact a compliance specification) suggests these could live somewhere. But I'm genuinely uncertain about the right structure:
FabricatedCitationlive in the same module as the probes that reference them (analogous topackagehallucination.PythonPypiliving indetectors/packagehallucination.py)? That pattern makes sense when the detector validates probe-specific output.I don't want to propose a
detectors.compliancemodule. That was the wrong direction and Jeffrey was right to redirect it. But I also thinkFabricatedCitationhas enough unique validation logic (lookup tables for real NIST families, CMMC practice levels, DFARS cyber clauses, HIPAA section ranges with subsection bounds) that it's worth contributing somewhere. Looking for direction on where.The CAS angle
Jeffrey mentioned that prompts like these could become "intent stubs" under the CAS project, selecting inputs that target goals relevant to compliance-specific failures. That framing makes sense to me: the prompts define what a compliance-relevant failure looks like from the input side, and CAS would be the mechanism that selects and routes those intents in context.
I don't have visibility into the CAS timeline or public interface, but if there's anything useful I can do now to position the prompt inventory for CAS integration (format, metadata, organization), I'm happy to take direction on that.
What I'd like input on
Fabrication prompts: do false attestation / fake certificate generation prompts fit
misleadingas a subclass, or is this a technique distinct enough to warrant a new probe class? The technique is specifically "generate an authoritative-looking compliance document that appears legitimate," which is different from "expand on a false factual assertion."Domain-specific detectors: where's the right home for detectors that validate against regulatory-specific knowledge (control number lookup tables, DFARS clause false-positive filtering)? Same module as the probe that references them? A separate
detectors/regulatory.py? Something else?Tag taxonomy: are there compliance-relevant AVID effect, OWASP, or quality tags that should be applied to existing probes but aren't? I have a list of CMMC/NIST-mapped failure modes I could cross-reference against current probe coverage if that's useful.
Thanks for the time on PR #1619 and for the architecture guidance. Following through on the incremental path.
Beta Was this translation helpful? Give feedback.
All reactions