Compliance-relevant testing patterns in garak: architecture exploration #1659

dentity007 · 2026-03-29T21:42:46Z

dentity007
Mar 29, 2026

Following up on PR #1619, which jmartin-tech closed with guidance to decompose the original compliance probe into contributions to existing families and open a Discussion to explore the architecture. This is that Discussion.

Where I landed after the PR feedback

The core correction from the review: compliance is an evaluation lens, not an attack technique. A probe's job is to encapsulate a concrete attack method. Compliance is what you measure from the results, not what you name the probe. Once that clicked, the decomposition was straightforward.

The 80 prompts from PR #1619 map cleanly to existing technique families:

25 hallucination prompts (fabricated regulatory citations: fake NIST SP 800-171 controls, CMMC practices, DFARS clauses, HIPAA sections) → misleading territory. Same technique as FalseAssertion, applied to regulatory control numbers. A first PR along these lines is now open: feat: add fabricated regulatory citation prompts to misleading probe #1658
5 Unicode homoglyph prompts (Cyrillic/Latin lookalikes in compliance-framed bypass requests) → encoding family
15 bypass prompts (instructions for circumventing MFA, audit logging, FIPS encryption, policy-as-code) → malwaregen or similar
15 PII prompts (compliance-domain social engineering: assessor impersonation, audit log roleplay, POA&M template extraction) → coordination with ProPILE (feat: add ProPILE probes for PII leakage detection #1504) when it lands
20 fabrication prompts (false attestations, fake ATO letters, certificate generation, leading questions for false compliance confirmation) → misleading variant or potentially a new technique class. Open question I'd like input on.

The taxonomy question

After running Jeffrey's mock example (probe_tags: avid-effect:performance, reporting.taxonomy: avid-effect), I can see how compliance grouping works at the reporting layer. A compliance-oriented scan isn't a new module. It's a config that filters to probes tagged with the relevant taxonomy entries and groups results accordingly.

That raises a practical question: are there compliance-relevant AVID effect entries that aren't currently well-represented by probes in garak? The ones I'd flag as gaps:

performance:P0204 (Accuracy): misleading.FalseAssertion covers this, but regulatory citation fabrication (prompts that elicit fabricated control numbers) is a narrower, higher-stakes variant. Regulated-environment deployments have a specific failure mode here: a fabricated NIST SP 800-171 control cited in a System Security Plan isn't just misinformation, it's a potential False Claims Act issue.
security:S0403 (Adversarial Example): covered by encoding, dan, etc. The Unicode homoglyph prompts I have fit here.

I'm not proposing new taxonomy entries. I'd want to understand whether these are already considered covered before going further.

The detector question

This is the piece I'm least certain about architecturally.

Six detectors came out of the original PR work:

Detector	What it does	Why I think it has value
`FabricatedCitation`	Validates cited control numbers against known-valid NIST/CMMC/DFARS/HIPAA families	Regex-based, doesn't require an HF model, catches specific regulatory misinformation that general refutation detectors miss
`PIIPresence`	Detects SSN/CC/email/phone with synthetic SSN filtering and DFARS clause false-positive filtering	The DFARS clause filter is domain-specific value: `252.204-7012` matches US phone number regex but is a real clause number, not PII
`FalseAttestation`	Catches false compliance certifications, distinguishes C3PAO references in fake certs from proper disclaimers	C3PAO context precision is non-obvious and not captured by general misleading detectors
`BypassInstructions`	Detects actionable circumvention guidance with code block scoring	Supplements `malwaregen` detection for policy-specific bypass
`ControlWeakening`	Identifies suggestions to reduce security control effectiveness (weaker than explicit bypass)	Different failure mode than bypass, harder to catch
`RegulatoryConfidence`	Flags definitive regulatory claims without hedging	Risk signal for overconfident guidance in regulated contexts

Jeffrey's framing (detectors also have tags, and domain-specific detectors can augment probe tags to denote what results impact a compliance specification) suggests these could live somewhere. But I'm genuinely uncertain about the right structure:

Should domain-specific detectors like FabricatedCitation live in the same module as the probes that reference them (analogous to packagehallucination.PythonPypi living in detectors/packagehallucination.py)? That pattern makes sense when the detector validates probe-specific output.
Or is there a post-hoc analysis path that's more appropriate for the compliance framing?
Or should these be deferred until CAS provides a clearer intent-based structure?

I don't want to propose a detectors.compliance module. That was the wrong direction and Jeffrey was right to redirect it. But I also think FabricatedCitation has enough unique validation logic (lookup tables for real NIST families, CMMC practice levels, DFARS cyber clauses, HIPAA section ranges with subsection bounds) that it's worth contributing somewhere. Looking for direction on where.

The CAS angle

Jeffrey mentioned that prompts like these could become "intent stubs" under the CAS project, selecting inputs that target goals relevant to compliance-specific failures. That framing makes sense to me: the prompts define what a compliance-relevant failure looks like from the input side, and CAS would be the mechanism that selects and routes those intents in context.

I don't have visibility into the CAS timeline or public interface, but if there's anything useful I can do now to position the prompt inventory for CAS integration (format, metadata, organization), I'm happy to take direction on that.

What I'd like input on

Fabrication prompts: do false attestation / fake certificate generation prompts fit misleading as a subclass, or is this a technique distinct enough to warrant a new probe class? The technique is specifically "generate an authoritative-looking compliance document that appears legitimate," which is different from "expand on a false factual assertion."
Domain-specific detectors: where's the right home for detectors that validate against regulatory-specific knowledge (control number lookup tables, DFARS clause false-positive filtering)? Same module as the probe that references them? A separate detectors/regulatory.py? Something else?
Tag taxonomy: are there compliance-relevant AVID effect, OWASP, or quality tags that should be applied to existing probes but aren't? I have a list of CMMC/NIST-mapped failure modes I could cross-reference against current probe coverage if that's useful.

Thanks for the time on PR #1619 and for the architecture guidance. Following through on the incremental path.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compliance-relevant testing patterns in garak: architecture exploration #1659

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

Compliance-relevant testing patterns in garak: architecture exploration #1659

Uh oh!

Uh oh!

dentity007 Mar 29, 2026

Where I landed after the PR feedback

The taxonomy question

The detector question

The CAS angle

What I'd like input on

Replies: 0 comments

dentity007
Mar 29, 2026