I integrated your dap-agent skill into my Claude Code harness and ran a deep validation against GitLab source code (gitlab-org/gitlab master + ai-assist flow registry v1.md). 4 parallel research agents checked tool names, GraphQL queries, YAML schema claims, and script correctness.
Overall: excellent work. 48/48 tool names confirmed, 4/4 gotcha claims accurate, YAML schema mostly correct. Found 2 bugs and 1 deprecation worth fixing.
### `enable_flow.py` — wrong GraphQL argument name (CRITICAL)

File: `dap-agent/scripts/enable_flow.py`

The `GROUP_ENABLE` and `PROJECT_ENABLE` mutations use `catalogItemId` as the argument name:
```graphql
mutation($catalogItemId: AiCatalogItemID!, ...) {
  aiCatalogItemConsumerCreate(input: {
    catalogItemId: $catalogItemId
```
Source says: the actual argument is `itemId`, not `catalogItemId`. Verified in `ee/app/graphql/mutations/ai/catalog/item_consumer/create.rb`:

```ruby
argument :item_id, ::Types::GlobalIDType[::Ai::Catalog::Item]
```

Fix: rename `catalogItemId` → `itemId` in both mutations and in the Python variables dict.
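Applied to the project-level mutation, the rename would look roughly like this — a sketch only, with the other input arguments omitted for brevity (`errors` is the standard GitLab mutation payload field):

```python
# Sketch of the corrected mutation for enable_flow.py, with the argument
# renamed per create.rb. Other input arguments are omitted for brevity.
PROJECT_ENABLE = """
mutation($itemId: AiCatalogItemID!) {
  aiCatalogItemConsumerCreate(input: {
    itemId: $itemId
  }) {
    errors
  }
}
"""

def build_variables(item_gid: str) -> dict:
    """Variables dict using the renamed key (the old key was the wrong one)."""
    return {"itemId": item_gid}
```

The same rename applies to `GROUP_ENABLE`; only the key in the variables dict and the two occurrences in each mutation string change.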
### `enable_flow.py` — wrong `triggerTypes` variable type

File: `dap-agent/scripts/enable_flow.py`

The `PROJECT_ENABLE` mutation declares:

```graphql
$triggerTypes: [AiCatalogItemConsumerTriggerTypeEnum!]
```
Source says: the actual argument type is `[GraphQL::Types::String]` (plain strings), not an enum.

Fix: change the declaration to `$triggerTypes: [String!]`.
### `extract_reasoning.py` uses deprecated `checkpoint` field

File: `dap-agent/scripts/extract_reasoning.py`

The script queries `duoWorkflowEvents` for the `checkpoint` field. This field is deprecated since milestone 18.7.

From `ee/app/graphql/types/ai/duo_workflows/workflow_event_type.rb`:

```ruby
field :checkpoint, ... deprecated: { reason: 'Checkpoints are big & contain internal langgraph details', milestone: '18.7' }
```
Still functional today but at risk of removal. The `duoMessages` field (which `diagnose_workflow.py` uses) is the intended replacement for most diagnostic use cases, though it lacks the full `conversation_history` with `AIMessage` reasoning that `checkpoint` provides.

Suggestion: add a deprecation note in the script docstring and consider whether the `duoMessages` approach from `diagnose_workflow.py` could be extended to cover the reasoning extraction use case.
### The "only `context:goal` and `context:project_id`" claim is too restrictive

The SKILL.md and `references/flow-structure.md` state these are the only flow-level context variables. Verified against the ai-assist built-in flows (`developer.yml`, `resolve_sast_vulnerability.yml`, `code_review.yml`), additional runtime context variables exist:

- `context:workflow_id`
- `context:project_http_url_to_repo`
- `context:project_default_branch`
- `context:current_date`
- `context:session_url`
- `context:inputs.*` (via the flow's `inputs` schema declaration)

The warning that `context:issue_iid` doesn't exist is correct and important — zero hits in the ai-assist source. But the "ONLY" claim should be softened.
### `HumanInputComponent` (experimental)

`references/flow-structure.md` documents three component types. The ai-assist experimental spec (`docs/flow_registry/experimental.md`) adds `HumanInputComponent` for human-in-the-loop approval/rejection. It only works in the `ide` environment (not `ambient`), so it can't be used in issue/MR-triggered flows. Worth mentioning with the experimental caveat.
### `validate_flow.py` — incomplete tool set

The `VALID_TOOLS` set in the validation script is missing ~40 tools that exist in `ee/lib/ai/catalog/built_in_tool_definitions.rb`:

- vulnerability tools: `get_vulnerability_details`, `list_vulnerabilities`, `confirm_vulnerability`, `dismiss_vulnerability`, etc.
- epic tools: `create_epic`, `get_epic`, `list_epics`, etc.
- work item tools: `create_work_item`, `get_work_item`, etc.
- planner tools: `create_plan`, `add_new_task`, `set_task_status`, etc.
- miscellaneous: `get_current_user`, `get_previous_session_context`, `get_commit_comments`, `get_wiki_page`, `post_duo_code_review`, `build_review_merge_request_context`, `extract_lines_from_text`, and the audit event tools

These tools are valid and would be flagged as errors by the current validator.
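A minimal patch sketch for the validator, assuming the allowlist is a plain Python set named `VALID_TOOLS` as described (the names below are only the ones listed above, not the full ~40):

```python
# Tool names below come from ee/lib/ai/catalog/built_in_tool_definitions.rb;
# this is a sketch of extending the validator's allowlist, not a full patch.
MISSING_TOOLS = {
    "get_vulnerability_details", "list_vulnerabilities",
    "confirm_vulnerability", "dismiss_vulnerability",
    "create_epic", "get_epic", "list_epics",
    "create_work_item", "get_work_item",
    "create_plan", "add_new_task", "set_task_status",
    "get_current_user", "get_previous_session_context",
    "get_commit_comments", "get_wiki_page",
    "post_duo_code_review", "build_review_merge_request_context",
    "extract_lines_from_text",
}

def extend_valid_tools(valid_tools: set[str]) -> set[str]:
    """Return the allowlist with the missing built-in tools added."""
    return valid_tools | MISSING_TOOLS
```

Longer term it may be worth generating the set from `built_in_tool_definitions.rb` itself rather than maintaining a hand-copied list.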
Sources verified:

- `ee/lib/ai/catalog/built_in_tool_definitions.rb` (numeric IDs confirmed)
- `ee/app/graphql/` type definitions, resolvers, and mutations
- `ai-assist/docs/flow_registry/v1.md` plus the Pydantic models in `flow_config.py`
All four documented gotchas checked out (`list_issue_notes` vs `list_all_merge_request_notes`; `run_command` has no git; `gitlab__user_search` uses a double underscore; no `GITLAB_TOKEN` in the shell).

Thanks for building this — the skill structure and the playbook-system architecture are genuinely useful for DAP flow development.
Regulated enterprises (banking, insurance, government) cannot adopt DAP foundational flows at scale without prompt lifecycle governance. Today, GitLab controls the full prompt lifecycle with no customer-facing control plane.
The current architecture explicitly decouples prompts from GitLab releases (&13024). This optimizes for iteration speed, which is the right default. But it means AI behavior on a customer's instance can change without any version change the customer can track, approve, or roll back.
For a bank under FCA/PRA/DORA, an insurance company under Solvency II, or a government agency under the EU AI Act, this is significant adoption friction that multiple regulated prospects have flagged independently.
Pattern across accounts:
| Capability | Foundational flows | Custom agents/flows |
|---|---|---|
| Version pinning | No (always latest) | Yes (AI Catalog) |
| Customer prompt override | No | Yes (custom system prompt) |
| Approval gate before change | No | N/A (customer controls) |
| Rollback to previous version | No | Yes (pin older version) |
The gap is entirely on the foundational flows side.
This issue scopes the full problem. Implementation should be phased and will likely span multiple milestones.
One possible approach: extend AI Catalog version pinning to foundational flows, letting instance admins pin the prompt versions their instance resolves and approve upgrades before they take effect.
This mirrors how Dedicated customers already control GitLab version upgrades. Mechanically, the Rails monolith reads the admin's pinned version from the database and sends it as the version constraint to the AI Gateway (instead of the default `^1.0.0`). The shared AIGW fleet resolves accordingly, requiring no per-tenant AIGW deployment.
Consideration: pinned versions interact with model upgrades (LEGACY_MODEL_MAPPING couples prompt versions to models) and Jinja2 {% include %} templates. A compatibility and support policy would be needed for how long old prompt versions remain supported after new ones ship.
Dedicated customers with pre-prod and prod instances already test GitLab version upgrades in pre-prod first. Extend the same model to prompt changes, so new prompt versions can be validated in pre-prod before they reach prod.
Note: Dedicated instances share the same AI Gateway fleet (no per-tenant AIGW deployment). Phase 2 depends on Phase 1's version pinning mechanism, where Rails sends tenant-specific version constraints. No AIGW infrastructure change is required.
Map DAP prompt governance to regulatory frameworks:
| Framework | Requirement | DAP coverage needed |
|---|---|---|
| EU AI Act Art. 13 | Transparency obligations for high-risk AI systems — enables customers to meet these when using DAP in regulated contexts | Prompt changelog, version visibility |
| EU AI Act Art. 14 | Human oversight for high-risk AI — enables customers to understand, monitor, intervene | Version pinning, approval gates |
| FCA/PRA SMCR | Senior manager personal accountability — managers need governance tooling to discharge AI oversight obligations | Audit trail linking output to prompt version |
| DORA Art. 6 | ICT risk management: change management for digital services | Prompt change approval workflow |
| SOC2 CC8.1 | Change management controls | Documented prompt change process |
| NIST AI RMF | GOVERN 1.2: accountability structures for AI lifecycle | End-to-end prompt governance |
As DAP governance capabilities mature (&18948), prompt lifecycle governance is a natural extension point.
The commercial packaging (included vs. add-on) is a product decision. This issue scopes the technical capability.
ai-assist#1899 — Better testing harness for cross-flow prompt dependencies

When a DAP flow executes, the resulting audit event and session metadata do not include which prompt version or model was used. After the fact, there is no way to prove which exact instructions the AI received for a specific output.
The current G1-G17 audit logging work (&20231) covers DAP settings changes (enable/disable flows, service accounts, namespace settings). It does not cover the prompt content that governed the AI's behavior during execution.
For regulated customers under FCA/PRA/DORA, and those deploying AI in high-risk contexts under the EU AI Act, this is a compliance gap. SMCR-accountable managers at banks need this tooling to discharge their AI oversight obligations. Auditors need to trace: user action → prompt version → model → output.
When a DAP flow step executes, include in the audit event:

- `prompt_id`: e.g. `sast_fp_detection_agent_prompt`
- `prompt_version`: e.g. `1.0.0` (the resolved version, not the constraint)
- `model_id`: e.g. `claude_sonnet_4_20250514` (note: foundational flows use multiple models depending on the action, not a single model)
- `model_family_resolved`: e.g. `base` or `amazon_q` (the prompt family variant that was selected via fallback logic)

This requires a change to the AI Gateway response contract. Currently, the AIGW `/v1/prompts/{prompt_id}` endpoint returns raw model output (string or stream) with no metadata. The resolved prompt version and model ID would need to be added to response headers or a metadata envelope. Rails would then include them in the audit event payload.
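One shape the metadata envelope could take — the envelope structure and function name here are assumptions, only the field names and example values come from the list above:

```python
# Sketch of a metadata envelope the AIGW could return alongside the raw
# model output, instead of the current bare string/stream response.
def wrap_response(raw_output: str, prompt_id: str,
                  prompt_version: str, model_id: str) -> dict:
    return {
        "output": raw_output,
        "metadata": {
            "prompt_id": prompt_id,
            "prompt_version": prompt_version,  # resolved, not the constraint
            "model_id": model_id,
        },
    }
```

For streaming responses the same fields would more naturally travel in response headers, since the body is consumed incrementally.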
The existing DAP session view (Agent Insights) shows flow steps and their outputs. Add the prompt version and model used for each step. This gives admins a per-session view of exactly which instructions the AI followed.
For customers streaming audit events to a SIEM (Splunk, Elastic, etc.), include an HMAC-SHA256 hash (keyed per instance) of the rendered prompt content at LLM call time. This allows compliance teams to verify that the prompt content matches a known-good version without exposing the full prompt text in the log stream.
Note: hashing the template source file is insufficient because Jinja2 templates use {% include %}, conditionals, and variable interpolation. The same template can produce different rendered prompts depending on inputs. The hash must capture the final rendered prompt sent to the model. The existing PromptLoggingHandler (structlog) already logs the rendered prompt, so this data is available in the logging pipeline.
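The keyed-hash step itself is small; a minimal sketch (the function name and key management are assumptions — only the requirement that the input be the final rendered prompt comes from the note above):

```python
import hashlib
import hmac

def prompt_integrity_hash(rendered_prompt: str, instance_key: bytes) -> str:
    """Keyed HMAC-SHA256 over the final rendered prompt (after Jinja2
    includes, conditionals, and variable interpolation), so a SIEM can
    verify prompt content against a known-good hash without the log
    stream ever carrying the prompt text."""
    return hmac.new(instance_key,
                    rendered_prompt.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

Because the HMAC is keyed per instance, an attacker who knows the prompt text still cannot forge a matching hash for another instance's audit stream.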
The prompt version is resolved at runtime within the AI Gateway's prompt registry, and the model is already selected. However, the resolved version string is currently discarded after the config object is returned. It is not persisted or included in the AIGW response.
The main work is plumbing the resolved version and model ID through the AIGW response contract and into the Rails audit event payload.
This is incremental work that does not require architectural changes, but it does require coordination between the AI Gateway and Rails audit event teams.
When GitLab updates system or functional prompts in foundational DAP flows (Code Review, SAST Vulnerability Resolution, Fix Pipeline, Duo Developer, etc.), Dedicated and Self-Managed customers have no way to see what changed.
Prompts are YAML+Jinja2 files in ai-assist/ai_gateway/prompts/definitions/ with semantic versioning. The AI Gateway deploys independently from GitLab Rails releases. Foundational flows always resolve to the latest compatible stable version. There is no customer-facing changelog, no notification, no diff.
This is significant adoption friction for regulated industries. FCA/PRA/DORA require traceability of AI behavior changes. The EU AI Act (full enforcement August 2026) requires transparency obligations for AI systems used in high-risk contexts. Customers deploying DAP in regulated environments need to know when the AI's instructions change to meet their own compliance obligations.
Auto-generate a changelog of prompt definition changes (new versions, modified prompts, deprecated versions) as part of the AI Gateway release process. Publish alongside existing release notes.
The ai-assist repo already has semantic versioning and a CI pipeline. A job that diffs ai_gateway/prompts/definitions/ between tags and outputs a structured changelog is low-to-medium effort. Note: the changelog should cover version transitions and affected flow names, not raw prompt diffs (to avoid exposing prompt injection defenses). It should also track LEGACY_MODEL_MAPPING changes in registry.py (model-to-prompt bindings that change AI behavior without prompt file changes) and inline flow prompts stored via InMemoryPromptRegistry.
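The core of such a CI job — comparing `{prompt_id: version}` maps extracted from the definitions directory at two tags — can be sketched as follows (the function name and output format are assumptions; the YAML parsing and git plumbing are omitted):

```python
def diff_prompt_versions(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Given {prompt_id: version} maps for two AI Gateway tags, emit
    structured changelog lines for added, removed, and bumped prompts."""
    lines = []
    for pid in sorted(new.keys() - old.keys()):
        lines.append(f"ADDED {pid} {new[pid]}")
    for pid in sorted(old.keys() - new.keys()):
        lines.append(f"REMOVED {pid} (was {old[pid]})")
    for pid in sorted(old.keys() & new.keys()):
        if old[pid] != new[pid]:
            lines.append(f"CHANGED {pid} {old[pid]} -> {new[pid]}")
    return lines
```

Per the note above, the job would additionally need to diff `LEGACY_MODEL_MAPPING` in `registry.py` and the inline `InMemoryPromptRegistry` prompts, since those change behavior without touching the definitions directory.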
Add a section in Admin > Duo Agent Platform (or Settings > AI) showing the currently active prompt version for each foundational flow. Example:
| Flow | Prompt ID | Active Version | Changed In |
|---|---|---|---|
| Code Review | `review_merge_request` | 1.3.0 | AI GW 18.9.2 |
| SAST FP Detection | `sast_fp_detection_agent_prompt` | 1.0.0 | AI GW 18.8.0 |
| Fix Pipeline | `create_repository_branch` | 1.1.0 | AI GW 18.9.1 |
Allow admins to subscribe to prompt change notifications, similar to release notifications or security release emails. When a prompt version changes on their instance, subscribed admins receive an email or webhook with the prompt ID, old version, new version, and a link to the changelog.
Notes:

- The ai-assist repo is public, so prompt diffs are technically available, but monitoring a repo is not a governance solution.
- `{% include %}` cross-references mean a single prompt's effective content can span multiple files. The changelog should track all files contributing to a prompt definition, not just the top-level YAML.