Falko Sieverding activity https://gitlab.com/fsieverding 2026-03-17T18:58:43Z

tag:gitlab.com,2026-03-17:5212378103 Falko Sieverding opened issue #1: Source-verified validation findings: 2 bugs, 1 deprecation, 3 accuracy fixes at gl-demo-ultimate-ryappleby / Duo Agent Plat... 2026-03-17T11:33:04Z fsieverding Falko Sieverding [email protected]

Context

I integrated your dap-agent skill into my Claude Code harness and ran a deep validation against GitLab source code (gitlab-org/gitlab master + ai-assist flow registry v1.md). 4 parallel research agents checked tool names, GraphQL queries, YAML schema claims, and script correctness.

Overall: excellent work. 48/48 tool names confirmed, 4/4 gotcha claims accurate, YAML schema mostly correct. Found 2 bugs and 1 deprecation worth fixing.

Bug 1: enable_flow.py — wrong GraphQL argument name (CRITICAL)

File: dap-agent/scripts/enable_flow.py

The GROUP_ENABLE and PROJECT_ENABLE mutations use catalogItemId as the argument name:

mutation($catalogItemId: AiCatalogItemID!, ...) {
  aiCatalogItemConsumerCreate(input: {
    catalogItemId: $catalogItemId

Source says: The actual argument is itemId, not catalogItemId.

Verified in ee/app/graphql/mutations/ai/catalog/item_consumer/create.rb:

argument :item_id, ::Types::GlobalIDType[::Ai::Catalog::Item]

Fix: Rename catalogItemId to itemId in both mutations and the Python variable dict.

Bug 2: enable_flow.py — wrong triggerTypes variable type

File: dap-agent/scripts/enable_flow.py

The PROJECT_ENABLE mutation declares:

$triggerTypes: [AiCatalogItemConsumerTriggerTypeEnum!]

Source says: The actual argument type is [GraphQL::Types::String] (plain strings), not an enum.

Fix: Change to $triggerTypes: [String!]
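
A minimal sketch of the corrected PROJECT_ENABLE mutation with both fixes applied (itemId instead of catalogItemId, [String!] instead of the enum). The surrounding input fields and selection set here are illustrative assumptions, not copied from the script:

```python
# Hedged sketch: PROJECT_ENABLE with both fixes applied.
# projectId and the errors selection are illustrative assumptions.
PROJECT_ENABLE = """
mutation($itemId: AiCatalogItemID!, $projectId: ProjectID!, $triggerTypes: [String!]) {
  aiCatalogItemConsumerCreate(input: {
    itemId: $itemId
    projectId: $projectId
    triggerTypes: $triggerTypes
  }) {
    errors
  }
}
"""

def build_variables(item_gid: str, project_gid: str, trigger_types: list[str]) -> dict:
    """Variable dict matching the renamed itemId argument."""
    return {
        "itemId": item_gid,
        "projectId": project_gid,
        "triggerTypes": trigger_types,
    }
```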

Deprecation Warning: extract_reasoning.py uses deprecated checkpoint field

File: dap-agent/scripts/extract_reasoning.py

The script queries duoWorkflowEvents for the checkpoint field. This field is deprecated since milestone 18.7.

From ee/app/graphql/types/ai/duo_workflows/workflow_event_type.rb:

field :checkpoint, ... deprecated: { reason: 'Checkpoints are big & contain internal langgraph details', milestone: '18.7' }

Still functional today but at risk of removal. The duoMessages field (which diagnose_workflow.py uses) is the intended replacement for most diagnostic use cases, though it lacks the full conversation_history with AIMessage reasoning that checkpoint provides.

Suggestion: Add a deprecation note in the script docstring and consider whether the duoMessages approach from diagnose_workflow.py could be extended to cover the reasoning extraction use case.
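
One way to surface the deprecation without breaking existing behavior is a small accessor that prefers duoMessages and warns when it falls back to checkpoint. The helper name and node shape below are illustrative assumptions, not the script's actual API:

```python
import warnings

def extract_events(node: dict) -> object:
    """Prefer duoMessages; fall back to the deprecated checkpoint field
    (deprecated since milestone 18.7) with an explicit warning.
    Hypothetical helper for illustration only."""
    if node.get("duoMessages"):
        return node["duoMessages"]
    if "checkpoint" in node:
        warnings.warn(
            "duoWorkflowEvents.checkpoint is deprecated since 18.7; "
            "migrate to duoMessages where possible",
            DeprecationWarning,
        )
        return node["checkpoint"]
    return None
```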

Accuracy Fixes for SKILL.md / references

1. Context variables — "ONLY context:goal and context:project_id" is too restrictive

The SKILL.md and references/flow-structure.md state these are the only flow-level context variables. Checking the ai-assist built-in flows (developer.yml, resolve_sast_vulnerability.yml, code_review.yml) shows additional runtime context vars exist:

  • context:workflow_id
  • context:project_http_url_to_repo
  • context:project_default_branch
  • context:current_date
  • context:session_url
  • context:inputs.* (via flow.inputs schema declaration)

The warning that context:issue_iid doesn't exist is correct and important — zero hits in ai-assist source. But the "ONLY" claim should be softened.
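
A softened check could replace the hard "only two" rule with an allowlist built from the variables above plus a prefix rule for context:inputs.*. A minimal sketch (the function name is illustrative):

```python
# Hedged sketch of a softened context-variable check: known vars from the
# built-in flows cited above, plus a prefix rule for declared inputs.
KNOWN_CONTEXT_VARS = {
    "context:goal",
    "context:project_id",
    "context:workflow_id",
    "context:project_http_url_to_repo",
    "context:project_default_branch",
    "context:current_date",
    "context:session_url",
}

def is_known_context_var(name: str) -> bool:
    """True for a documented runtime var or a declared flow input."""
    return name in KNOWN_CONTEXT_VARS or name.startswith("context:inputs.")
```

This still catches context:issue_iid (correctly flagged today) while no longer rejecting the legitimate runtime variables.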

2. Missing HumanInputComponent (experimental)

references/flow-structure.md documents three component types. The ai-assist experimental spec (docs/flow_registry/experimental.md) adds HumanInputComponent for human-in-the-loop approval/rejection. It only works in ide environment (not ambient), so it can't be used in issue/MR-triggered flows. Worth mentioning with the experimental caveat.

3. validate_flow.py — incomplete tool set

The VALID_TOOLS set in the validation script is missing ~40 tools that exist in ee/lib/ai/catalog/built_in_tool_definitions.rb:

  • Vulnerability tools (12): get_vulnerability_details, list_vulnerabilities, confirm_vulnerability, dismiss_vulnerability, etc.
  • Epic tools (6): create_epic, get_epic, list_epics, etc.
  • Work Item tools (6): create_work_item, get_work_item, etc.
  • Task/Plan tools (6): create_plan, add_new_task, set_task_status, etc.
  • Other: get_current_user, get_previous_session_context, get_commit_comments, get_wiki_page, post_duo_code_review, build_review_merge_request_context, extract_lines_from_text, audit event tools

These tools are valid but would be flagged as errors by the current validator.
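
A minimal sketch of the fix: merge the missing names into the script's existing set. Only the tools explicitly listed above are spelled out; the full set should be regenerated from built_in_tool_definitions.rb rather than hand-typed:

```python
# Hedged sketch: extend VALID_TOOLS with the missing names listed above.
MISSING_TOOLS = {
    # vulnerability tools
    "get_vulnerability_details", "list_vulnerabilities",
    "confirm_vulnerability", "dismiss_vulnerability",
    # epic / work item tools
    "create_epic", "get_epic", "list_epics",
    "create_work_item", "get_work_item",
    # task/plan tools
    "create_plan", "add_new_task", "set_task_status",
    # other
    "get_current_user", "get_previous_session_context",
    "get_commit_comments", "get_wiki_page", "post_duo_code_review",
    "build_review_merge_request_context", "extract_lines_from_text",
}

VALID_TOOLS = set()  # placeholder for the script's existing set
VALID_TOOLS |= MISSING_TOOLS
```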

Validation Method

  • Tool names: Verified all 48 against ee/lib/ai/catalog/built_in_tool_definitions.rb (numeric IDs confirmed)
  • GraphQL: Verified 5 scripts against ee/app/graphql/ type definitions, resolvers, and mutations
  • YAML schema: Verified 7 claims against ai-assist/docs/flow_registry/v1.md + Pydantic models in flow_config.py
  • Tool gotchas: All 4 confirmed accurate (list_issue_notes vs list_all_merge_request_notes, run_command no git, gitlab__user_search double underscore, no GITLAB_TOKEN in shell)

Thanks for building this — the skill structure and the playbook-system architecture are genuinely useful for DAP flow development.

tag:gitlab.com,2026-03-16:5209451601 Falko Sieverding pushed to project branch main at universalamateur / Fabric 2026-03-16T17:34:00Z fsieverding Falko Sieverding [email protected]

Falko Sieverding (16b20633) at 16 Mar 17:34

chore(release): Update version to v1.4.437

... and 2 more commits

tag:gitlab.com,2026-03-16:5209451550 Falko Sieverding pushed new project tag v1.4.437 at universalamateur / Fabric 2026-03-16T17:33:59Z fsieverding Falko Sieverding [email protected]

Falko Sieverding (16b20633) at 16 Mar 17:33

chore(release): Update version to v1.4.437

tag:gitlab.com,2026-03-16:5208646416 Falko Sieverding opened issue #593728: Prompt lifecycle governance: version pinning, approval gates, and compliance alignment for enterprise DAP at GitLab.or... 2026-03-16T14:29:32Z fsieverding Falko Sieverding [email protected]

Problem

Regulated enterprises (banking, insurance, government) cannot adopt DAP foundational flows at scale without prompt lifecycle governance. Today, GitLab controls the full prompt lifecycle with no customer-facing control plane.

The current architecture explicitly decouples prompts from GitLab releases (&13024). This optimizes for iteration speed, which is the right default. But it means AI behavior on a customer's instance can change without any version change the customer can track, approve, or roll back.

For a bank under FCA/PRA/DORA, an insurance company under Solvency II, or a government agency under the EU AI Act, this is significant adoption friction that multiple regulated prospects have flagged independently.

What customers are asking for

Pattern across accounts:

  • UXR "From Switching to Sticking" study: "Trust-Through-Transparency" identified as key adoption differentiator. Users asking about model differences per agent.
  • Multiple Dedicated customers (UK banking, EMEA enterprise): security teams require sign-off on changes to AI behavior before they reach production. They already have pre-prod → prod upgrade cadences for GitLab version upgrades. Prompt changes bypass this entirely.
  • Field engagement reports (Mar 2026): customers asking about prompt tracking, told "not in roadmap"
  • Additional customer (Feb 2026): trust erosion from lack of transparency around AI feature readiness

What exists today

| Capability | Foundational flows | Custom agents/flows |
|---|---|---|
| Version pinning | No (always latest) | Yes (AI Catalog) |
| Customer prompt override | No | Yes (custom system prompt) |
| Approval gate before change | No | N/A (customer controls) |
| Rollback to previous version | No | Yes (pin older version) |

The gap is entirely on the foundational flows side.

Proposal

This issue scopes the full problem. Implementation should be phased and will likely span multiple milestones.

Phase 1: Foundational flow version pinning

One possible approach: extend AI Catalog version pinning to foundational flows. Allow instance admins to:

  • View available prompt versions for each foundational flow
  • Pin all foundational flows to a specific AI Gateway prompt version
  • Manually upgrade to a newer version when ready (explicit opt-in)

This mirrors how Dedicated customers already control GitLab version upgrades. Mechanically, the Rails monolith could read the admin's pinned version from the database and send it as the version constraint to the AI Gateway (instead of the default ^1.0.0). The shared AIGW fleet resolves accordingly, requiring no per-tenant AIGW deployment.
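
The pin-vs-default resolution can be sketched as follows. This is illustrative pseudocode for the concept, not the AI Gateway's actual resolver; the function name and caret semantics (latest compatible within the same major) are assumptions:

```python
# Hedged sketch: resolve either the default caret constraint (^1.0.0)
# or an admin's exact pin against available prompt versions.
def resolve_version(available: list[str], constraint: str) -> str:
    """'^1.0.0' -> latest version with matching major; '1.2.0' -> exact pin."""
    parsed = sorted(tuple(map(int, v.split("."))) for v in available)
    if constraint.startswith("^"):
        major = int(constraint[1:].split(".")[0])
        compatible = [v for v in parsed if v[0] == major]
        chosen = compatible[-1]  # highest compatible version
    else:
        chosen = tuple(map(int, constraint.split(".")))
        assert chosen in parsed, f"pinned version {constraint} not available"
    return ".".join(map(str, chosen))
```

With the default constraint, a new prompt release changes behavior silently; with a pin, the same call keeps resolving to the admin-approved version until the pin is updated.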

Consideration: pinned versions interact with model upgrades (LEGACY_MODEL_MAPPING couples prompt versions to models) and Jinja2 {% include %} templates. A compatibility and support policy would be needed for how long old prompt versions remain supported after new ones ship.

Phase 2: Pre-prod prompt staging for Dedicated

Dedicated customers with pre-prod and prod instances already test GitLab version upgrades in pre-prod first. Extend this to prompt changes:

  • Pin prod to the current prompt version (Phase 1 prerequisite)
  • Pre-prod gets the new version first
  • Customer validates AI behavior
  • Customer updates the pin on prod when ready

Note: Dedicated instances share the same AI Gateway fleet (no per-tenant AIGW deployment). Phase 2 depends on Phase 1's version pinning mechanism, where Rails sends tenant-specific version constraints. No AIGW infrastructure change is required.

Phase 3: Compliance framework alignment

Map DAP prompt governance to regulatory frameworks:

| Framework | Requirement | DAP coverage needed |
|---|---|---|
| EU AI Act Art. 13 | Transparency obligations for high-risk AI systems; enables customers to meet these when using DAP in regulated contexts | Prompt changelog, version visibility |
| EU AI Act Art. 14 | Human oversight for high-risk AI; enables customers to understand, monitor, intervene | Version pinning, approval gates |
| FCA/PRA SMCR | Senior manager personal accountability; managers need governance tooling to discharge AI oversight obligations | Audit trail linking output to prompt version |
| DORA Art. 6 | ICT risk management: change management for digital services | Prompt change approval workflow |
| SOC2 CC8.1 | Change management controls | Documented prompt change process |
| NIST AI RMF | GOVERN 1.2: accountability structures for AI lifecycle | End-to-end prompt governance |

Phase 4: Enterprise governance integration

As DAP governance capabilities mature (&18948), prompt lifecycle governance is a natural extension point:

  • Prompt change approval workflows (multi-party sign-off)
  • Prompt diff notifications with configurable policy (block, warn, notify)
  • Prompt compliance reports for auditors
  • Integration with external GRC tools via API

The commercial packaging (included vs. add-on) is a product decision. This issue scopes the technical capability.

Competitive context

  • GitHub Copilot: claims "comprehensive governance framework" with prompt content in audit logs and explicit approval workflows. Details thin, but the messaging is ahead.
  • Neither platform fully solves runtime auditability today — linking a specific output to the exact prompt+model that produced it. First mover here wins regulated enterprise trust.

Related links

  • #498204 — Define strategy on versioning prompts for AI Gateway (CLOSED)
  • #476178 — Reducing SM Upgrades for AI Features (CLOSED, decoupled prompts)
  • &13024 — AI Gateway as Sole Access Point
  • &20278 — O1: Establish Governance Guardrails
  • &20279 — O3: Make AI Usage Transparent and Explainable
  • &20263 — O2: Enable Secure DAP Deployment at Scale
  • &18948 — AI Governance umbrella epic
  • &20231 — All DAP Settings should be covered by Audit logging
  • ai-assist#1899 — Better testing harness for cross-flow prompt dependencies

tag:gitlab.com,2026-03-16:5208646279 Falko Sieverding opened issue #593727: Include prompt version and model version in DAP audit events and session metadata at GitLab.org / GitLab 2026-03-16T14:29:31Z fsieverding Falko Sieverding [email protected]

Problem

When a DAP flow executes, the resulting audit event and session metadata do not include which prompt version or model was used. After the fact, there is no way to prove which exact instructions the AI received for a specific output.

The current G1-G17 audit logging work (&20231) covers DAP settings changes (enable/disable flows, service accounts, namespace settings). It does not cover the prompt content that governed the AI's behavior during execution.

For regulated customers under FCA/PRA/DORA, and those deploying AI in high-risk contexts under the EU AI Act, this is a compliance gap. SMCR-accountable managers at banks need this tooling to discharge their AI oversight obligations. Auditors need to trace: user action → prompt version → model → output.

Proposal

1. Prompt version in audit events

When a DAP flow step executes, include in the audit event:

  • prompt_id: e.g., sast_fp_detection_agent_prompt
  • prompt_version: e.g., 1.0.0 (the resolved version, not the constraint)
  • model_id: e.g., claude_sonnet_4_20250514 (note: foundational flows use multiple models depending on the action, not a single model)
  • model_family_resolved: e.g., base or amazon_q (the prompt family variant that was selected via fallback logic)

This requires a change to the AI Gateway response contract. Currently, the AIGW /v1/prompts/{prompt_id} endpoint returns raw model output (string or stream) with no metadata. The resolved prompt version and model ID would need to be added to response headers or a metadata envelope. Rails would then include them in the audit event payload.
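
Assuming the metadata comes back in response headers, the Rails-side mapping into the audit payload is straightforward. The header names below are illustrative assumptions, not an existing AIGW contract:

```python
# Hedged sketch: map hypothetical AIGW response headers into the
# audit-event fields proposed above. Header names are illustrative.
def audit_payload(headers: dict) -> dict:
    return {
        "prompt_id": headers.get("x-aigw-prompt-id"),
        "prompt_version": headers.get("x-aigw-prompt-version"),  # resolved, not the constraint
        "model_id": headers.get("x-aigw-model-id"),
        "model_family_resolved": headers.get("x-aigw-model-family"),
    }
```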

2. Prompt version in session metadata

The existing DAP session view (Agent Insights) shows flow steps and their outputs. Add the prompt version and model used for each step. This gives admins a per-session view of exactly which instructions the AI followed.

3. Prompt content hash in streaming audit logs

For customers streaming audit events to a SIEM (Splunk, Elastic, etc.), include an HMAC-SHA256 hash (keyed per instance) of the rendered prompt content at LLM call time. This allows compliance teams to verify that the prompt content matches a known-good version without exposing the full prompt text in the log stream.

Note: hashing the template source file is insufficient because Jinja2 templates use {% include %}, conditionals, and variable interpolation. The same template can produce different rendered prompts depending on inputs. The hash must capture the final rendered prompt sent to the model. The existing PromptLoggingHandler (structlog) already logs the rendered prompt, so this data is available in the logging pipeline.
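
The keyed hash itself is a few lines with the standard library. A minimal sketch, hashing the final rendered prompt with a per-instance key (the function name is illustrative):

```python
import hashlib
import hmac

# Hedged sketch: HMAC-SHA256 over the final rendered prompt (post-Jinja2),
# keyed per instance, so a SIEM can match against known-good hashes
# without the log stream exposing the prompt text.
def prompt_content_hmac(instance_key: bytes, rendered_prompt: str) -> str:
    return hmac.new(instance_key, rendered_prompt.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

Two instances with different keys produce different hashes for the same prompt, so leaked hashes from one instance reveal nothing about another's.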

Implementation complexity

The prompt version is resolved at runtime within the AI Gateway's prompt registry, and the model is already selected. However, the resolved version string is currently discarded after the config object is returned. It is not persisted or included in the AIGW response.

The main work is:

  1. AI Gateway side: Modify the prompt registry to retain the resolved version string, and extend the response contract to return it (via headers or metadata envelope). For streaming responses, metadata would go in headers or a final metadata chunk.
  2. Rails side: Consume the returned metadata and include it in audit event payloads and session metadata.

This is incremental work that does not require architectural changes, but it does require coordination between the AI Gateway and Rails audit event teams.

Related links

  • &20231 — All DAP Settings should be covered by Audit logging (G1-G17)
  • #593015 — G10: Audit logging for Ai::Setting instance config changes
  • #593019 — G4: Audit logging for enabled_foundational_flows array
  • &20279 — O3: Make AI Usage Transparent and Explainable
tag:gitlab.com,2026-03-16:5208646090 Falko Sieverding opened issue #593726: Prompt change notifications and changelog for DAP foundational flows at GitLab.org / GitLab 2026-03-16T14:29:29Z fsieverding Falko Sieverding [email protected]

Problem

When GitLab updates system or functional prompts in foundational DAP flows (Code Review, SAST Vulnerability Resolution, Fix Pipeline, Duo Developer, etc.), Dedicated and Self-Managed customers have no way to see what changed.

Prompts are YAML+Jinja2 files in ai-assist/ai_gateway/prompts/definitions/ with semantic versioning. The AI Gateway deploys independently from GitLab Rails releases. Foundational flows always resolve to the latest compatible stable version. There is no customer-facing changelog, no notification, no diff.

This is significant adoption friction for regulated industries. FCA/PRA/DORA require traceability of AI behavior changes. The EU AI Act (full enforcement August 2026) requires transparency obligations for AI systems used in high-risk contexts. Customers deploying DAP in regulated environments need to know when the AI's instructions change to meet their own compliance obligations.

Evidence of prompt changes causing breakage without customer visibility

  • INC-7360 (Feb 11, 2026): shared prompt template change introduced a required variable, broke Fix Pipeline + Duo Developer flows
  • MR !218609: wrong prompt version shipped for DAP Code Review, required emergency backport to 18.8
  • Jan 13, 2026: Vertex model upgrade deployed without matching prompt file, broke /vulnerability_explain

Customer signals

  • UXR "From Switching to Sticking" study: users asking "What is the AI that DAP is using pre-tested to do? Any differences in model used for each agent?"
  • Multiple Dedicated customers (UK banking, EMEA enterprise): security teams raised prompt governance in security reviews. Governance teams need to sign off on changes to AI behavior before they reach production.
  • Field engagement reports (Mar 2026): customers asking about prompt tracking, told "not in the roadmap"
  • Additional customer (Feb 2026): "lost a lot of trust... lack of transparency around the readiness of these features"

Proposal

1. Prompt changelog per AI Gateway release

Auto-generate a changelog of prompt definition changes (new versions, modified prompts, deprecated versions) as part of the AI Gateway release process. Publish alongside existing release notes.

The ai-assist repo already has semantic versioning and a CI pipeline. A job that diffs ai_gateway/prompts/definitions/ between tags and outputs a structured changelog is low-to-medium effort. Note: the changelog should cover version transitions and affected flow names, not raw prompt diffs (to avoid exposing prompt injection defenses). It should also track LEGACY_MODEL_MAPPING changes in registry.py (model-to-prompt bindings that change AI behavior without prompt file changes) and inline flow prompts stored via InMemoryPromptRegistry.
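
The core of such a CI job can be sketched as a diff of two {prompt_id: version} maps extracted from the definitions directory at each tag. Extracting the maps is out of scope here; the function and entry shape are illustrative assumptions:

```python
# Hedged sketch: emit structured changelog entries (version transitions
# only, no raw prompt diffs) from two {prompt_id: version} maps.
def prompt_changelog(old: dict, new: dict) -> list[dict]:
    entries = []
    for pid in sorted(old.keys() | new.keys()):
        before, after = old.get(pid), new.get(pid)
        if before == after:
            continue  # unchanged between tags
        kind = ("added" if before is None
                else "removed" if after is None
                else "changed")
        entries.append({"prompt_id": pid, "change": kind,
                        "old_version": before, "new_version": after})
    return entries
```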

2. Admin UI: active prompt versions

Add a section in Admin > Duo Agent Platform (or Settings > AI) showing the currently active prompt version for each foundational flow. Example:

| Flow | Prompt ID | Active Version | Changed In |
|---|---|---|---|
| Code Review | review_merge_request | 1.3.0 | AI GW 18.9.2 |
| SAST FP Detection | sast_fp_detection_agent_prompt | 1.0.0 | AI GW 18.8.0 |
| Fix Pipeline | create_repository_branch | 1.1.0 | AI GW 18.9.1 |

3. Subscription mechanism

Allow admins to subscribe to prompt change notifications, similar to release notifications or security release emails. When a prompt version changes on their instance, subscribed admins receive an email or webhook with the prompt ID, old version, new version, and a link to the changelog.

Prior art

  • GitLab already publishes security release notifications via email subscription
  • GitLab releases have RSS feeds
  • The AI Transparency Center covers model selection and data handling but does not address prompt content or prompt change visibility
  • The ai-assist repo is public, so prompt diffs are technically available, but monitoring a repo is not a governance solution

Implementation notes

  • Prompt templates use Jinja2 with {% include %} cross-references. A single prompt's effective content can span multiple files. The changelog should track all files contributing to a prompt definition, not just the top-level YAML.
  • Security-sensitive prompt components (gated by CODEOWNERS with AppSec approval) should be excluded from customer-facing changelogs to avoid exposing prompt injection defenses.

Related links

  • #498204 — Define strategy on versioning prompts for AI Gateway (CLOSED, implemented internal versioning)
  • &13024 — AI Gateway as Sole Access Point (decouples prompts from releases)
  • &20279 — O3: Make AI Usage Transparent and Explainable