I integrated your dap-agent skill into my Claude Code harness and ran a deep validation against GitLab source code (gitlab-org/gitlab master + ai-assist flow registry v1.md). 4 parallel research agents checked tool names, GraphQL queries, YAML schema claims, and script correctness.
Overall: excellent work. 48/48 tool names confirmed, 4/4 gotcha claims accurate, YAML schema mostly correct. Found 2 bugs and 1 deprecation worth fixing.
### `enable_flow.py` — wrong GraphQL argument name (CRITICAL)

File: `dap-agent/scripts/enable_flow.py`

The `GROUP_ENABLE` and `PROJECT_ENABLE` mutations use `catalogItemId` as the argument name:
```graphql
mutation($catalogItemId: AiCatalogItemID!, ...) {
  aiCatalogItemConsumerCreate(input: {
    catalogItemId: $catalogItemId
```
Source says: the actual argument is `itemId`, not `catalogItemId`. Verified in `ee/app/graphql/mutations/ai/catalog/item_consumer/create.rb`:

```ruby
argument :item_id, ::Types::GlobalIDType[::Ai::Catalog::Item]
```

Fix: rename `catalogItemId` → `itemId` in both mutations and in the Python variables dict.
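Applied to the project-level mutation, the rename would look roughly like this — a sketch only, with the other input arguments omitted for brevity (`errors` is the standard GitLab mutation payload field):

```python
# Sketch of the corrected mutation for enable_flow.py, with the argument
# renamed per create.rb. Other input arguments are omitted for brevity.
PROJECT_ENABLE = """
mutation($itemId: AiCatalogItemID!) {
  aiCatalogItemConsumerCreate(input: {
    itemId: $itemId
  }) {
    errors
  }
}
"""

def build_variables(item_gid: str) -> dict:
    """Variables dict using the renamed key (the old key was the wrong one)."""
    return {"itemId": item_gid}
```

The same rename applies to `GROUP_ENABLE`; only the key in the variables dict and the two occurrences in each mutation string change.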
### `enable_flow.py` — wrong `triggerTypes` variable type

File: `dap-agent/scripts/enable_flow.py`

The `PROJECT_ENABLE` mutation declares:

```graphql
$triggerTypes: [AiCatalogItemConsumerTriggerTypeEnum!]
```
Source says: the actual argument type is `[GraphQL::Types::String]` (plain strings), not an enum.

Fix: change the declaration to `$triggerTypes: [String!]`.
### `extract_reasoning.py` uses deprecated `checkpoint` field

File: `dap-agent/scripts/extract_reasoning.py`

The script queries `duoWorkflowEvents` for the `checkpoint` field. This field is deprecated since milestone 18.7.

From `ee/app/graphql/types/ai/duo_workflows/workflow_event_type.rb`:

```ruby
field :checkpoint, ... deprecated: { reason: 'Checkpoints are big & contain internal langgraph details', milestone: '18.7' }
```
Still functional today but at risk of removal. The `duoMessages` field (which `diagnose_workflow.py` uses) is the intended replacement for most diagnostic use cases, though it lacks the full `conversation_history` with `AIMessage` reasoning that `checkpoint` provides.

Suggestion: add a deprecation note in the script docstring and consider whether the `duoMessages` approach from `diagnose_workflow.py` could be extended to cover the reasoning extraction use case.
### The "only `context:goal` and `context:project_id`" claim is too restrictive

The SKILL.md and `references/flow-structure.md` state these are the only flow-level context variables. Verified against the ai-assist built-in flows (`developer.yml`, `resolve_sast_vulnerability.yml`, `code_review.yml`), additional runtime context variables exist:

- `context:workflow_id`
- `context:project_http_url_to_repo`
- `context:project_default_branch`
- `context:current_date`
- `context:session_url`
- `context:inputs.*` (via the flow's `inputs` schema declaration)

The warning that `context:issue_iid` doesn't exist is correct and important — zero hits in the ai-assist source. But the "ONLY" claim should be softened.
### `HumanInputComponent` (experimental)

`references/flow-structure.md` documents three component types. The ai-assist experimental spec (`docs/flow_registry/experimental.md`) adds `HumanInputComponent` for human-in-the-loop approval/rejection. It only works in the `ide` environment (not `ambient`), so it can't be used in issue/MR-triggered flows. Worth mentioning with the experimental caveat.
### `validate_flow.py` — incomplete tool set

The `VALID_TOOLS` set in the validation script is missing ~40 tools that exist in `ee/lib/ai/catalog/built_in_tool_definitions.rb`:

- vulnerability tools: `get_vulnerability_details`, `list_vulnerabilities`, `confirm_vulnerability`, `dismiss_vulnerability`, etc.
- epic tools: `create_epic`, `get_epic`, `list_epics`, etc.
- work item tools: `create_work_item`, `get_work_item`, etc.
- planner tools: `create_plan`, `add_new_task`, `set_task_status`, etc.
- miscellaneous: `get_current_user`, `get_previous_session_context`, `get_commit_comments`, `get_wiki_page`, `post_duo_code_review`, `build_review_merge_request_context`, `extract_lines_from_text`, and the audit event tools

These tools are valid and would be flagged as errors by the current validator.
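A minimal patch sketch for the validator, assuming the allowlist is a plain Python set named `VALID_TOOLS` as described (the names below are only the ones listed above, not the full ~40):

```python
# Tool names below come from ee/lib/ai/catalog/built_in_tool_definitions.rb;
# this is a sketch of extending the validator's allowlist, not a full patch.
MISSING_TOOLS = {
    "get_vulnerability_details", "list_vulnerabilities",
    "confirm_vulnerability", "dismiss_vulnerability",
    "create_epic", "get_epic", "list_epics",
    "create_work_item", "get_work_item",
    "create_plan", "add_new_task", "set_task_status",
    "get_current_user", "get_previous_session_context",
    "get_commit_comments", "get_wiki_page",
    "post_duo_code_review", "build_review_merge_request_context",
    "extract_lines_from_text",
}

def extend_valid_tools(valid_tools: set[str]) -> set[str]:
    """Return the allowlist with the missing built-in tools added."""
    return valid_tools | MISSING_TOOLS
```

Longer term it may be worth generating the set from `built_in_tool_definitions.rb` itself rather than maintaining a hand-copied list.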
Sources verified:

- `ee/lib/ai/catalog/built_in_tool_definitions.rb` (numeric IDs confirmed)
- `ee/app/graphql/` type definitions, resolvers, and mutations
- `ai-assist/docs/flow_registry/v1.md` plus the Pydantic models in `flow_config.py`
All four documented gotchas checked out (`list_issue_notes` vs `list_all_merge_request_notes`; `run_command` has no git; `gitlab__user_search` uses a double underscore; no `GITLAB_TOKEN` in the shell).

Thanks for building this — the skill structure and the playbook-system architecture are genuinely useful for DAP flow development.
Regulated enterprises (banking, insurance, government) cannot adopt DAP foundational flows at scale without prompt lifecycle governance. Today, GitLab controls the full prompt lifecycle with no customer-facing control plane.
The current architecture explicitly decouples prompts from GitLab releases (&13024). This optimizes for iteration speed, which is the right default. But it means AI behavior on a customer's instance can change without any version change the customer can track, approve, or roll back.
For a bank under FCA/PRA/DORA, an insurance company under Solvency II, or a government agency under the EU AI Act, this is significant adoption friction that multiple regulated prospects have flagged independently.
Pattern across accounts:
| Capability | Foundational flows | Custom agents/flows |
|---|---|---|
| Version pinning | No (always latest) | Yes (AI Catalog) |
| Customer prompt override | No | Yes (custom system prompt) |
| Approval gate before change | No | N/A (customer controls) |
| Rollback to previous version | No | Yes (pin older version) |
The gap is entirely on the foundational flows side.
This issue scopes the full problem. Implementation should be phased and will likely span multiple milestones.
One possible approach: extend AI Catalog version pinning to foundational flows, letting instance admins pin the prompt versions their instance resolves and approve upgrades before they take effect.
This mirrors how Dedicated customers already control GitLab version upgrades. Mechanically, the Rails monolith reads the admin's pinned version from the database and sends it as the version constraint to the AI Gateway (instead of the default `^1.0.0`). The shared AIGW fleet resolves accordingly, requiring no per-tenant AIGW deployment.
Consideration: pinned versions interact with model upgrades (LEGACY_MODEL_MAPPING couples prompt versions to models) and Jinja2 {% include %} templates. A compatibility and support policy would be needed for how long old prompt versions remain supported after new ones ship.
Dedicated customers with pre-prod and prod instances already test GitLab version upgrades in pre-prod first. Extend the same model to prompt changes, so new prompt versions can be validated in pre-prod before they reach prod.
Note: Dedicated instances share the same AI Gateway fleet (no per-tenant AIGW deployment). Phase 2 depends on Phase 1's version pinning mechanism, where Rails sends tenant-specific version constraints. No AIGW infrastructure change is required.
Map DAP prompt governance to regulatory frameworks:
| Framework | Requirement | DAP coverage needed |
|---|---|---|
| EU AI Act Art. 13 | Transparency obligations for high-risk AI systems — enables customers to meet these when using DAP in regulated contexts | Prompt changelog, version visibility |
| EU AI Act Art. 14 | Human oversight for high-risk AI — enables customers to understand, monitor, intervene | Version pinning, approval gates |
| FCA/PRA SMCR | Senior manager personal accountability — managers need governance tooling to discharge AI oversight obligations | Audit trail linking output to prompt version |
| DORA Art. 6 | ICT risk management: change management for digital services | Prompt change approval workflow |
| SOC2 CC8.1 | Change management controls | Documented prompt change process |
| NIST AI RMF | GOVERN 1.2: accountability structures for AI lifecycle | End-to-end prompt governance |
As DAP governance capabilities mature (&18948), prompt lifecycle governance is a natural extension point.
The commercial packaging (included vs. add-on) is a product decision. This issue scopes the technical capability.
ai-assist#1899 — Better testing harness for cross-flow prompt dependencies

When a DAP flow executes, the resulting audit event and session metadata do not include which prompt version or model was used. After the fact, there is no way to prove which exact instructions the AI received for a specific output.
The current G1-G17 audit logging work (&20231) covers DAP settings changes (enable/disable flows, service accounts, namespace settings). It does not cover the prompt content that governed the AI's behavior during execution.
For regulated customers under FCA/PRA/DORA, and those deploying AI in high-risk contexts under the EU AI Act, this is a compliance gap. SMCR-accountable managers at banks need this tooling to discharge their AI oversight obligations. Auditors need to trace: user action → prompt version → model → output.
When a DAP flow step executes, include in the audit event:

- `prompt_id`: e.g. `sast_fp_detection_agent_prompt`
- `prompt_version`: e.g. `1.0.0` (the resolved version, not the constraint)
- `model_id`: e.g. `claude_sonnet_4_20250514` (note: foundational flows use multiple models depending on the action, not a single model)
- `model_family_resolved`: e.g. `base` or `amazon_q` (the prompt family variant that was selected via fallback logic)

This requires a change to the AI Gateway response contract. Currently, the AIGW `/v1/prompts/{prompt_id}` endpoint returns raw model output (string or stream) with no metadata. The resolved prompt version and model ID would need to be added to response headers or a metadata envelope. Rails would then include them in the audit event payload.
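One shape the metadata envelope could take — the envelope structure and function name here are assumptions, only the field names and example values come from the list above:

```python
# Sketch of a metadata envelope the AIGW could return alongside the raw
# model output, instead of the current bare string/stream response.
def wrap_response(raw_output: str, prompt_id: str,
                  prompt_version: str, model_id: str) -> dict:
    return {
        "output": raw_output,
        "metadata": {
            "prompt_id": prompt_id,
            "prompt_version": prompt_version,  # resolved, not the constraint
            "model_id": model_id,
        },
    }
```

For streaming responses the same fields would more naturally travel in response headers, since the body is consumed incrementally.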
The existing DAP session view (Agent Insights) shows flow steps and their outputs. Add the prompt version and model used for each step. This gives admins a per-session view of exactly which instructions the AI followed.
For customers streaming audit events to a SIEM (Splunk, Elastic, etc.), include an HMAC-SHA256 hash (keyed per instance) of the rendered prompt content at LLM call time. This allows compliance teams to verify that the prompt content matches a known-good version without exposing the full prompt text in the log stream.
Note: hashing the template source file is insufficient because Jinja2 templates use {% include %}, conditionals, and variable interpolation. The same template can produce different rendered prompts depending on inputs. The hash must capture the final rendered prompt sent to the model. The existing PromptLoggingHandler (structlog) already logs the rendered prompt, so this data is available in the logging pipeline.
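The keyed-hash step itself is small; a minimal sketch (the function name and key management are assumptions — only the requirement that the input be the final rendered prompt comes from the note above):

```python
import hashlib
import hmac

def prompt_integrity_hash(rendered_prompt: str, instance_key: bytes) -> str:
    """Keyed HMAC-SHA256 over the final rendered prompt (after Jinja2
    includes, conditionals, and variable interpolation), so a SIEM can
    verify prompt content against a known-good hash without the log
    stream ever carrying the prompt text."""
    return hmac.new(instance_key,
                    rendered_prompt.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

Because the HMAC is keyed per instance, an attacker who knows the prompt text still cannot forge a matching hash for another instance's audit stream.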
The prompt version is resolved at runtime within the AI Gateway's prompt registry, and the model is already selected. However, the resolved version string is currently discarded after the config object is returned. It is not persisted or included in the AIGW response.
The main work is plumbing the resolved version and model ID through the AIGW response contract and into the Rails audit event payload.
This is incremental work that does not require architectural changes, but it does require coordination between the AI Gateway and Rails audit event teams.
When GitLab updates system or functional prompts in foundational DAP flows (Code Review, SAST Vulnerability Resolution, Fix Pipeline, Duo Developer, etc.), Dedicated and Self-Managed customers have no way to see what changed.
Prompts are YAML+Jinja2 files in ai-assist/ai_gateway/prompts/definitions/ with semantic versioning. The AI Gateway deploys independently from GitLab Rails releases. Foundational flows always resolve to the latest compatible stable version. There is no customer-facing changelog, no notification, no diff.
This is significant adoption friction for regulated industries. FCA/PRA/DORA require traceability of AI behavior changes. The EU AI Act (full enforcement August 2026) requires transparency obligations for AI systems used in high-risk contexts. Customers deploying DAP in regulated environments need to know when the AI's instructions change to meet their own compliance obligations.
Auto-generate a changelog of prompt definition changes (new versions, modified prompts, deprecated versions) as part of the AI Gateway release process. Publish alongside existing release notes.
The ai-assist repo already has semantic versioning and a CI pipeline. A job that diffs ai_gateway/prompts/definitions/ between tags and outputs a structured changelog is low-to-medium effort. Note: the changelog should cover version transitions and affected flow names, not raw prompt diffs (to avoid exposing prompt injection defenses). It should also track LEGACY_MODEL_MAPPING changes in registry.py (model-to-prompt bindings that change AI behavior without prompt file changes) and inline flow prompts stored via InMemoryPromptRegistry.
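The core of such a CI job — comparing `{prompt_id: version}` maps extracted from the definitions directory at two tags — can be sketched as follows (the function name and output format are assumptions; the YAML parsing and git plumbing are omitted):

```python
def diff_prompt_versions(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Given {prompt_id: version} maps for two AI Gateway tags, emit
    structured changelog lines for added, removed, and bumped prompts."""
    lines = []
    for pid in sorted(new.keys() - old.keys()):
        lines.append(f"ADDED {pid} {new[pid]}")
    for pid in sorted(old.keys() - new.keys()):
        lines.append(f"REMOVED {pid} (was {old[pid]})")
    for pid in sorted(old.keys() & new.keys()):
        if old[pid] != new[pid]:
            lines.append(f"CHANGED {pid} {old[pid]} -> {new[pid]}")
    return lines
```

Per the note above, the job would additionally need to diff `LEGACY_MODEL_MAPPING` in `registry.py` and the inline `InMemoryPromptRegistry` prompts, since those change behavior without touching the definitions directory.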
Add a section in Admin > Duo Agent Platform (or Settings > AI) showing the currently active prompt version for each foundational flow. Example:
| Flow | Prompt ID | Active Version | Changed In |
|---|---|---|---|
| Code Review | `review_merge_request` | 1.3.0 | AI GW 18.9.2 |
| SAST FP Detection | `sast_fp_detection_agent_prompt` | 1.0.0 | AI GW 18.8.0 |
| Fix Pipeline | `create_repository_branch` | 1.1.0 | AI GW 18.9.1 |
Allow admins to subscribe to prompt change notifications, similar to release notifications or security release emails. When a prompt version changes on their instance, subscribed admins receive an email or webhook with the prompt ID, old version, new version, and a link to the changelog.
Notes:

- The ai-assist repo is public, so prompt diffs are technically available, but monitoring a repo is not a governance solution.
- `{% include %}` cross-references mean a single prompt's effective content can span multiple files. The changelog should track all files contributing to a prompt definition, not just the top-level YAML.