@mikolaj_wawrzyniak Great catch! I have created a follow-up to implement this: #2045
Numbered steps to set up and validate the change are strongly suggested.
Closes #1994
Tim Morriss (6b16fa24) at 17 Mar 13:06
Description by Duo
This merge request adds error handling and failure recovery to a workflow system. The main changes include:
- **New Abort Mechanism:** Added an `AbortComponent` that can terminate workflows when they encounter unrecoverable errors, setting the status to `ERROR` instead of letting them hang indefinitely.
- **Enhanced Tool Error Handling:** Improved how the system handles situations where AI agents can't use their tools properly, whether because no tools are available, authentication fails, or the agent fails to generate proper tool calls. Instead of silently failing, it now provides clear feedback and can abort after multiple failed attempts.
- **Workflow Routing Updates:** Modified the developer workflow to route failed operations to the new abort component instead of just ending, providing better visibility into what went wrong.
- **Better Error Messages:** Added more descriptive error messages that help distinguish between temporary issues (like formatting errors) that should be retried versus permanent blockers (like missing credentials) that should cause the workflow to abort.
- **Test Coverage:** Added comprehensive tests to ensure the new error handling works correctly in various failure scenarios.
The overall goal is to make workflows more robust by properly handling failures instead of getting stuck, while providing clear feedback about what went wrong.
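A minimal sketch of what such an abort component might look like. All names here (`WorkflowStatus`, `AbortComponent`, the state keys) are assumed for illustration and are not taken from the actual codebase:

```python
from enum import Enum


class WorkflowStatus(Enum):
    RUNNING = "RUNNING"
    ERROR = "ERROR"


class AbortComponent:
    """Illustrative sketch: terminates a workflow on an unrecoverable
    error instead of letting it hang indefinitely."""

    def run(self, state: dict) -> dict:
        # Mark the workflow as failed and record why, so the failure
        # is visible rather than silent.
        state["status"] = WorkflowStatus.ERROR
        state["abort_reason"] = state.get("last_error", "unknown error")
        return state
```

The key design point from the description is that the component sets a terminal `ERROR` status rather than leaving the workflow in limbo.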
This reverts merge request !4881
`fix_pipeline` flow in master/main branch, developer flow, `OneOffComponent`
An example scenario:

1. Update `DUO_CLI_VERSION = "8.64.0"` in https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/services/ai/duo_workflows/start_workflow_service.rb#L7
2. Add this `agent-config.yml` to your project:

   ```yaml
   image: python:3.11-alpine
   setup_script:
     - apk add --update git nodejs npm
   ```

3. Run the developer flow; it should retry 3 times before exiting.
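The retry-then-abort behavior in the last step could be sketched as follows. The limit, state keys, and function name are assumptions chosen to mirror the "retry 3 times before exiting" behavior, not the MR's actual implementation:

```python
MAX_TOOL_FAILURES = 3  # assumed limit; the flow retries 3 times before exiting


def handle_tool_failure(state: dict) -> str:
    """Illustrative routing decision after a failed tool call."""
    # Keep a failure counter in workflow state; abort once it is exceeded.
    state["tool_failures"] = state.get("tool_failures", 0) + 1
    if state["tool_failures"] >= MAX_TOOL_FAILURES:
        return "abort"  # permanent blocker: route to the abort component
    return "retry"      # transient issue: try the agent again
```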
Hey @ssuman3, I was able to test locally and LGTM!
Question (non-blocking): Do we need to update any documentation to include the new abort component? Maybe v1.md?
@wortschi After moving to dev-ai-research-0e2f8974 I'm getting this error:
```
litellm.llms.anthropic.common_utils.AnthropicError:
[
  {
    "error": {
      "code": 403,
      "message": "Permission 'aiplatform.endpoints.predict' denied on resource '//aiplatform.googleapis.com/projects/dev-ai-research-0e2f8974/locations/global/publishers/anthropic/models/claude-sonnet-4-5@20250929' (or it may not exist).",
      "status": "PERMISSION_DENIED",
      "details": [
        {
          "@type": "type.googleapis.com/google.rpc.ErrorInfo",
          "reason": "IAM_PERMISSION_DENIED",
          "domain": "aiplatform.googleapis.com",
          "metadata": {
            "permission": "aiplatform.endpoints.predict",
            "resource": "projects/dev-ai-research-0e2f8974/locations/global/publishers/anthropic/models/claude-sonnet-4-5@20250929"
          }
        }
      ]
    }
  }
]
```
Makes sense, thanks for the clarification!
Hi @thomas-schmidt great work on this! I think this is something that we should definitely implement.
I have a few questions and suggestions if you wouldn't mind taking a look?
Question: I'm not sure I understand the inclusion of "auto" here. OneOffComponent has a hard edge to its tool node. Would there be cases where we wouldn't want it to use a tool?
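For context on the question, `tool_choice="auto"` usually implies the model may reply in plain text instead of calling a tool, which only makes sense with a conditional edge rather than a hard edge. A sketch of that distinction (the function and attribute names are hypothetical, not from this MR):

```python
def route_after_agent(last_message) -> str:
    """Illustrative conditional routing after the agent node.

    With tool_choice="auto" the model may answer in plain text, so the
    graph must check for tool calls; with a hard edge to the tool node,
    every response is expected to contain one.
    """
    # `tool_calls` is assumed to be a list attribute on the message object.
    if getattr(last_message, "tool_calls", None):
        return "tools"  # execute the requested tool
    return "end"        # plain-text answer: finish the workflow
```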
Suggestion: We can remove the checking for ID and version here as it's already validated in validate_and_resolve_response_schema.
```python
if self._response_schema is not None:
    # Add output keys for each field in the schema
    field_outputs = []
    for field_name in self._response_schema.model_fields.keys():
```

Suggestion: Perhaps we could use an `if` instead of an `assert` here?
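For illustration only (the function and error names here are hypothetical, not from the MR), the `assert`-to-`if` change might look like:

```python
class MissingSchemaError(ValueError):
    """Raised when no response schema was configured."""


def collect_field_outputs(response_schema):
    # An `assert response_schema is not None` would be stripped when
    # Python runs with the -O flag; an explicit check is always enforced
    # and raises a descriptive error type instead of a bare AssertionError.
    if response_schema is None:
        raise MissingSchemaError("a response schema is required")
    return list(response_schema.model_fields.keys())
```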
Question: So essentially we are allowing the Agent to decide whether it needs to use tools rather than forcing it to use tools (including the AgentFinalOutput).
Is it likely at all that the agent will just output with text rather than using a tool?
@rcoleman-gitlab sounds good, happy to include it in there. It actually may make the validation easier as we have Pydantic models for the tool inputs.
Also, will this potentially cause a breaking change for your `force_internal` parsing? The name will change (`force_internal` to `internal`) if we move to the parameter approach described by @romaneisner.