@mlapierre Thanks for this issue!
- Update documentation to reference the Cloud Run deployment scripts as the recommended approach for load testing
Can Cloud Run deployment scripts be used for load testing already? If we remove the Runway deployment now, we would only use the ability to run load tests in the CI, correct?
FYI @sarah_zywicki
Updates Sonnet 4.6 and Opus 4.6 details. Both models will be charged at the standard rate:
From Anthropic:
The 1M token context window is now generally available for Claude Opus 4.6 and Sonnet 4.6 via API. Both models include the full window at standard pricing—$5/$25 per million tokens for Opus 4.6 and $3/$15 for Sonnet 4.6. Previously there were separate rate limits above and below 200K tokens. We’ve simplified this to a single rate limit for the full context window. As part of this, we’ve raised your base rate limit on Opus 4.6 to 18M to accommodate your existing long context usage with room to grow. Note this applies only to the Gitlab Production (managed account) org. As always, please reach out to request additional increases.
Martin Wortschack (0b190f49) at 17 Mar 06:36
Update Sonnet 4.6 and Opus 4.6 model docs
It was already added to the merge train but the pipeline failed
@igor.drozdov Can you please do the maintainer review?
@junminghuang I've addressed your comments / replied. Mind taking another look?
I agree, but I'm not sure the model configuration is the best place for this information. The Anthropic announcement is already linked in the MR description.
Yeah, makes sense. I've updated the other occurrences in this file as well for consistency.
Martin Wortschack (d1295b99) at 16 Mar 16:24
feat: Update Sonnet 4.6 and Opus 4.6 max tokens
- Are we already able to filter LLM requests when doing usage billing (bill customer / do not bill)? If yes, how can we use such a feature? If not, do you know whether it is already on the usage billing roadmap and how complex it would be?
I don't think there is any filter logic on AIGW/DWS - we emit usage billing events for all features. There is a white-list approach on CustomersDot (CDot) to decide what features are billable. Usage billing events follow the format in https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/main/docs/billing_events.md#trigger-events
IMHO, compaction LLM requests should be identified and tagged in AIGW/DWS. Filtering them out of billing events should happen upstream in CDot, which is also consistent with how other events are excluded from usage billing.
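To make the proposed split concrete, here is a rough sketch of the idea: the emitter tags every event with a feature name, and the consumer keeps only events whose feature is on an allow-list. Everything in it is hypothetical (the `BillingEvent` fields, the feature names, and `BILLABLE_FEATURES`); it does not reflect the actual billing event schema in the linked docs or the CDot implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BillingEvent:
    # Hypothetical fields for illustration only; the real schema lives in billing_events.md.
    feature: str
    tokens: int


# Hypothetical allow-list, standing in for the CDot-side list of billable features.
BILLABLE_FEATURES = frozenset({"duo_chat", "code_suggestions"})


def billable(events: Iterable[BillingEvent]) -> List[BillingEvent]:
    """Keep only events whose feature tag is on the allow-list."""
    return [e for e in events if e.feature in BILLABLE_FEATURES]


# The emitter side (AIGW/DWS in the discussion) sends everything, tagged.
emitted = [
    BillingEvent(feature="duo_chat", tokens=100),
    BillingEvent(feature="compaction", tokens=50),  # tagged, but not on the allow-list
]
kept = billable(emitted)
```

With this shape, excluding compaction requests from billing only requires that they carry a distinct tag; no filter logic is needed on the emitting side.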
@junminghuang Mind reviewing this change?
Martin Wortschack (9f46182f) at 16 Mar 16:08
feat: Update Sonnet 4.6 and Opus 4.6 max tokens
Now that the long-context window is GA for Sonnet 4.6 and Opus 4.6 (see https://claude.com/blog/1m-context-ga), the context-1m-2025-08-07 beta headers are no longer needed for these models.
From Anthropic:
The 1M token context window is now generally available for Claude Opus 4.6 and Sonnet 4.6 via API. Both models include the full window at standard pricing—$5/$25 per million tokens for Opus 4.6 and $3/$15 for Sonnet 4.6. Previously there were separate rate limits above and below 200K tokens. We’ve simplified this to a single rate limit for the full context window. As part of this, we’ve raised your base rate limit on Opus 4.6 to 18M to accommodate your existing long context usage with room to grow. Note this applies only to the Gitlab Production (managed account) org. As always, please reach out to request additional increases.
We are also updating pricing multipliers in https://gitlab.com/gitlab-org/customers-gitlab-com/-/merge_requests/15070
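As an illustration of the header change described above, here is a minimal sketch of conditionally attaching the beta header. The helper name, the model identifier prefixes, and the `wants_long_context` flag are assumptions made for the example and are not taken from this MR; only the `context-1m-2025-08-07` value comes from the description, and `anthropic-beta` is the header Anthropic uses for beta features.

```python
from typing import Dict

LONG_CONTEXT_BETA = "context-1m-2025-08-07"

# Assumption: identifier prefixes for the models where the 1M window is GA.
GA_LONG_CONTEXT_PREFIXES = ("claude-sonnet-4-6", "claude-opus-4-6")


def long_context_headers(model: str, wants_long_context: bool) -> Dict[str, str]:
    """Return extra request headers needed for long-context requests (hypothetical helper)."""
    if not wants_long_context:
        return {}
    if model.startswith(GA_LONG_CONTEXT_PREFIXES):
        # GA models get the full window without a beta header.
        return {}
    return {"anthropic-beta": LONG_CONTEXT_BETA}
```

For example, `long_context_headers("claude-opus-4-6", True)` returns an empty dict, while a model outside the GA prefixes would still receive the beta header.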
Martin Wortschack (2697b790) at 16 Mar 07:03
feat: Update Sonnet 4.6 and Opus 4.6 max tokens
Martin Wortschack (8b78b9e4) at 16 Mar 06:58
Martin Wortschack (cc0a03c6) at 16 Mar 06:58
Merge branch 'mw-update-refinement-issue-links' into 'main'
... and 1 more commit
Updates links on https://handbook.gitlab.com/handbook/engineering/ai/ai-framework/#-backlog-refinement