Junming Huang activity https://gitlab.com/junminghuang 2026-03-16T16:39:25Z tag:gitlab.com,2026-03-16:5209247036 Junming Huang commented on merge request !4888 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T16:39:25Z junminghuang Junming Huang

LGTM!

tag:gitlab.com,2026-03-16:5209246634 Junming Huang approved merge request !4888: feat: Update Sonnet 4.6 and Opus 4.6 max tokens at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / C... 2026-03-16T16:39:19Z junminghuang Junming Huang

What does this merge request do and why?

Now that the long-context window is GA for Sonnet 4.6 and Opus 4.6 (see https://claude.com/blog/1m-context-ga), the context-1m-2025-08-07 beta header is no longer needed for these models.

From Anthropic:

The 1M token context window is now generally available for Claude Opus 4.6 and Sonnet 4.6 via API. Both models include the full window at standard pricing—$5/$25 per million tokens for Opus 4.6 and $3/$15 for Sonnet 4.6. Previously there were separate rate limits above and below 200K tokens. We’ve simplified this to a single rate limit for the full context window. As part of this, we’ve raised your base rate limit on Opus 4.6 to 18M to accommodate your existing long context usage with room to grow. Note this applies only to the Gitlab Production (managed account) org. As always, please reach out to request additional increases.
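For a rough sense of what the quoted GA pricing means per request, here is a small sketch. The dollar figures come from the announcement above; the model keys and function name are illustrative only:

```python
# Per-request cost at the quoted GA pricing (USD per million tokens).
# Model keys here are shorthand, not official API model IDs.
PRICING = {
    "opus-4.6":   {"input": 5.0,  "output": 25.0},
    "sonnet-4.6": {"input": 3.0,  "output": 15.0},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, at standard (no long-context premium) rates."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# e.g. a full 1M-token prompt with a 4K-token response on Sonnet 4.6:
print(round(request_cost("sonnet-4.6", 1_000_000, 4_000), 2))  # 3.06
```

Since the full window is billed at standard rates, a maximal 1M-token prompt costs the same per token as a short one.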

We are also updating pricing multipliers in https://gitlab.com/gitlab-org/customers-gitlab-com/-/merge_requests/15070

Numbered steps to set up and validate the change are strongly suggested.

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
  • If this change requires executor implementation: verified that issues/MRs exist for both Go executor and Node executor or confirmed that changes are backward-compatible and don't break existing executor functionality.
tag:gitlab.com,2026-03-16:5209246607 Junming Huang commented on merge request !4888 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T16:39:18Z junminghuang Junming Huang

@wortschi fair point. Should be fine if we already have context info in the MR description.

tag:gitlab.com,2026-03-16:5209170718 Junming Huang commented on merge request !4897 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T16:19:50Z junminghuang Junming Huang

@romaneisner would you be able to help review it?

tag:gitlab.com,2026-03-16:5209159935 Junming Huang commented on merge request !4888 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T16:17:18Z junminghuang Junming Huang

Added two comments.

tag:gitlab.com,2026-03-16:5209159714 Junming Huang commented on merge request !4888 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T16:17:14Z junminghuang Junming Huang

suggestion (non-blocking): it would be great to link the Anthropic announcement to raise awareness that this feature is supported only for Opus 4.6 and Sonnet 4.6.

tag:gitlab.com,2026-03-16:5209159689 Junming Huang commented on merge request !4888 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T16:17:14Z junminghuang Junming Huang

suggestion (nit-picking): the 1_000_000 format is generally more readable; I don't need to count the zeros 😀.
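For context, Python (the AI Gateway's language) allows underscores as digit group separators in numeric literals (PEP 515), so the two spellings are identical at runtime:

```python
# PEP 515: underscores in numeric literals are purely visual;
# both names below are bound to the same integer value.
MAX_TOKENS_PLAIN = 1000000
MAX_TOKENS_GROUPED = 1_000_000

assert MAX_TOKENS_PLAIN == MAX_TOKENS_GROUPED
print(MAX_TOKENS_GROUPED)  # 1000000
```

The grouped form costs nothing and makes off-by-a-zero mistakes much easier to spot in review.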

tag:gitlab.com,2026-03-16:5209021422 Junming Huang commented on issue #2038 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T15:47:53Z junminghuang Junming Huang

Hi @bastirehm @wortschi @timzallmann , now that the Claude 1M context window is GA at standard pricing, we need to add support in ai-gateway to leverage it. cc @achueshev

tag:gitlab.com,2026-03-16:5208989512 Junming Huang opened issue #2038: Support 1M context window for Claude Opus 4.6 and Sonnet 4.6 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) ... 2026-03-16T15:41:17Z junminghuang Junming Huang

Problem to solve

As a GitLab Duo user, I want to leverage the full 1M context window now available for Claude Opus 4.6 and Sonnet 4.6, so I can work with larger codebases and documents in a single conversation without context truncation.

Anthropic announced general availability of the 1M context window (March 13, 2026) with:

  • Standard pricing across full 1M window (no long-context premium)
  • Full rate limits at every context length
  • No beta header required for requests over 200K tokens

Proposal

Update AI Gateway to support the full 1M context window for Claude Opus 4.6 and Sonnet 4.6 models.

Implementation Plan

  1. Update Model Configuration

    • Update models.yml to reflect 1M (1,000,000) token context window for Claude Opus 4.6 and Sonnet 4.6
    • Remove any 200K context limit restrictions
  2. Remove Beta Header Requirements

    • Identify and remove any anthropic-beta header logic for long-context requests
    • Requests over 200K tokens should work automatically
  3. Testing

    • Add/update tests for 1M context window support
    • Verify no regression in existing functionality
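The header-removal step (2) above can be sketched as follows. The function, model IDs, and surrounding structure are hypothetical stand-ins, not the actual AI Gateway code; only the `context-1m-2025-08-07` beta header value comes from this issue:

```python
# Hypothetical sketch of step 2: dropping the long-context beta opt-in
# now that the 1M window is GA. Names below are illustrative only.
LONG_CONTEXT_MODELS = {"claude-opus-4-6", "claude-sonnet-4-6"}  # assumed IDs

def build_anthropic_headers(model: str, api_key: str) -> dict:
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
    }
    # Before GA, requests over 200K tokens needed the beta opt-in:
    #   if model in LONG_CONTEXT_MODELS:
    #       headers["anthropic-beta"] = "context-1m-2025-08-07"
    # After GA, no beta header is required; the full 1M window is
    # available at standard pricing, so the branch is simply removed.
    return headers
```

Tests for step 3 would then assert that no `anthropic-beta` header is emitted for these models, alongside the `models.yml` change raising the context window to 1_000_000.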

Further details

Benefits:

  • Users can process entire codebases in a single session
  • Improved accuracy for large document analysis (Opus 4.6 scores 78.3% on MRCR v2)
  • No additional cost for long-context requests

Acceptance Criteria:

  • Claude Opus 4.6 and Sonnet 4.6 accept up to 1M tokens in context
  • No beta headers required for long-context requests
  • All existing tests pass
tag:gitlab.com,2026-03-16:5208334181 Junming Huang commented on issue #1947 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T13:26:50Z junminghuang Junming Huang

Hey @bluenoodles, have you made any progress or hit any blockers on this? It turns out this feature needs to be prioritized soon. If you are unable to deliver it, feel free to let us know; we can also suggest other issues for you to work on.

tag:gitlab.com,2026-03-16:5208313535 Junming Huang commented on merge request !4897 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T13:23:20Z junminghuang Junming Huang

@bcardoso- would you be able to do the maintainer review of this short MR adding the compaction docs?

tag:gitlab.com,2026-03-16:5208308897 Junming Huang commented on merge request !4897 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T13:22:27Z junminghuang Junming Huang

Addressed

tag:gitlab.com,2026-03-16:5208307633 Junming Huang pushed to project branch docs/compaction-documentation at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Ga... 2026-03-16T13:22:12Z junminghuang Junming Huang

Junming Huang (4bb1219b) at 16 Mar 13:22

fix: update doc yaml example

tag:gitlab.com,2026-03-16:5208299846 Junming Huang commented on merge request !4897 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T13:20:35Z junminghuang Junming Huang

@GitLabDuo that should be addressed by looking at the YAML example and the code itself.

tag:gitlab.com,2026-03-16:5208293049 Junming Huang commented on merge request !4897 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T13:19:21Z junminghuang Junming Huang

@GitLabDuo should be fine.

tag:gitlab.com,2026-03-16:5208278521 Junming Huang commented on issue #2019 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T13:16:31Z junminghuang Junming Huang

Hi @alejandro @fpiva, we are adding a conversation history compaction feature to DAP, which requires making an LLM call that is not triggered by the user. I have the following questions and hope you can help:

  1. Are we already able to filter LLM requests when doing usage billing (bill the customer / don't bill)? If yes, how can we use that feature? If not, do you know whether it is already on the usage billing roadmap and how complex it would be?
  2. Currently, the compaction LLM call uses the model from the workflow with a hard-coded system prompt template. I would like to migrate it to the prompt registry; does it support getting the raw LLM model object (without tool binding)? Are there any particular things I need to pay attention to?

cc @bastirehm @wortschi

tag:gitlab.com,2026-03-16:5208230447 Junming Huang commented on merge request !4897 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T13:06:45Z junminghuang Junming Huang

fixed

tag:gitlab.com,2026-03-16:5208219906 Junming Huang pushed to project branch docs/compaction-documentation at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Ga... 2026-03-16T13:04:32Z junminghuang Junming Huang

Junming Huang (790ef606) at 16 Mar 13:04

fix: align compaction docs with actual implementation

tag:gitlab.com,2026-03-16:5208155807 Junming Huang pushed to project branch docs/compaction-documentation at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Ga... 2026-03-16T12:50:55Z junminghuang Junming Huang

Junming Huang (9f07b662) at 16 Mar 12:50

fix: address review feedback for compaction docs

tag:gitlab.com,2026-03-16:5208121127 Junming Huang commented on merge request !4867 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T12:43:38Z junminghuang Junming Huang

@eduardobonet can you help do the initial review of this small MR?