Junming Huang activity https://gitlab.com/junminghuang 2026-03-16T16:39:25Z tag:gitlab.com,2026-03-16:5209247036 Junming Huang commented on merge request !4888 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T16:39:25Z junminghuang Junming Huang

LGTM!

tag:gitlab.com,2026-03-16:5209246634 Junming Huang approved merge request !4888: feat: Update Sonnet 4.6 and Opus 4.6 max tokens at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / C... 2026-03-16T16:39:19Z junminghuang Junming Huang

What does this merge request do and why?

Now that the long-context window is GA for Sonnet 4.6 and Opus 4.6 (see https://claude.com/blog/1m-context-ga), the context-1m-2025-08-07 beta header is no longer needed for these models.

From Anthropic:

The 1M token context window is now generally available for Claude Opus 4.6 and Sonnet 4.6 via API. Both models include the full window at standard pricing—$5/$25 per million tokens for Opus 4.6 and $3/$15 for Sonnet 4.6. Previously there were separate rate limits above and below 200K tokens. We’ve simplified this to a single rate limit for the full context window. As part of this, we’ve raised your base rate limit on Opus 4.6 to 18M to accommodate your existing long context usage with room to grow. Note this applies only to the Gitlab Production (managed account) org. As always, please reach out to request additional increases.
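For a rough sense of what the quoted GA pricing means per request, here is a small sketch. The dollar figures come from the announcement above; the model keys and function name are illustrative only:

```python
# Per-request cost at the quoted GA pricing (USD per million tokens).
# Model keys here are shorthand, not official API model IDs.
PRICING = {
    "opus-4.6":   {"input": 5.0,  "output": 25.0},
    "sonnet-4.6": {"input": 3.0,  "output": 15.0},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, at standard (no long-context premium) rates."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# e.g. a full 1M-token prompt with a 4K-token response on Sonnet 4.6:
print(round(request_cost("sonnet-4.6", 1_000_000, 4_000), 2))  # 3.06
```

Since the full window is billed at standard rates, a maximal 1M-token prompt costs the same per token as a short one.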

We are also updating pricing multipliers in https://gitlab.com/gitlab-org/customers-gitlab-com/-/merge_requests/15070

Numbered steps to set up and validate the change are strongly suggested.

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
  • If this change requires executor implementation: verified that issues/MRs exist for both Go executor and Node executor or confirmed that changes are backward-compatible and don't break existing executor functionality.
tag:gitlab.com,2026-03-16:5209246607 Junming Huang commented on merge request !4888 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T16:39:18Z junminghuang Junming Huang

@wortschi fair point. Should be fine if we already have context info in the MR description.

tag:gitlab.com,2026-03-16:5209170718 Junming Huang commented on merge request !4897 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T16:19:50Z junminghuang Junming Huang

@romaneisner would you be able to help review it?

tag:gitlab.com,2026-03-16:5209159935 Junming Huang commented on merge request !4888 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T16:17:18Z junminghuang Junming Huang

Added two comments.

tag:gitlab.com,2026-03-16:5209159714 Junming Huang commented on merge request !4888 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T16:17:14Z junminghuang Junming Huang

suggestion (non-blocking): it would be great to link the Anthropic announcement to raise awareness that this feature is supported only for Opus 4.6 and Sonnet 4.6.

tag:gitlab.com,2026-03-16:5209159689 Junming Huang commented on merge request !4888 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T16:17:14Z junminghuang Junming Huang

suggestion (nit-picking): the 1_000_000 format is generally more readable; I don't need to count the zeros 😀.
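For context, Python (the AI Gateway's language) allows underscores as digit group separators in numeric literals (PEP 515), so the two spellings are identical at runtime:

```python
# PEP 515: underscores in numeric literals are purely visual;
# both names below are bound to the same integer value.
MAX_TOKENS_PLAIN = 1000000
MAX_TOKENS_GROUPED = 1_000_000

assert MAX_TOKENS_PLAIN == MAX_TOKENS_GROUPED
print(MAX_TOKENS_GROUPED)  # 1000000
```

The grouped form costs nothing and makes off-by-a-zero mistakes much easier to spot in review.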

tag:gitlab.com,2026-03-16:5209021422 Junming Huang commented on issue #2038 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T15:47:53Z junminghuang Junming Huang

Hi @bastirehm @wortschi @timzallmann , now that the Claude 1M context window is GA at standard pricing, we need to add support in ai-gateway to leverage it. cc @achueshev

tag:gitlab.com,2026-03-16:5208989512 Junming Huang opened issue #2038: Support 1M context window for Claude Opus 4.6 and Sonnet 4.6 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) ... 2026-03-16T15:41:17Z junminghuang Junming Huang

Problem to solve

As a GitLab Duo user, I want to leverage the full 1M context window now available for Claude Opus 4.6 and Sonnet 4.6, so I can work with larger codebases and documents in a single conversation without context truncation.

Anthropic announced general availability of the 1M context window (March 13, 2026) with:

  • Standard pricing across full 1M window (no long-context premium)
  • Full rate limits at every context length
  • No beta header required for requests over 200K tokens

Proposal

Update AI Gateway to support the full 1M context window for Claude Opus 4.6 and Sonnet 4.6 models.

Implementation Plan

  1. Update Model Configuration

    • Update models.yml to reflect 1M (1,000,000) token context window for Claude Opus 4.6 and Sonnet 4.6
    • Remove any 200K context limit restrictions
  2. Remove Beta Header Requirements

    • Identify and remove any anthropic-beta header logic for long-context requests
    • Requests over 200K tokens should work automatically
  3. Testing

    • Add/update tests for 1M context window support
    • Verify no regression in existing functionality
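The header-removal step (2) above can be sketched as follows. The function, model IDs, and surrounding structure are hypothetical stand-ins, not the actual AI Gateway code; only the `context-1m-2025-08-07` beta header value comes from this issue:

```python
# Hypothetical sketch of step 2: dropping the long-context beta opt-in
# now that the 1M window is GA. Names below are illustrative only.
LONG_CONTEXT_MODELS = {"claude-opus-4-6", "claude-sonnet-4-6"}  # assumed IDs

def build_anthropic_headers(model: str, api_key: str) -> dict:
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
    }
    # Before GA, requests over 200K tokens needed the beta opt-in:
    #   if model in LONG_CONTEXT_MODELS:
    #       headers["anthropic-beta"] = "context-1m-2025-08-07"
    # After GA, no beta header is required; the full 1M window is
    # available at standard pricing, so the branch is simply removed.
    return headers
```

Tests for step 3 would then assert that no `anthropic-beta` header is emitted for these models, alongside the `models.yml` change raising the context window to 1_000_000.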

Further details

Benefits:

  • Users can process entire codebases in a single session
  • Improved accuracy for large document analysis (Opus 4.6 scores 78.3% on MRCR v2)
  • No additional cost for long-context requests

Acceptance Criteria:

  • Claude Opus 4.6 and Sonnet 4.6 accept up to 1M tokens in context
  • No beta headers required for long-context requests
  • All existing tests pass
tag:gitlab.com,2026-03-16:5208334181 Junming Huang commented on issue #1947 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T13:26:50Z junminghuang Junming Huang

Hey @bluenoodles, have you made any progress or hit any blockers on this? It turns out this feature needs to be prioritized soon. If you are unable to deliver it, feel free to let us know; we can also suggest other issues for you to work on.

tag:gitlab.com,2026-03-16:5208313535 Junming Huang commented on merge request !4897 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T13:23:20Z junminghuang Junming Huang

@bcardoso- would you be able to do the maintainer review of this short MR adding the compaction docs?

tag:gitlab.com,2026-03-16:5208308897 Junming Huang commented on merge request !4897 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T13:22:27Z junminghuang Junming Huang

Addressed

tag:gitlab.com,2026-03-16:5208307633 Junming Huang pushed to project branch docs/compaction-documentation at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Ga... 2026-03-16T13:22:12Z junminghuang Junming Huang

Junming Huang (4bb1219b) at 16 Mar 13:22

fix: update doc yaml example

tag:gitlab.com,2026-03-16:5208299846 Junming Huang commented on merge request !4897 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T13:20:35Z junminghuang Junming Huang

@GitLabDuo that should be addressed by looking at the YAML example and the code itself.

tag:gitlab.com,2026-03-16:5208293049 Junming Huang commented on merge request !4897 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T13:19:21Z junminghuang Junming Huang

@GitLabDuo should be fine.

tag:gitlab.com,2026-03-16:5208278521 Junming Huang commented on issue #2019 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T13:16:31Z junminghuang Junming Huang

Hi @alejandro @fpiva, we are adding a conversation history compaction feature to DAP, which requires making an LLM call that is not triggered by the user. I have the following questions and hope you can help:

  1. Are we already able to filter LLM requests when doing usage billing (bill the customer / don't bill)? If yes, how can we use that feature? If not, do you know whether it is already on the usage billing roadmap and how complex it would be?
  2. Currently, the compaction LLM call uses the model from the workflow with a hard-coded system prompt template. I would like to migrate it to the prompt registry; does it support getting the raw LLM model object (without tool binding)? Are there any particular things I need to pay attention to?

cc @bastirehm @wortschi

tag:gitlab.com,2026-03-16:5208230447 Junming Huang commented on merge request !4897 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T13:06:45Z junminghuang Junming Huang

fixed

tag:gitlab.com,2026-03-16:5208219906 Junming Huang pushed to project branch docs/compaction-documentation at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Ga... 2026-03-16T13:04:32Z junminghuang Junming Huang

Junming Huang (790ef606) at 16 Mar 13:04

fix: align compaction docs with actual implementation

tag:gitlab.com,2026-03-16:5208155807 Junming Huang pushed to project branch docs/compaction-documentation at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Ga... 2026-03-16T12:50:55Z junminghuang Junming Huang

Junming Huang (9f07b662) at 16 Mar 12:50

fix: address review feedback for compaction docs

tag:gitlab.com,2026-03-16:5208121127 Junming Huang commented on merge request !4867 at GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway 2026-03-16T12:43:38Z junminghuang Junming Huang

@eduardobonet can you help do the initial review of this small MR?