Support Gpt-5.2 in Tokenizer library #7571

Merged
tarekgh merged 1 commit into dotnet:main from tarekgh:SupportGpt-5.2Toknizer
Jan 22, 2026

tarekgh (Member) commented Jan 22, 2026

No description provided.

Copilot AI review requested due to automatic review settings January 22, 2026 19:41
tarekgh self-assigned this Jan 22, 2026
tarekgh requested a review from stephentoub January 22, 2026 19:42
tarekgh enabled auto-merge (squash) January 22, 2026 19:44
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds GPT-5.2 support to the Tokenizer library, following the same pattern as the GPT-5 and GPT-5.1 models, which use the O200kBase encoding.

Changes:

  • Added GPT-5.2 model support with O200kBase encoding
  • Updated test cases to include GPT-5.2 and GPT-5.2-mini variants
  • Added model mappings for both exact match and prefix matching
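
The two-stage lookup described above (exact match first, then prefix match) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the library's actual code: the `EncodingName` enum, the `ModelEncodingLookup` class, its field names, and the ordinal (case-sensitive) comparison are all assumptions made for the example. Only the "gpt-5.2" and "gpt-5.2-" entries and their O200kBase target come from the review summary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; not the library's internals.
public enum EncodingName { Cl100kBase, O200kBase }

public static class ModelEncodingLookup
{
    // Exact-match table: full model name -> encoding.
    private static readonly Dictionary<string, EncodingName> s_exact = new()
    {
        ["gpt-4"] = EncodingName.Cl100kBase,
        ["gpt-5"] = EncodingName.O200kBase,
        ["gpt-5.1"] = EncodingName.O200kBase,
        ["gpt-5.2"] = EncodingName.O200kBase,   // entry of the kind this PR adds
    };

    // Prefix table: covers suffixed variants such as "gpt-5.2-mini".
    private static readonly (string Prefix, EncodingName Encoding)[] s_prefixes =
    {
        ("gpt-4-", EncodingName.Cl100kBase),
        ("gpt-5-", EncodingName.O200kBase),
        ("gpt-5.1-", EncodingName.O200kBase),
        ("gpt-5.2-", EncodingName.O200kBase),   // entry of the kind this PR adds
    };

    public static bool TryGetEncoding(string modelName, out EncodingName encoding)
    {
        // Stage 1: exact model name.
        if (s_exact.TryGetValue(modelName, out encoding))
        {
            return true;
        }

        // Stage 2: first prefix that the model name starts with.
        foreach (var (prefix, candidate) in s_prefixes)
        {
            if (modelName.StartsWith(prefix, StringComparison.Ordinal))
            {
                encoding = candidate;
                return true;
            }
        }

        encoding = default;
        return false;
    }
}
```

With tables like these, `TryGetEncoding("gpt-5.2", ...)` resolves through the dictionary, while `TryGetEncoding("gpt-5.2-mini", ...)` falls through to the "gpt-5.2-" prefix. Both stages are needed because a bare "gpt-5.2" does not start with "gpt-5.2-", and a suffixed variant is not a dictionary key.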

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File | Description
src/Microsoft.ML.Tokenizers/Model/TiktokenTokenizer.cs | Added "gpt-5.2" and "gpt-5.2-" mappings to O200kBase encoding in both the exact-match dictionary and the prefix array
test/Microsoft.ML.Tokenizers.Tests/TiktokenTests.cs | Added GPT5_2 tokenizer property, included it in the O200kBase encoding test, and added test cases for the "gpt-5.2" and "gpt-5.2-mini" model names

codecov bot commented Jan 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.02%. Comparing base (adf0cec) to head (6ba8b78).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7571   +/-   ##
=======================================
  Coverage   69.02%   69.02%           
=======================================
  Files        1482     1482           
  Lines      274096   274099    +3     
  Branches    28266    28266           
=======================================
+ Hits       189191   189199    +8     
  Misses      77518    77518           
+ Partials     7387     7382    -5     
Flag | Coverage Δ
Debug | 69.02% <100.00%> (+<0.01%) ⬆️
production | 63.31% <100.00%> (+<0.01%) ⬆️
test | 89.47% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
...Microsoft.ML.Tokenizers/Model/TiktokenTokenizer.cs | 79.95% <100.00%> (+0.04%) ⬆️
...est/Microsoft.ML.Tokenizers.Tests/TiktokenTests.cs | 99.09% <100.00%> (+<0.01%) ⬆️

... and 7 files with indirect coverage changes

tarekgh (Member, Author) commented Jan 22, 2026

/ba-g unrelated failures

tarekgh merged commit 25b977e into dotnet:main on Jan 22, 2026
27 of 31 checks passed
github-actions bot locked and limited conversation to collaborators on Feb 22, 2026