Skip to content

Relax prefill parser to allow space.#21240

Merged
pwilkin merged 3 commits intoggml-org:masterfrom
pwilkin:relax-prefill-parser
Apr 2, 2026
Merged

Relax prefill parser to allow space.#21240
pwilkin merged 3 commits intoggml-org:masterfrom
pwilkin:relax-prefill-parser

Conversation

@pwilkin
Copy link
Copy Markdown
Member

@pwilkin pwilkin commented Mar 31, 2026

Overview

As in title.

Additional information

Prefill parser was strictly requiring the reasoning marker at the very start of the message, which interfered with models that liked to insert eg. a newline there.

Requirements

@pwilkin pwilkin requested a review from a team as a code owner March 31, 2026 22:04
@pwilkin
Copy link
Copy Markdown
Member Author

pwilkin commented Mar 31, 2026

Fixes #20356

@aldehir
Copy link
Copy Markdown
Contributor

aldehir commented Mar 31, 2026

We should move the space to every parser creation that needs this. Not every model requires it.

@aviallon
Copy link
Copy Markdown
Contributor

aviallon commented Apr 1, 2026

@aldehir wouldn't that clutter the code without measurable performance benefits?

@aldehir
Copy link
Copy Markdown
Contributor

aldehir commented Apr 1, 2026

That's why I added operator<<(), but it is sparingly used throughout much of the new parsers. The prefix() parser should be just that, a prefix of a literal up to a given substring with no surprises.

@alvaroslm
Copy link
Copy Markdown

alvaroslm commented Apr 1, 2026

I tried GLM 4.7 flash and Qwen 3.5 27b with reasoning and tools. Seemed to work fine, no loose think or /think tags spotted, no chat template errors. I noticed new reasoning budget messages that weren't there a few weeks ago (but this is unrelated to this pr):

reasoning-budget: activated, budget=2147483647 tokens
slot init_sampler: id 0 | task 4261 | init sampler, took 10.38 ms, tokens: text = 83646, total = 83646
slot update_slots: id 0 | task 4261 | prompt processing done, n_tokens = 83646, batch.n_tokens = 4
slot update_slots: id 0 | task 4261 | created context checkpoint 20 of 32 (pos_min = 83641, pos_max = 83641, n_tokens = 83642, size = 149.626 MiB)
reasoning-budget: deactivated (natural end)

@pwilkin
Copy link
Copy Markdown
Member Author

pwilkin commented Apr 1, 2026

Yeah, I keep on forgetting about <<... fixed it up the proper way.

@pwilkin pwilkin merged commit e15efe0 into ggml-org:master Apr 2, 2026
46 checks passed
slartibardfast pushed a commit to slartibardfast/llama.cpp that referenced this pull request Apr 12, 2026
* Relax prefill parser to allow space.

* Move changes from prefix() to parser generation

* Only allow spaces if we're not having a pure content parser next
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* Relax prefill parser to allow space.

* Move changes from prefix() to parser generation

* Only allow spaces if we're not having a pure content parser next
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants