
Tags: jeromew/llama.cpp

b8635

Verified: this commit was created on GitHub.com and signed with GitHub's verified signature.
Relax prefill parser to allow space. (ggml-org#21240)

* Relax prefill parser to allow space.

* Move changes from prefix() to parser generation

* Only allow spaces when the next parser is not a pure-content parser

b8634

chat : add Granite 4.0 chat template with correct tool_call role mapping (ggml-org#20804)

* chat : add Granite 4.0 chat template with correct tool_call role mapping

Introduce `LLM_CHAT_TEMPLATE_GRANITE_4_0` alongside the existing Granite
3.x template (renamed `LLM_CHAT_TEMPLATE_GRANITE_3_X`).

The Granite 4.0 Jinja template uses `<tool_call>` XML tags and maps the
`assistant_tool_call` role to `<|start_of_role|>assistant<|end_of_role|><|tool_call|>`.
Without a matching C++ handler, the fallback path emits the literal role
`assistant_tool_call`, which the model does not recognize, breaking tool
calling when `--jinja` is not used.

Changes:
- Rename `LLM_CHAT_TEMPLATE_GRANITE` to `LLM_CHAT_TEMPLATE_GRANITE_3_X`
  (preserves existing 3.x behavior unchanged)
- Add `LLM_CHAT_TEMPLATE_GRANITE_4_0` enum, map entry, and handler
- Detection: `<|start_of_role|>` + (`<tool_call>` or `<tools>`) → 4.0,
  otherwise → 3.x
- Add production Granite 4.0 Jinja template
- Add tests for both 3.x and 4.0 template paths (C++ and Jinja)

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* Code review: follow standard format and use common logic in test-chat-template.cpp

* Rename custom_conversation variable to extra_conversation to give it a more meaningful name

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>
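The detection rule described in the commit (`<|start_of_role|>` plus `<tool_call>` or `<tools>` selects the 4.0 handler) can be sketched as a standalone function. This is an illustrative sketch only; the enum and function names are hypothetical, not the actual llama.cpp identifiers:

```cpp
#include <cassert>
#include <string>

// Hypothetical names for illustration; llama.cpp uses its own
// LLM_CHAT_TEMPLATE_* enum and detection code.
enum class granite_tmpl { granite_3_x, granite_4_0 };

static granite_tmpl detect_granite(const std::string & tmpl) {
    // 4.0 templates carry <|start_of_role|> together with the
    // <tool_call> / <tools> XML tags; anything else falls back to 3.x.
    const bool has_start = tmpl.find("<|start_of_role|>") != std::string::npos;
    const bool has_tools = tmpl.find("<tool_call>") != std::string::npos ||
                           tmpl.find("<tools>")     != std::string::npos;
    return (has_start && has_tools) ? granite_tmpl::granite_4_0
                                    : granite_tmpl::granite_3_x;
}
```

A 3.x template without the tool tags keeps its existing behavior, so the rename is purely mechanical.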

b8631

sync : ggml

b8629

sycl : fix llama_kv_cache hang when kv_cache is huge: 5GB (ggml-org#21283)

b8628

hexagon : add cumsum op support (ggml-org#21246)

* hexagon : add cumsum op support

* hexagon: enable dma for cumsum op

* Fix line-ending

---------

Co-authored-by: Max Krasnyansky <[email protected]>
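For reference, the semantics of the cumsum op added here are a running prefix sum along a dimension. A minimal scalar sketch (not the Hexagon/DMA kernel, just the reference behavior):

```cpp
#include <cstddef>
#include <vector>

// Reference cumulative sum: out[i] = in[0] + ... + in[i].
// Backend kernels (e.g. the Hexagon one above) must match this result.
static std::vector<float> cumsum(const std::vector<float> & in) {
    std::vector<float> out(in.size());
    float acc = 0.0f;
    for (size_t i = 0; i < in.size(); ++i) {
        acc += in[i];
        out[i] = acc;
    }
    return out;
}
```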

b8626

opencl: fix leak in Adreno q8_0 path (ggml-org#21212)

b8625

server: Bypass API Key validation for WebUI static bundle assets (ggml-org#21269)

* fix: Bypass API Key validation for static bundle assets

* refactor: All bypassed routes in `public_endpoints`

* test: Update static assets API Key test

b8624

CUDA: fix FA kernel selection logic (ggml-org#21271)

b8611

ggml : fix RWKV ops thread assignment (ggml-org#21226)

b8610

ggml-cpu: fix fallback for RVV kernels without zvfh (ggml-org#21157)

* ggml-cpu: refactor sgemm; fix rvv checks

* ggml-cpu: refactor rvv kernels; set zvfbfwma default to off