
Tags: jeromew/llama.cpp

b8635

Verified: this commit was created on GitHub.com and signed with GitHub's verified signature.
Relax prefill parser to allow space. (ggml-org#21240)

* Relax prefill parser to allow space.

* Move changes from prefix() to parser generation

* Only allow spaces when the next parser is not a pure-content parser

b8634

chat : add Granite 4.0 chat template with correct tool_call role mapping (ggml-org#20804)

* chat : add Granite 4.0 chat template with correct tool_call role mapping

Introduce `LLM_CHAT_TEMPLATE_GRANITE_4_0` alongside the existing Granite
3.x template (renamed `LLM_CHAT_TEMPLATE_GRANITE_3_X`).

The Granite 4.0 Jinja template uses `<tool_call>` XML tags and maps the
`assistant_tool_call` role to `<|start_of_role|>assistant<|end_of_role|><|tool_call|>`.
Without a matching C++ handler, the fallback path emits the literal role
`assistant_tool_call`, which the model does not recognize, breaking tool
calling when `--jinja` is not used.

Changes:
- Rename `LLM_CHAT_TEMPLATE_GRANITE` to `LLM_CHAT_TEMPLATE_GRANITE_3_X`
  (preserves existing 3.x behavior unchanged)
- Add `LLM_CHAT_TEMPLATE_GRANITE_4_0` enum, map entry, and handler
- Detection: `<|start_of_role|>` + (`<tool_call>` or `<tools>`) → 4.0,
  otherwise → 3.x
- Add production Granite 4.0 Jinja template
- Add tests for both 3.x and 4.0 template paths (C++ and Jinja)

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* Code review: follow standard format and use common logic in test-chat-template.cpp

* Rename custom_conversation variable to extra_conversation to give it a more meaningful name

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>
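The detection rule described in the commit (`<|start_of_role|>` plus `<tool_call>` or `<tools>` selects the 4.0 handler) can be sketched as a standalone function. This is an illustrative sketch only; the enum and function names are hypothetical, not the actual llama.cpp identifiers:

```cpp
#include <cassert>
#include <string>

// Hypothetical names for illustration; llama.cpp uses its own
// LLM_CHAT_TEMPLATE_* enum and detection code.
enum class granite_tmpl { granite_3_x, granite_4_0 };

static granite_tmpl detect_granite(const std::string & tmpl) {
    // 4.0 templates carry <|start_of_role|> together with the
    // <tool_call> / <tools> XML tags; anything else falls back to 3.x.
    const bool has_start = tmpl.find("<|start_of_role|>") != std::string::npos;
    const bool has_tools = tmpl.find("<tool_call>") != std::string::npos ||
                           tmpl.find("<tools>")     != std::string::npos;
    return (has_start && has_tools) ? granite_tmpl::granite_4_0
                                    : granite_tmpl::granite_3_x;
}
```

A 3.x template without the tool tags keeps its existing behavior, so the rename is purely mechanical.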

b8631

sync : ggml

b8629

sycl : fix llama_kv_cache hang when kv_cache is huge: 5GB (ggml-org#21283)

b8628

hexagon : add cumsum op support (ggml-org#21246)

* hexagon : add cumsum op support

* hexagon: enable dma for cumsum op

* Fix line-ending

---------

Co-authored-by: Max Krasnyansky <[email protected]>
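For reference, the semantics of the cumsum op added here are a running prefix sum along a dimension. A minimal scalar sketch (not the Hexagon/DMA kernel, just the reference behavior):

```cpp
#include <cstddef>
#include <vector>

// Reference cumulative sum: out[i] = in[0] + ... + in[i].
// Backend kernels (e.g. the Hexagon one above) must match this result.
static std::vector<float> cumsum(const std::vector<float> & in) {
    std::vector<float> out(in.size());
    float acc = 0.0f;
    for (size_t i = 0; i < in.size(); ++i) {
        acc += in[i];
        out[i] = acc;
    }
    return out;
}
```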

b8626

opencl: fix leak in Adreno q8_0 path (ggml-org#21212)

b8625

server: Bypass API Key validation for WebUI static bundle assets (ggml-org#21269)

* fix: Bypass API Key validation for static bundle assets

* refactor: All bypassed routes in `public_endpoints`

* test: Update static assets API Key test

b8624

CUDA: fix FA kernel selection logic (ggml-org#21271)

b8611

ggml : fix RWKV ops thread assignment (ggml-org#21226)

b8610

ggml-cpu: fix fallback for RVV kernels without zvfh (ggml-org#21157)

* ggml-cpu: refactor sgemm; fix rvv checks

* ggml-cpu: refactor rvv kernels; set zvfbfwma default to off