tag:github.com,2008:https://github.com/jeromew/llama.cpp/releases
Release notes from llama.cpp
2026-04-02T09:29:11Z
tag:github.com,2008:Repository/1199732395/b8635
2026-04-02T09:29:11Z
b8635: Relax prefill parser to allow space. (#21240)
<ul>
<li>
<p>Relax prefill parser to allow space.</p>
</li>
<li>
<p>Move changes from prefix() to parser generation</p>
</li>
<li>
<p>Only allow spaces when the next parser is not a pure content parser</p>
</li>
</ul>
pwilkin
tag:github.com,2008:Repository/1199732395/b8634
2026-04-02T09:28:56Z
b8634
<p>chat : add Granite 4.0 chat template with correct tool_call role mapp…</p>
jesus-talavera-ibm
tag:github.com,2008:Repository/1199732395/b8631
2026-04-02T07:39:00Z
b8631
<p>sync : ggml</p>
ggerganov
tag:github.com,2008:Repository/1199732395/b8629
2026-04-02T07:08:32Z
b8629
<p>sycl : fix llama_kv_cache hang when kv_cache is huge: 5GB (<a href="https://github.com/ggml-org/llama.cpp/pull/21283">ggml-org#21283</a>)</p>
arthw
tag:github.com,2008:Repository/1199732395/b8628
2026-04-02T00:44:02Z
b8628: hexagon : add cumsum op support (#21246)
<ul>
<li>
<p>hexagon : add cumsum op support</p>
</li>
<li>
<p>hexagon: enable dma for cumsum op</p>
</li>
<li>
<p>Fix line-ending</p>
</li>
</ul>
<hr>
<p>Co-authored-by: Max Krasnyansky <a href="mailto:[email protected]">[email protected]</a></p>
tboinovski1
tag:github.com,2008:Repository/1199732395/b8626
2026-04-01T19:54:58Z
b8626
<p>opencl: fix leak in Adreno q8_0 path (<a href="https://github.com/ggml-org/llama.cpp/pull/21212">ggml-org#21212</a>)</p>
lhez
tag:github.com,2008:Repository/1199732395/b8625
2026-04-01T19:32:15Z
b8625
<p>server: Bypass API Key validation for WebUI static bundle assets (<a href="https://github.com/ggml-org/llama.cpp/pull/21">ggml-org#21</a>…</p>
allozaur
tag:github.com,2008:Repository/1199732395/b8624
2026-04-01T19:28:19Z
b8624
<p>CUDA: fix FA kernel selection logic (<a href="https://github.com/ggml-org/llama.cpp/pull/21271">ggml-org#21271</a>)</p>
JohannesGaessler
tag:github.com,2008:Repository/1199732395/b8611
2026-04-01T08:10:25Z
b8611
<p>ggml : fix RWKV ops thread assignment (<a href="https://github.com/ggml-org/llama.cpp/pull/21226">ggml-org#21226</a>)</p>
ggerganov
tag:github.com,2008:Repository/1199732395/b8610
2026-04-01T08:10:03Z
b8610: ggml-cpu: fix fallback for RVV kernels without zvfh (#21157)
<ul>
<li>
<p>ggml-cpu: refactor sgemm; fix rvv checks</p>
</li>
<li>
<p>ggml-cpu: refactor rvv kernels; set zvfbfwma default to off</p>
</li>
</ul>
taimur-10x