Release notes from llama.cpp
Feed: https://github.com/jeromew/llama.cpp/releases (updated 2026-04-02T09:29:11Z)

- b8635 (2026-04-02T09:29:11Z, pwilkin): Relax prefill parser to allow space (#21240)
  - Relax prefill parser to allow space
  - Move changes from prefix() to parser generation
  - Only allow spaces when the next parser is not a pure content parser
- b8634 (2026-04-02T09:28:56Z, jesus-talavera-ibm): chat : add Granite 4.0 chat template with correct tool_call role mapp…
- b8631 (2026-04-02T07:39:00Z, ggerganov): sync : ggml
- b8629 (2026-04-02T07:08:32Z, arthw): sycl : fix llama_kv_cache hang when kv_cache is huge: 5GB (ggml-org#21283: https://github.com/ggml-org/llama.cpp/pull/21283)
- b8628 (2026-04-02T00:44:02Z, tboinovski1): hexagon : add cumsum op support (#21246)
  - hexagon : add cumsum op support
  - hexagon : enable DMA for cumsum op
  - Fix line endings
  - Co-authored-by: Max Krasnyansky <[email protected]>
- b8626 (2026-04-01T19:54:58Z, lhez): opencl: fix leak in Adreno q8_0 path (ggml-org#21212: https://github.com/ggml-org/llama.cpp/pull/21212)
- b8625 (2026-04-01T19:32:15Z, allozaur): server: Bypass API Key validation for WebUI static bundle assets (ggml-org#21…: https://github.com/ggml-org/llama.cpp/pull/21)
- b8624 (2026-04-01T19:28:19Z, JohannesGaessler): CUDA: fix FA kernel selection logic (ggml-org#21271: https://github.com/ggml-org/llama.cpp/pull/21271)
- b8611 (2026-04-01T08:10:25Z, ggerganov): ggml : fix RWKV ops thread assignment (ggml-org#21226: https://github.com/ggml-org/llama.cpp/pull/21226)
- b8610 (2026-04-01T08:10:03Z, taimur-10x): ggml-cpu: fix fallback for RVV kernels without zvfh (#21157)
  - ggml-cpu: refactor sgemm; fix RVV checks
  - ggml-cpu: refactor RVV kernels; set zvfbfwma default to off