Tags: ochafik/llama.cpp

b8157

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
support permuted, remove check s0/s10 (ggml-org#19889)

Co-authored-by: Neo Zhang Jianyu <[email protected]>

b8022

hexagon: fix typo in vtcm_needs_release (ggml-org#19545)

b7587

docker : add CUDA 13.1 image build (ggml-org#18441)

* add updated cuda-new.Dockerfile for Ubuntu 24.04 compatibility

* add cuda13 build

b7540

ggml-cuda: fix regex for arch list (ggml-org#18371)

* ggml-cuda: fix regex for arch list

* make regex exact

b7482

llama : Changing off_t to size_t for Windows (ggml-org#18204)

b7404

preset: handle negated arg, reverse the meaning if needed (ggml-org#18041)

b7274

server: strip content-length header on proxy (ggml-org#17734)

b6925

server : support unified cache across slots (ggml-org#16736)

* server : support unified context across slots

* cont : fix speculative decoding initialization

* context : fix n_ctx_per_seq computation

* server : purge slots one by one

* tests : add unified cache server tests

* llama : update per-seq context computation

* test-thread-safety : handle tiny training context of the input model

* server : fix server_tokens clear()

* server : use 4 slots + unified KV by default

* llama : add note about context size queries

* cont : update todos [no ci]

* context : do not cap the size of the context

* tests : adjust parameters to be CI friendlier

* context : add warning

b6710

ggml webgpu: profiling, CI updates, reworking of command submission (ggml-org#16452)

* Add profiling

* More detailed profiling

* Rework command submission to avoid global locks

* Update wait handling

* try new method of waiting on futures

* Add serializing of command submission in some cases

* Add new pool for timestamp queries and clean up logging

* Serialize command submission in CI and leave a TODO note

* Update webgpu CI

* Add myself as WebGPU codeowner

* Deadlock avoidance

* Leave WebGPU/Vulkan CI serialized

* Fix divide by 0

* Fix logic in division by inflight_threads

* Update CODEOWNERS and remove serialize submit option

b6250

test-opt: allow slight imprecision (ggml-org#15503)