Tags: erickwill/llama.cpp-Stock
fix(shader): handle buffer aliasing for rms fuse (ggml-org#22266)
CUDA: refactor mma data loading for AMD (ggml-org#22051)
* CUDA: refactor mma data loading for AMD
* fix CDNA MMQ occupancy
* fix CDNA3 mma
* fix RDNA3 compile
vulkan: Support GGML_TYPE_NVFP4 (ggml-org#21455) This adds nvfp4 support for get_rows, dequant, and mul_mat(_id). For mul_mat, it does not add support for the dp4/q8_1 path; everything goes through fp16/fp32.
CUDA: skip compilation of superfluous FA kernels (ggml-org#21768)
server : fix restore for checkpoints with pos_min == 0 (ggml-org#21510)
models : kda chunk size = 16 (ggml-org#19827)
* models : add llm_build_delta_net_base
* cont : keep qwen35 and qwen35moe graphs intact
* cont : add comments [no ci]
* add kimi linear to delta-net-base
* removed unnecessary ggml_cont from g_exp_t
* removed ggml_cont from g_diff_exp_t, moved ggml_cont for o to kimi-linear.cpp
* removed unnecessary diag mask
* cont : simplify
* cont : avoid graph splits
* scale q after mul instead of beginning
* identical ppl
* cont : fix scale and decay mask
* minor : remove TODO
* block implementation for kda
* remove space at the end of line 101
* concat+pad
* pad+binary row concat
* chunk size 16 for kda
* removed minor differences to master
---------
Co-authored-by: Georgi Gerganov <[email protected]>