Tags · struct/llama.cpp

b8453

vulkan: change gated_delta_net to shard a column across a subgroup (ggml-org#20662)

* vulkan: change gated_delta_net to shard a column across a subgroup

This is based on ggml-org#20391. I used an LLM to port the CUDA code to
Vulkan and guided it to make various fixes to work with Vulkan (e.g. handling
different subgroup sizes, the unknown mapping of subgroup to invocation id,
using subgroupAdd optionally, etc.).

This fixes a performance regression introduced by transposing the values in
memory (!20443).

* vulkan: Spread columns across fewer lanes to reduce the number of workgroups
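
The idea of sharding one column across the lanes of a subgroup can be modeled on the CPU. The sketch below is not the shader from this commit: the function name `sharded_column_dot`, the strided row assignment, and the plain loop standing in for the cross-lane reduction (subgroupAdd, or a fallback where it is not used) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU model of one subgroup cooperating on a single column.
// Assumption: lane l handles rows l, l + lanes, l + 2 * lanes, ...
// `sharded_column_dot` is a hypothetical name, not a ggml function.
float sharded_column_dot(const std::vector<float> & column,
                         const std::vector<float> & vec,
                         std::size_t lanes) {
    assert(column.size() == vec.size());
    assert(lanes > 0);

    // Each lane accumulates a partial sum over its share of the rows.
    std::vector<float> partial(lanes, 0.0f);
    for (std::size_t lane = 0; lane < lanes; ++lane) {
        for (std::size_t row = lane; row < column.size(); row += lanes) {
            partial[lane] += column[row] * vec[row];
        }
    }

    // Stand-in for the cross-lane reduction that a shader would perform
    // with subgroupAdd or with a shared-memory fallback.
    float sum = 0.0f;
    for (float p : partial) {
        sum += p;
    }
    return sum;
}
```

Using fewer lanes per column, as in the second bullet, only changes the `lanes` argument in this model: each lane then covers more rows and the result is unchanged.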

Mar 20, 2026
e06c3ab

b8344

add op gated_delta_net (ggml-org#20455)

Mar 14, 2026
a93c0ef

b8182

vendors : update miniaudio library to 0.11.24 (ggml-org#19914)

Feb 28, 2026
05728db

b8121

Improve CUDA graph capture (ggml-org#19754)

* Improve CUDA graph capture

Currently, CUDA graphs are eagerly enabled on the first call to ggml_backend_cuda_graph_compute. If the graph properties keep changing (4+ consecutive updates), the graph is permanently disabled. This is suboptimal because:

- The first call always incurs CUDA graph capture overhead even if the graph is unstable
- Once permanently disabled, CUDA graphs never re-enable even after the graph stabilizes (e.g., switching from prompt processing to decode)

The new approach delays CUDA graph activation until warmup completes: the same cgraph must be called at least twice with matching properties before CUDA graph capture begins. This avoids wasted capture overhead on volatile graphs and allows graphs to become eligible once they stabilize.
This also fixes issues such as ggml-org#19708
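
The warmup rule described above can be sketched as a small state tracker. This is an illustration under stated assumptions, not the code in ggml-cuda.cu: `GraphProps`, `WarmupTracker`, and `observe` are hypothetical names, the real property comparison is not shown in this message, and the sketch assumes capture becomes eligible on the call after two matching ones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fingerprint of a compute graph. The real backend compares
// its own set of graph properties, which this message does not list.
struct GraphProps {
    std::uint64_t hash;
    std::size_t   n_nodes;
    bool operator==(const GraphProps & other) const {
        return hash == other.hash && n_nodes == other.n_nodes;
    }
};

// Hypothetical tracker: a graph becomes eligible for capture only after
// `warmup_calls` consecutive calls with matching properties.
class WarmupTracker {
public:
    explicit WarmupTracker(int warmup_calls = 2) : warmup_calls_(warmup_calls) {
        assert(warmup_calls >= 0);
    }

    // Returns true when the current call may use a captured graph.
    bool observe(const GraphProps & props) {
        if (has_last_ && props == last_) {
            ++matching_calls_;
        } else {
            // Properties changed: restart warmup rather than disabling
            // graphs permanently, so a later stable phase can qualify.
            last_           = props;
            has_last_       = true;
            matching_calls_ = 1;
        }
        return matching_calls_ > warmup_calls_;
    }

private:
    int        warmup_calls_;
    int        matching_calls_ = 0;
    bool       has_last_       = false;
    GraphProps last_{};
};
```

In this model a volatile graph never pays capture cost, and a switch from prompt processing to decode simply restarts the warmup count for the new graph shape.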

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Co-authored-by: Johannes Gäßler

* Remove EM dashes

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Co-authored-by: Aman Gupta

---------

Co-authored-by: Johannes Gäßler
Co-authored-by: Aman Gupta

Feb 21, 2026
a0c91e8

b8075

common : inline functions (ggml-org#18639)

Feb 16, 2026
cceb1b4

b8055

convert : ensure all models handle new experts count (ggml-org#19621)

* ensure all models handle new experts count

* revert removal for PhiMoeModel, does not inherit from base

Feb 14, 2026
079feab

b7898

ggml-hexagon: flash-attention and reduce-sum optimizations (ggml-org#19141)

* wip

* ggml-hexagon: add vectorized dot product function for FP32 and FP16 accumulation

* ggml-hexagon: optimize dot product functions for FP16 and FP32 with new vectorized implementations

* wip

* ggml-hexagon: optimize hvx_vec_dump_f32_n and hvx_vec_reduce_sum_qf32x2 functions for improved performance

* ggml-hexagon: refactor dot product functions to use a common loading function for improved readability

* optimize vector dot product functions to use unified reduction for improved performance

* hexagon: optimize reduce-sum for v75+

* hexagon: always keep row_sums in sf/fp32

* ggml-hexagon: enhance directory checks for HEXAGON_SDK_ROOT and HEXAGON_TOOLS_ROOT

* fix compiling error after rebase

---------

Co-authored-by: Max Krasnyansky

Jan 31, 2026
89f10ba

b7695

scripts : follow api redirects in pr2wt.sh (ggml-org#18739)

Jan 10, 2026
7fdc8c8

b7677

vulkan: fix push constant size for quantize_q8_1 (ggml-org#18687)

I added an assert to catch further mismatches, and it found several.
Fix those, too.
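
A check of this kind can be sketched as a comparison between the size a pipeline was created with and the size of the struct passed at dispatch. The names below (`QuantizePushConstants`, `Pipeline`, `push_constants_match`) and the struct fields are hypothetical and do not reflect the actual ggml-vulkan types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical push-constant layout; the real fields may differ.
struct QuantizePushConstants {
    std::uint32_t ne;
    std::uint32_t num_blocks;
};

// Hypothetical pipeline record holding the push-constant size that was
// declared when the pipeline was created.
struct Pipeline {
    const char * name;
    std::size_t  push_constant_size;
};

// True when the struct handed to a dispatch has exactly the size the
// pipeline was created with.
template <typename T>
bool push_constants_match(const Pipeline & pipeline, const T &) {
    return pipeline.push_constant_size == sizeof(T);
}

// Hypothetical dispatch wrapper: asserting here surfaces a mismatch at
// the call site instead of as a validation error or silent corruption.
template <typename T>
void dispatch_checked(const Pipeline & pipeline, const T & push_constants) {
    assert(push_constants_match(pipeline, push_constants));
    // The actual command recording would follow here.
}
```

Placing the assert in one shared dispatch path is what lets a single check catch mismatches across many pipelines.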

Jan 8, 2026
2524c26

b7597

sync : ggml

Dec 31, 2025
13814eb
