Tags: vp2177/llama.cpp
graph : fix graph reuse reset of params (ggml-org#14760) ggml-ci
parallel : add option for different RNG seeds (ggml-org#14757) ggml-ci
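A minimal sketch of the per-sequence seeding idea: each parallel client gets its own sampler seeded differently, so identical prompts do not produce identical completions. `llama_sampler_chain_init`, `llama_sampler_chain_default_params`, `llama_sampler_chain_add`, and `llama_sampler_init_dist` are real llama.cpp APIs; `base_seed`, `n_clients`, and the seed-derivation scheme are illustrative, not necessarily what the PR does.

```cpp
// Sketch: give each parallel client its own RNG seed so that identical
// prompts can still diverge. base_seed + client index is one simple
// derivation scheme; the actual PR may use a different one.
#include "llama.h"

#include <vector>

std::vector<llama_sampler *> make_samplers(uint32_t base_seed, int n_clients) {
    std::vector<llama_sampler *> samplers;
    for (int i = 0; i < n_clients; ++i) {
        llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
        // distinct seed per client; offsetting the base seed keeps runs reproducible
        llama_sampler_chain_add(smpl, llama_sampler_init_dist(base_seed + (uint32_t) i));
        samplers.push_back(smpl);
    }
    return samplers;
}
```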
cuda : Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs (ggml-org#14741)
* Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs. Gemma3n uses a matrix-matrix addition as part of its input processing, wrongly triggering CUDA_GRAPH disablement on NVGPUs even when a batch size of 1 is used.
* Exclude `project_per_layer_input` by matching node names. This ensures that all other graphs, which don't exhibit this pattern, do not have their behavior changed.
* Revert unnecessary formatting changes
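The shape of the exclusion, sketched below: before the batch-size heuristic disables CUDA graph capture, the node name is checked for the known Gemma3n pattern. `ggml_get_name`, `GGML_OP_ADD`, and the `ne` dimensions are real ggml fields/accessors; the surrounding functions are an illustration of the pattern, not the exact upstream code.

```cpp
// Sketch: skip the CUDA-graph disable heuristic for Gemma3n's
// per-layer-input addition by matching the node name.
#include "ggml.h"

#include <cstring>

static bool is_gemma3n_per_layer_input(const ggml_tensor * node) {
    // the Gemma3n graph labels this addition "project_per_layer_input";
    // other graphs never carry that name, so their behavior is unchanged
    return strstr(ggml_get_name(node), "project_per_layer_input") != nullptr;
}

static bool node_disables_cuda_graph(const ggml_tensor * node) {
    // a matrix-matrix ADD normally indicates a batched graph, for which
    // CUDA graph capture is not worthwhile ...
    if (node->op == GGML_OP_ADD && node->ne[1] > 1) {
        // ... except for Gemma3n's input projection, which performs such an
        // ADD even at batch size 1
        return !is_gemma3n_per_layer_input(node);
    }
    return false;
}
```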
graph : avoid huge warm-up graphs for MoE models (ggml-org#14753)
* graph : avoid huge warm-up graphs for MoE models ggml-ci
* cont : bump max nodes to 8x model tensors
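A rough sketch of the node-budget idea: size the graph's node budget relative to the number of model tensors rather than one huge fixed constant, so MoE models with many expert tensors don't force enormous warm-up graphs. The 8x multiplier comes from the commit message; the floor value and function shape are illustrative.

```cpp
// Sketch: cap the compute-graph size relative to the model.
// 8x the tensor count per the commit; the floor of 1024 is illustrative.
#include <algorithm>
#include <cstdint>

static int32_t graph_max_nodes(int32_t n_model_tensors) {
    // small floor so tiny models still get a workable graph
    return std::max<int32_t>(1024, 8 * n_model_tensors);
}
```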
graph : refactor context to not pass gf explicitly (ggml-org#14629) ggml-ci
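The shape of this refactor, sketched: the graph becomes state reached through the build context instead of a `gf` parameter threaded through every build function. `ggml_add` and `ggml_build_forward_expand` are real ggml calls; the struct and method names are illustrative, not the actual llama.cpp declarations.

```cpp
// Sketch: the graph pointer is context state, not an explicit parameter.
#include "ggml.h"

struct llm_build_sketch {
    ggml_context * ctx = nullptr;
    ggml_cgraph  * gf  = nullptr; // set once per graph build

    // before the refactor this would have been:
    //   ggml_tensor * add_node(ggml_cgraph * gf, ggml_tensor * a, ggml_tensor * b);
    ggml_tensor * add_node(ggml_tensor * a, ggml_tensor * b) {
        ggml_tensor * cur = ggml_add(ctx, a, b);
        ggml_build_forward_expand(gf, cur); // graph reached via the member
        return cur;
    }
};
```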
graph : Pass the graph placeholder message in debug mode (ggml-org#14748) Without that condition, this debug log clutters the screen for every batch processed during prompt processing, or for every token generated in Kobold.cpp.
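The gist is a verbosity gate: the placeholder message is emitted only when debug logging is enabled, so it no longer prints once per batch or per token. A minimal sketch of that pattern, assuming a runtime verbosity level; the macro names mirror llama.cpp's internal logging style but are used illustratively, and the message text is illustrative too.

```cpp
// Sketch: gate a noisy per-batch message behind a debug verbosity level.
#include <cstdio>

#define LOG_LEVEL_DEBUG 4
static int g_log_verbosity = 2; // runtime-configurable in the real code

#define LOG_DEBUG(...) \
    do { if (g_log_verbosity >= LOG_LEVEL_DEBUG) fprintf(stderr, __VA_ARGS__); } while (0)

static void on_graph_placeholder(void) {
    // previously an unconditional print; now visible only in debug mode
    LOG_DEBUG("%s: using graph placeholder\n", __func__);
}
```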