Tags: erickwill/llama.cpp-Stock
fix(shader): handle buffer aliasing for rms fuse (ggml-org#22266)
CUDA: refactor mma data loading for AMD (ggml-org#22051)
* CUDA: refactor mma data loading for AMD
* fix CDNA MMQ occupancy
* fix CDNA3 mma
* fix RDNA3 compile
vulkan: Support GGML_TYPE_NVFP4 (ggml-org#21455) This adds nvfp4 support for get_rows, dequant, and mul_mat(_id). For mul_mat, it does not add support for the dp4/q8_1 path; everything goes through fp16/fp32.
CUDA: skip compilation of superfluous FA kernels (ggml-org#21768)
server : fix restore for checkpoints with pos_min == 0 (ggml-org#21510)
models : kda chunk size = 16 (ggml-org#19827)
* models : add llm_build_delta_net_base
* cont : keep qwen35 and qwen35moe graphs intact
* cont : add comments [no ci]
* add kimi linear to delta-net-base
* removed unnecessary ggml_cont from g_exp_t
* removed ggml_cont from g_diff_exp_t, moved ggml_cont for o to kimi-linear.cpp
* removed unnecessary diag mask
* cont : simplify
* cont : avoid graph splits
* scale q after mul instead of beginning
* identical ppl
* cont : fix scale and decay mask
* minor : remove TODO
* block implementation for kda
* remove space at the end of line 101
* concat+pad
* pad+binary row concat
* chunk size 16 for kda
* removed minor differences to master
---------
Co-authored-by: Georgi Gerganov <[email protected]>