
Tags: erickwill/llama.cpp-Stock

b8913

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
fix(shader): handle the buffer aliasing for rms fuse (ggml-org#22266)
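Buffer aliasing here means two bindings of a fused RMS-norm dispatch may point at overlapping memory. The shader fix itself isn't shown on this page; as a minimal sketch, assuming buffers are described by a byte offset and size (the helper `ranges_overlap` is hypothetical, not a ggml function), an aliasing test reduces to a half-open interval intersection:

```python
def ranges_overlap(a_start: int, a_size: int, b_start: int, b_size: int) -> bool:
    """Hypothetical helper: True if byte ranges [a_start, a_start + a_size)
    and [b_start, b_start + b_size) share at least one byte."""
    # Empty ranges cannot alias anything.
    if a_size == 0 or b_size == 0:
        return False
    # Half-open intervals intersect iff each one starts before the other ends.
    return a_start < b_start + b_size and b_start < a_start + a_size
```

Adjacent ranges such as `[0, 32)` and `[32, 64)` do not overlap, so only a genuine shared byte would force a fused kernel to treat input and output as the same buffer.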

b8850

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
CUDA: refactor mma data loading for AMD (ggml-org#22051)

* CUDA: refactor mma data loading for AMD

* fix CDNA MMQ occupancy

* fix CDNA3 mma

* fix RDNA3 compile

b8785

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
vulkan: Support GGML_TYPE_NVFP4 (ggml-org#21455)

This adds nvfp4 support for get_rows, dequant, and mul_mat(_id). For
mul_mat, it does not add support for the dp4/q8_1 path, it's all via
fp16/fp32.
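Because the mul_mat path goes through fp16/fp32 rather than dp4/q8_1, NVFP4 weights are expanded to floats before the multiply. A minimal sketch of that decode step, assuming the NVIDIA-described layout (FP4 E2M1 values sharing one FP8 E4M3 block scale, with an optional per-tensor FP32 scale) and a hypothetical low-nibble-first packing; ggml's actual `block_nvfp4` struct and nibble order may differ, and `dequant_block` is an illustrative name only:

```python
import math

# Magnitudes of the 8 non-negative FP4 E2M1 codes; bit 3 is the sign.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code (1 sign, 2 exponent, 1 mantissa bit)."""
    mag = E2M1_MAGNITUDES[code & 0x7]
    return -mag if code & 0x8 else mag

def decode_e4m3(byte: int) -> float:
    """Decode an FP8 E4M3 byte (exponent bias 7, no infinities, 0x7F/0xFF are NaN)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return math.nan
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * (man / 8.0) * 2.0 ** -6
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

def dequant_block(scale_byte: int, packed: bytes, tensor_scale: float = 1.0) -> list:
    """Hypothetical block dequant: each byte holds two E2M1 codes, low nibble first
    (8 bytes for a 16-element block). Returns the float values."""
    s = decode_e4m3(scale_byte) * tensor_scale
    out = []
    for b in packed:
        out.append(decode_e2m1(b & 0xF) * s)
        out.append(decode_e2m1(b >> 4) * s)
    return out
```

For example, a scale byte of `0x40` decodes to 2.0, so the packed bytes `0x21, 0xF7` expand to `[1.0, 2.0, 12.0, -12.0]`.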

b8763

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
CUDA: skip compilation of superfluous FA kernels (ggml-org#21768)

b8720

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
CUDA: also store `node->src->data` ptrs for equality check (ggml-org#21635)

* CUDA: also store node->src->data ptrs for equality check

* address review comments
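The equality check decides whether a previously captured graph still matches the current one. As a minimal sketch, assuming each node is snapshotted as a small record (the names `NodeProps`, `snapshot`, and `graph_unchanged` are hypothetical, not the ggml-cuda code), storing the source data pointers means a changed input buffer is detected even when the op and output pointer stay the same:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeProps:
    """Hypothetical snapshot of one graph node's properties."""
    op: str
    data: int         # address of the node's output buffer
    src_data: tuple   # addresses of each source tensor's buffer

def snapshot(nodes) -> list:
    # nodes: iterable of dicts with "op", "data", and "src" (list of src addresses)
    return [NodeProps(n["op"], n["data"], tuple(n["src"])) for n in nodes]

def graph_unchanged(prev: list, nodes) -> bool:
    """True if the stored snapshot still matches, so a captured graph could be reused."""
    return prev == snapshot(nodes)
```

Dropping `src_data` from `NodeProps` would make the second comparison below wrongly report an unchanged graph.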

b8693

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
server : fix restore for checkpoints with pos_min == 0 (ggml-org#21510)

b8576

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
[SYCL] Enhance build script to use half cores to build, avoid OS hang (ggml-org#21093)

* use half cores to build, avoid OS hang

* reduce the output text num to short test time

* avoid to return 0
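The job-count rule in this commit ("half cores", "avoid to return 0") can be sketched as follows. The script's actual implementation isn't shown here; `build_jobs` is a hypothetical helper that halves the logical core count and clamps the result to at least 1, so a single-core machine still gets one job:

```python
import os

def build_jobs(cpu_count=None) -> int:
    """Hypothetical helper: half the logical cores, but never 0."""
    if cpu_count is None:
        # os.cpu_count() may return None when the count is undeterminable.
        cpu_count = os.cpu_count() or 1
    return max(1, cpu_count // 2)
```

The result would typically be passed to the build tool's parallelism option, for example `cmake --build build -j <N>`.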

b8508

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
models : move the token embedding norms to the first layer (ggml-org#20943)

* models : move the token embedding norms to the first layer

* cont : fix LLM_TENSOR_CONV1D + fix il indexing

b8211

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
models : kda chunk size = 16 (ggml-org#19827)

* models : add llm_build_delta_net_base

* cont : keep qwen35 and qwen35moe graphs intact

* cont : add comments [no ci]

* add kimi linear to delta-net-base

* removed unnecessary ggml_cont from g_exp_t

* removed ggml_cont from g_diff_exp_t. moved ggml_cont for o to kimi-linear.cpp

* removed unnecessary diag mask

* cont : simplify

* cont : avoid graph splits

* scale q after mul instead of beginning

* scale q after mul instead of beginning

* identical ppl

* cont : fix scale and decay mask

* minor : remove TODO

* block implementation for kda

* remove space at the end of line 101

* concat+pad

* pad+binary row concat

* chunk size 16 for kda

* removed minor differences to master

---------

Co-authored-by: Georgi Gerganov <[email protected]>

b8193

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
docs: Fix intel documentation link (ggml-org#20040)