
Tags: olliewalsh/llama.cpp

b8701


Verified: This commit was created on GitHub.com and signed with GitHub’s verified signature.
ggml-cuda: ds_read_b128 for q4_0 and q4_1 mmq kernels (ggml-org#21168)

* ds_read_b128 for q4_0 and q4_1 mmq kernels

     The current for loop generates ds_read_b32 instructions with the HIP compiler; the new solution generates ds_read_b128 instructions for the same operation, saving some LDS bandwidth. Tested on MI50 and RX6800XT; it's faster on both.

* Vectorized LDS load update: used the ggml_cuda_get_max_cpy_bytes and ggml_cuda_memcpy_1 functions for a generic implementation

* Explicit for loop in mmq, renamed vec to tmp

* Fixed max_cpy usage in the loading loop

* Fixed typo in q4_1 kernel

* Update ggml/src/ggml-cuda/mmq.cuh

Co-authored-by: Johannes Gäßler <[email protected]>

* Update ggml/src/ggml-cuda/mmq.cuh

Co-authored-by: Johannes Gäßler <[email protected]>

* Update ggml/src/ggml-cuda/mmq.cuh

Co-authored-by: Johannes Gäßler <[email protected]>

* Removed trailing blank line (line 500)

* Update mmq.cuh: removed other blank lines

* Remove trailing whitespaces

---------

Co-authored-by: iacopPBK <[email protected]>
Co-authored-by: Johannes Gäßler <[email protected]>
Co-authored-by: iacopPBK <[email protected]>
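The core idea of this tag's headline change can be sketched on the host side: copying 16 bytes through a single 128-bit-wide element lets the compiler emit one wide load/store (ds_read_b128 on AMD LDS) per 16 bytes instead of four 32-bit ones. This is a minimal sketch, not the actual kernel code; the `b128` struct is a stand-in for whatever vector type `ggml_cuda_memcpy_1` resolves to in the real mmq.cuh.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in 128-bit element (hypothetical; the real code uses CUDA/HIP
// vector types selected via ggml_cuda_get_max_cpy_bytes).
struct alignas(16) b128 { uint32_t x, y, z, w; };

// Word-by-word copy: with the HIP compiler, a loop like this lowered to
// four ds_read_b32 instructions per 16 bytes in the q4_0/q4_1 mmq kernels.
void copy_b32(uint32_t *dst, const uint32_t *src, int nwords) {
    for (int i = 0; i < nwords; ++i) {
        dst[i] = src[i];
    }
}

// Vectorized copy: one 128-bit transaction per 16 bytes, i.e. a single
// ds_read_b128 where the element lives in LDS.
void copy_b128(b128 *dst, const b128 *src, int nvec) {
    for (int i = 0; i < nvec; ++i) {
        dst[i] = src[i];  // whole-struct assignment -> one wide load/store
    }
}
```

Both loops move the same bytes; the difference is purely in how many memory transactions the compiler emits, which is where the LDS bandwidth saving comes from.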

b8218


Checkpoint every n tokens: squash (ggml-org#20087)

b8030


CUDA: Do not mutate cgraph for fused ADDs (ggml-org#19566)

* Do not mutate cgraph for fused ADDs

1. We should minimize in-place changes to the incoming ggml_cgraph
   where possible (those belong in graph_optimize)
2. Modifying the graph in place leads to an additional, unnecessary
   graph-capture step, because the CUDA backend stores the graph
   properties before the in-place modification happens

* Assert ggml_tensor is trivially copyable

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Co-authored-by: Aman Gupta <[email protected]>

---------

Co-authored-by: Aman Gupta <[email protected]>
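The pattern this commit describes, copying a node out of the graph instead of rewriting it in place, can be sketched with a stand-in struct. This is an illustration under assumptions: `node_t`, `fuse_add`, and the fused-op id are hypothetical, not the actual ggml types; the real `ggml_tensor` lives in ggml.h, and the commit's static_assert guarantees it is trivially copyable, so a plain struct copy is safe.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for ggml_tensor (the real struct is in ggml.h).
struct node_t {
    int   op;     // operation id
    float scale;  // example parameter
};

// Mirrors the commit's "Assert ggml_tensor is trivially copyable":
// a plain assignment/memcpy of a node must be well-defined.
static_assert(std::is_trivially_copyable_v<node_t>,
              "plain copy of graph nodes must be safe");

// Instead of mutating the node inside the incoming cgraph (which would
// invalidate the properties the CUDA backend captured earlier), take a
// local copy and apply the fusion to that copy.
node_t fuse_add(const node_t &graph_node) {
    node_t tmp = graph_node;  // cgraph stays untouched
    tmp.op = 42;              // hypothetical fused-op id, for illustration
    return tmp;
}
```

Because the graph itself is never rewritten, its captured properties remain valid and no extra graph-capture step is triggered.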

b7661


scripts : add pr2wt.sh (ggml-org#18644)

* scripts : add pr2wt.sh

* script : shebang

Co-authored-by: Sigbjørn Skjæret <[email protected]>

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>