Tags: efocht/llama.cpp

b6115

This commit was created on GitHub.com and signed with GitHub’s verified signature.
convert : support non-mxfp4 HF model (ggml-org#15153)

* convert : support non-mxfp4 HF model

* rm redundant check

* disable debug check

b4644

metal : adjust support conditions for norm operators (ggml-org#11671)

cont ggml-org#11659

ggml-ci

b4393

vulkan: multi-row k quants (ggml-org#10846)

* multi row k quant shaders!

* better row selection

* more row choices

* readjust row selection

* rm_kq=2 by default

master-dd0eabc

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
ggml : use full range for Q4_0 and Q4_2 quantization (ggml-org#729)

* Use full range for q4_0 quantization

By keeping the sign of the highest-magnitude value, we can make sure
that value maps to -8, which is currently unused.
This is a bit of a freebie since it is fully backwards compatible with
the current format.

* Update quantize_row_q4_0 for AVX/AVX2

* Update quantize_row_q4_0 for WASM

Untested

* Update quantize_row_q4_0 for Arm NEON

* Update quantize_row_q4_0 for PowerPC

Untested

* Use full range for q4_2 quantization
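The full-range idea above can be illustrated with a minimal Python sketch. This is an assumption-laden toy, not the actual ggml implementation (which operates on fixed-size blocks of floats in C with SIMD paths); the function name `quantize_q4_0_full_range` is hypothetical:

```python
def quantize_q4_0_full_range(block):
    # Find the element with the largest magnitude, *keeping its sign*,
    # so that exactly this value maps to -8 (the otherwise-unused code
    # in a signed 4-bit range of [-8, 7]).
    amax = max(block, key=abs)
    if amax == 0:
        return 0.0, [0] * len(block)
    d = amax / -8  # scale chosen so the extreme value quantizes to -8
    quants = [max(-8, min(7, round(x / d))) for x in block]
    return d, quants
```

Dequantization is simply `x ≈ q * d`, which is why the change stays backwards compatible: only the choice of scale differs, not the stored format.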

master-7a32fcb

ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (ggml-org#1179)

* ggml : add Q8_0 quantization format (rename the old one to Q8_1)

* tests : fix test-quantize-fns

* ggml : finalize Q8_0 implementation

* ggml : use q4_0_q8_0 and q4_2_q8_0

* ggml : fix Q8_0 dot product bug (ARM)

* ggml : Q8_0 unroll x2

* ggml : fix bug - using wrong block type

* ggml : extend quantize_fns_t with "vec_dot_type"

* ggml : fix Q8_0 to use 255 values out of 256

* ggml : fix assert using wrong QK4_2 instead of QK4_3
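The "255 values out of 256" fix above refers to symmetric 8-bit quantization: restricting codes to [-127, 127] leaves the -128 code unused so the grid is symmetric around zero. A minimal Python sketch under that assumption (the function name is hypothetical; ggml's real Q8_0 works on C blocks with a per-block scale):

```python
def quantize_q8_0(block):
    # Symmetric 8-bit quantization: use 255 codes in [-127, 127],
    # deliberately leaving the asymmetric -128 code unused.
    amax = max(abs(x) for x in block)
    if amax == 0:
        return 0.0, [0] * len(block)
    d = amax / 127  # scale so the largest magnitude maps to +/-127
    return d, [max(-127, min(127, round(x / d))) for x in block]
```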

master-e4cf982

Fix cuda compilation (ggml-org#1128)

* Fix: Issue with CUBLAS compilation error due to missing -fPIC flag

---------

Co-authored-by: B1gM8c <[email protected]>

master-c4fe84f

llama : refactor get / set state + remove redundant kv cache API (ggml-org#1143)

master-957c8ae

This commit was signed with the committer's verified signature (ggerganov, Georgi Gerganov).
llama : increase scratch buffer size for 65B (ref ggml-org#1152)

Temporary solution

master-54bb60e

ggml : fix bug in ggml_compute_forward_sum_f32 (ggml-org#1162)

The sum is now computed over all rows instead of just the last row.
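The intended behavior can be sketched in a few lines of Python (a toy stand-in for the C function, which accumulates over a tensor's rows):

```python
def sum_all_rows(rows):
    # Accumulate every element across *all* rows, not just the last one
    # (the bug this commit fixes).
    total = 0.0
    for row in rows:
        for x in row:
            total += x
    return total
```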

master-9b0a4d4

examples/main README improvements and some light refactoring (ggml-org#1131)