Tags: efocht/llama.cpp
metal : adjust support conditions for norm operators (ggml-org#11671)
cont. ggml-org#11659
ggml-ci
ggml : use full range for Q4_0 and Q4_2 quantization (ggml-org#729)
* Use full range for q4_0 quantization. By keeping the sign of the highest-magnitude value, we can make sure that value maps to -8, which is currently unused. This is a bit of a freebie since it is fully backwards compatible with the current format.
* Update quantize_row_q4_0 for AVX/AVX2
* Update quantize_row_q4_0 for WASM (untested)
* Update quantize_row_q4_0 for Arm NEON
* Update quantize_row_q4_0 for PowerPC (untested)
* Use full range for q4_2 quantization
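The full-range trick this commit describes can be sketched in C. This is a minimal, hypothetical illustration, not the actual ggml implementation: the function name, the `QK4_0` block size, and the one-byte-per-nibble layout are assumptions (ggml packs two 4-bit codes per byte). The idea is to keep the sign of the largest-magnitude value and divide by -8, so that value lands exactly on the code for -8, which a plain `amax/7` scale never produces.

```c
#include <math.h>
#include <stdint.h>

#define QK4_0 32  // assumed block size, for illustration only

// Hedged sketch of "full range" 4-bit quantization as described in
// ggml-org#729 (names are illustrative, not ggml's API).
// Codes are stored biased: q in [0, 15], dequantized as (q - 8)*d.
static void quantize_block_q4_0_full_range(const float *x, uint8_t *q, float *d_out) {
    float amax = 0.0f; // largest magnitude in the block
    float vmax = 0.0f; // signed value with that magnitude (sign is kept)
    for (int i = 0; i < QK4_0; ++i) {
        if (fabsf(x[i]) > amax) { amax = fabsf(x[i]); vmax = x[i]; }
    }
    const float d  = vmax / -8.0f;              // extreme value maps exactly to -8
    const float id = d != 0.0f ? 1.0f/d : 0.0f;
    for (int i = 0; i < QK4_0; ++i) {
        int v = (int)(x[i]*id + 8.5f);          // round and bias into [0, 16)
        if (v > 15) v = 15;                     // clamp the positive end
        q[i] = (uint8_t)v;
    }
    *d_out = d;
}
```

Note the backwards compatibility mentioned in the commit: the stored layout (one scale plus 4-bit codes) is unchanged; only the choice of scale differs, so existing dequantization code reads both old and new blocks.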
ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (ggml-org#1179)
* ggml : add Q8_0 quantization format (rename the old one to Q8_1)
* tests : fix test-quantize-fns
* ggml : finalize Q8_0 implementation
* ggml : use q4_0_q8_0 and q4_2_q8_0
* ggml : fix Q8_0 dot product bug (ARM)
* ggml : Q8_0 unroll x2
* ggml : fix bug - using wrong block type
* ggml : extend quantize_fns_t with "vec_dot_type"
* ggml : fix Q8_0 to use 255 values out of 256
* ggml : fix assert using wrong QK4_2 instead of QK4_3
Fix CUDA compilation (ggml-org#1128)
* Fix issue with cuBLAS compilation error due to missing -fPIC flag
Co-authored-by: B1gM8c <[email protected]>
llama : refactor get / set state + remove redundant kv cache API (ggml-org#1143)
llama : increase scratch buffer size for 65B (ref. ggml-org#1152)
Temporary solution.
ggml : fix bug in ggml_compute_forward_sum_f32 (ggml-org#1162)
The sum over all rows is now computed instead of just the last row.
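The fixed reduction can be illustrated with a minimal helper. This is a hypothetical sketch, not the actual `ggml_compute_forward_sum_f32` (which walks tensor strides): a row-major `ne1 x ne0` matrix is reduced to one scalar by accumulating every row; the bug described above was equivalent to running only the inner loop, so earlier rows never contributed.

```c
// Hedged sketch of the corrected behavior: sum ALL rows of a
// row-major matrix, not just the last one (cf. ggml-org#1162).
static float sum_rows_f32(const float *x, int ne0, int ne1) {
    float sum = 0.0f;
    for (int i1 = 0; i1 < ne1; ++i1) {      // over rows - the loop the bug skipped
        for (int i0 = 0; i0 < ne0; ++i0) {  // over elements within one row
            sum += x[i1*ne0 + i0];
        }
    }
    return sum;
}
```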