Tags: qades/llama.cpp

yuan3_0-b8690-a68fa82

Add missing GEMMA4V projector type support in mtmd/clip.cpp

yuan3_0-b8551-9f40a3b

Merge branch 'master' into yuan3_0

yuan3_0-b8504-5760080

Merge branch 'master' into yuan3_0

b8472

server: fix Host header (ggml-org#20843)

It should include the port when it is not the default.
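
As a rough sketch of that rule (not the actual llama.cpp server code; the helper name and default-port mapping are assumptions), a Host header value could be built like this:

```cpp
#include <string>

// Hypothetical helper: build the HTTP Host header value for a request.
// The port is appended only when it differs from the scheme's default
// (80 for http, 443 for https).
static std::string make_host_header(const std::string & host, int port, bool is_https) {
    const int default_port = is_https ? 443 : 80;
    if (port == default_port) {
        return host;                              // e.g. "example.com"
    }
    return host + ":" + std::to_string(port);     // e.g. "localhost:8080"
}
```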

b8390

[SYCL] enhance UPSCALE to support all UT cases (ggml-org#20637)

* [SYCL] enhance UPSCALE to support more cases

* rm test case result of SYCL1

b8351

metal : add FA specialization for HSK = 320, HSV = 256 (ggml-org#20549)

b8350

ci : move self-hosted workflows to separate files (ggml-org#20540)

b8348

ci : try to optimize some jobs (ggml-org#20521)

* force arm version to test

* run on either x86 or arm if we can help it; this only works for runs without ccache

* readd other jobs

* remove ccache

b8347

hexagon: Q4_0 and MXFP4 repack fixes (ggml-org#20527)

* hexagon: fix tail corruption with row sizes that are not a multiple of 256

* hexagon: use different stride for repacking partial blocks

* hex-mm: update repack and kernels to avoid shuffles for full 256-element blocks

The previous commit changed the repacking to use even:odd (0:1, 2:3, ...) packing
instead of the original (0:128, 1:129, ...) packing in order to fix tail corruption.
Since the mm kernels already handle partial tails, we can use even:odd
packing only for the last block.
This avoids the performance penalty of having to shuffle to zip the elements
in the common case (see the sketch after this list).

* hex-mm: update rmpy x8 for better optimizations

* hex-mm: tighten supported MUL_MAT checks to avoid spurious failures

* hex-mm: use vzero to init accumulators

* hex-mm: properly call partial rmpy_x8
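
The sketch below illustrates the two packing layouts described above; it is a simplified standalone example, not the actual hexagon repack code, and the function names are made up for illustration:

```cpp
#include <cstdint>
#include <cstddef>

// Full 256-element block: pair element i with element i+128 (0:128, 1:129, ...),
// so the mm kernel can consume the block without extra shuffles.
static void pack_full_block(const uint8_t * q /* 256 4-bit values, one per byte */,
                            uint8_t * out     /* 128 packed bytes */) {
    for (int i = 0; i < 128; ++i) {
        out[i] = (uint8_t)((q[i] & 0x0F) | ((q[i + 128] & 0x0F) << 4));
    }
}

// Last (possibly partial) block: pair adjacent elements even:odd (0:1, 2:3, ...).
// The kernels' partial-tail path already handles this layout, so only the
// last block pays for the extra zipping.
static void pack_tail_block(const uint8_t * q, size_t n /* n <= 256 */, uint8_t * out) {
    for (size_t i = 0; i + 1 < n; i += 2) {
        out[i / 2] = (uint8_t)((q[i] & 0x0F) | ((q[i + 1] & 0x0F) << 4));
    }
    if (n & 1) {
        out[n / 2] = (uint8_t)(q[n - 1] & 0x0F);  // odd tail: upper nibble stays zero
    }
}
```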

b8340

ggml : add native AVX512-FP16 support for F16 operations (ggml-org#20529)

The overall benchmark speed remains almost the same because the CPU is
now calculating faster than the RAM can deliver the data. (See the perf stat
results below, which show about 2.7 billion fewer instructions.)

Also note that this path is only enabled for native builds or with
custom flags.

now:
```
 Performance counter stats for 'build/bin/llama-bench -m Qwen3-0.6B-f16.gguf -p 512 -n 128':

        189,073.52 msec task-clock                       #   14.658 CPUs utilized
               404      context-switches                 #    2.137 /sec
                19      cpu-migrations                   #    0.100 /sec
           372,390      page-faults                      #    1.970 K/sec
   310,877,195,595      instructions                     #    0.54  insn per cycle
   581,071,530,602      cycles                           #    3.073 GHz
    19,352,107,994      branches                         #  102.352 M/sec
        48,304,438      branch-misses                    #    0.25% of all branches
    84,998,431,152      L1-dcache-loads                  #  449.552 M/sec
    12,186,410,279      L1-dcache-load-misses            #   14.34% of all L1-dcache accesses

      12.899358742 seconds time elapsed

     187.823044000 seconds user
       1.253416000 seconds sys
```

before:
```
 Performance counter stats for 'build/bin/llama-bench -m Qwen3-0.6B-f16.gguf -p 512 -n 128':

        190,594.56 msec task-clock                       #   14.652 CPUs utilized
               436      context-switches                 #    2.288 /sec
                22      cpu-migrations                   #    0.115 /sec
           372,782      page-faults                      #    1.956 K/sec
   313,574,921,966      instructions                     #    0.54  insn per cycle
   586,064,970,425      cycles                           #    3.075 GHz
    19,585,778,563      branches                         #  102.761 M/sec
        48,437,488      branch-misses                    #    0.25% of all branches
    86,219,336,628      L1-dcache-loads                  #  452.370 M/sec
    12,232,085,771      L1-dcache-load-misses            #   14.19% of all L1-dcache accesses

      13.007923164 seconds time elapsed

     189.395316000 seconds user
       1.202612000 seconds sys
```

Signed-off-by: Adrien Gallouët <[email protected]>
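
As an illustration of what a native AVX512-FP16 F16 kernel can look like (a minimal sketch, not the ggml implementation; the function name is made up), a dot product using the half-precision intrinsics might be written as follows. It only compiles and runs on a toolchain and CPU with AVX512-FP16, which matches the note above that the path is limited to native builds or custom flags:

```cpp
#include <immintrin.h>

// Minimal sketch of an F16 dot product with AVX512-FP16 intrinsics.
// Accumulation is kept in f16 for brevity; a real kernel would likely
// accumulate in f32 for precision.
static float dot_f16_avx512fp16(const _Float16 * x, const _Float16 * y, int n) {
    __m512h acc = _mm512_setzero_ph();              // 32 half-precision lanes
    int i = 0;
    for (; i + 32 <= n; i += 32) {
        __m512h vx = _mm512_loadu_ph(x + i);        // load 32 f16 values
        __m512h vy = _mm512_loadu_ph(y + i);
        acc = _mm512_fmadd_ph(vx, vy, acc);         // fused multiply-add in f16
    }
    float sum = (float) _mm512_reduce_add_ph(acc);  // horizontal reduction
    for (; i < n; ++i) {
        sum += (float) x[i] * (float) y[i];         // scalar tail
    }
    return sum;
}
```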