Tags · PMZFX/llama.cpp-sycl

b8679

llama-bench: add `-fitc` and `-fitt` to arguments (ggml-org#21304)

* llama-bench: add `-fitc` and `-fitt` to arguments

* update README.md

* address review comments

* update compare-llama-bench.py

Apr 6, 2026
94ca829
zip
tar.gz

b8678

vocab : add byte token handling to BPE detokenizer for Gemma4 (ggml-org#21488)

Apr 6, 2026
4aa962e
zip
tar.gz

b8676

server : handle unsuccessful sink.write in chunked stream provider (ggml-org#21478)

Check the return value of sink.write() in the chunked content provider
and return false when the write fails, matching cpp-httplib's own
streaming contract. This prevents logging chunks as sent when the sink
rejected them and properly aborts the stream on connection failure.
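
The contract described above can be sketched in a few lines. This is a language-neutral illustration in Python, not the server's C++ code: `FakeSink`, `stream_chunks`, and the `log` list are hypothetical names, and the only behavior taken from the commit message is "check the result of the write, and return false when it fails".

```python
from typing import Iterable, List


class FakeSink:
    """Hypothetical stand-in for an HTTP data sink.

    write() returns False once the simulated peer has gone away.
    """

    def __init__(self, fail_after: int) -> None:
        self.fail_after = fail_after
        self.written: List[bytes] = []

    def write(self, data: bytes) -> bool:
        if len(self.written) >= self.fail_after:
            return False  # simulated connection failure
        self.written.append(data)
        return True


def stream_chunks(chunks: Iterable[bytes], sink: FakeSink, log: List[bytes]) -> bool:
    """Send chunks through the sink, stopping at the first failed write.

    A chunk is recorded in `log` only after the sink accepted it, so the
    log never claims a chunk was sent when it was rejected.
    """
    for chunk in chunks:
        if not sink.write(chunk):
            return False  # abort the stream instead of continuing
        log.append(chunk)
    return True
```

With a sink that fails on the third write, `stream_chunks` returns `False` and the log holds only the two chunks that were actually accepted.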

Apr 6, 2026
482d862
zip
tar.gz

b8672

hexagon: slight optimization for argsort output init (ggml-org#21463)

Apr 6, 2026
25eec6f
zip
tar.gz

b8671

llama : correct platform-independent loading of BOOL metadata (ggml-org#21428)

* model-loader : fix GGUF bool array conversion

* model-loader : fix remaining GGUF bool pointer uses
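
The commit message does not show the fix itself. As a rough illustration of why a platform-independent conversion matters, the sketch below decodes a bool array from raw bytes one element at a time instead of reinterpreting the buffer as a native boolean type. It assumes GGUF stores each bool as a single byte (0 for false, 1 for true); `decode_bool_array` is a hypothetical helper, not part of llama.cpp or gguf-py.

```python
from typing import List


def decode_bool_array(raw: bytes, count: int) -> List[bool]:
    """Convert `count` one-byte values into Python bools.

    Assumption: one byte per element. Treating any nonzero byte as True
    is a choice made for this sketch; a stricter reader could reject
    values other than 0 and 1.
    """
    if len(raw) < count:
        raise ValueError("buffer shorter than declared element count")
    return [byte != 0 for byte in raw[:count]]
```

Converting element by element keeps the result independent of how large a boolean happens to be on the host platform.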

Apr 5, 2026
58190cc
zip
tar.gz

b8670

model : add HunyuanOCR support (ggml-org#21395)

* HunyuanOCR: add support for text and vision models

- Add HunyuanOCR vision projector (perceiver-based) with Conv2d merge
- Add separate HUNYUAN_OCR chat template (content-before-role format)
- Handle HunyuanOCR's invalid pad_token_id=-1 in converter
- Fix EOS/EOT token IDs from generation_config.json
- Support xdrope RoPE scaling type
- Add tensor mappings for perceiver projector (mm.before_rms, mm.after_rms, etc.)
- Register HunYuanVLForConditionalGeneration for both text and mmproj conversion

* fix proper mapping

* Update gguf-py/gguf/tensor_mapping.py

Co-authored-by: Xuan-Son Nguyen

* Update tools/mtmd/clip.cpp

Co-authored-by: Xuan-Son Nguyen

* address comments

* update

* Fix typecheck

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret

---------

Co-authored-by: Xuan-Son Nguyen
Co-authored-by: Sigbjørn Skjæret

Apr 5, 2026
af76639
zip
tar.gz

b8668

server : fix logging of build + system info (ggml-org#21460)

This PR changes the logging that occurs at startup of llama-server.
Currently, the startup log is redundant (it includes CPU information
twice) and is missing the build + commit info.

Apr 5, 2026
5d3a4a7
zip
tar.gz

b8667

ci: lower cuda12 floor to 12.8.1 for broader host compatibility (ggml-org#21438)

Co-authored-by: M1DNYT3

Apr 5, 2026
c08d28d
zip
tar.gz

b8665

common : add gemma 4 specialized parser (ggml-org#21418)

* common : add gemma4 dedicated parser

* cont : add '<|tool_response>' as eog

* cont : emit JSON from Gemma4 tool call AST

* cont : more fixes

* cont : refactor convert function

* cont : refine rules and mapping

* cont : add more tests

* cont : clean up

* cont : remove autoparser gemma4 implementation

* cont : more cleanup

* cont : rename gemma4.jinja to match the others

* cont : add custom template to support interleaved thinking

* cont : preserve reasoning in model turns

* cont : fix initializer error

* cont : fix unused vars

* cont : fix accidental static

* cont : fix specialized_template signature

* fix extra semicolon

* remove debug line and extra space [no ci]

Apr 4, 2026
b863507
zip
tar.gz

b8664

server: Fix undefined timing measurement errors in server context (ggml-org#21201)

Co-authored-by: Dan Hoffman

Apr 4, 2026
9c69907
zip
tar.gz
