Tags: tmc/llama.cpp

b1627

english : use `typos` to fix comments and logs (ggml-org#4354)

b1626

build : target Windows 8 for standard mingw-w64 (ggml-org#4405)

* build : target Windows 8 for standard mingw-w64

* make : fix missing console.o deps

This was causing a link error with `make all` on Windows.

b1625

llama : document logits_all deprecation (ggml-org#4418)

llama_context_params.logits_all is a parameter for controlling
llama_eval. This commit documents that logits_all should not be used
together with llama_decode and llama_batch, which select logits per
token instead.

b1624

server : fix local model name in server (ggml-org#4420)

b1623

ggml : increased GGML_MAX_PARAMS to allow finetuning of 70b models (ggml-org#4424)

b1621

grammar : revert the replacement of llama_token_to_piece with id_to_token (ggml-org#4396)

b1620

sync : ggml (new ops, tests, backend, etc.) (ggml-org#4359)

* sync : ggml (part 1)

* sync : ggml (part 2, CUDA)

* sync : ggml (part 3, Metal)

* ggml : build fixes

ggml-ci

* cuda : restore lost changes

* cuda : restore lost changes (StableLM rope)

* cmake : enable separable compilation for CUDA

ggml-ci

* ggml-cuda : remove device side dequantize

* Revert "cmake : enable separable compilation for CUDA"

This reverts commit 09e35d0.

* cuda : remove assert for rope

* tests : add test-backend-ops

* ggml : fix bug in ggml_concat

* ggml : restore `ggml_get_n_tasks()` logic in `ggml_graph_plan()`

* ci : try to fix macOS

* ggml-backend : remove backend self-registration

* ci : disable Metal for macOS cmake build

ggml-ci

* metal : fix "supports family" call

* metal : fix assert

* metal : print resource path

ggml-ci

---------

Co-authored-by: slaren <[email protected]>

b1619

llama : per-layer KV cache + quantum K cache (ggml-org#4309)

* per-layer KV

* remove unnecessary copies

* less code duplication, offload k and v separately

* llama : offload KV cache per-layer

* llama : offload K shift tensors

* llama : offload for rest of the model arches

* llama : enable offload debug temporarily

* llama : keep the KV related layers on the device

* llama : remove mirrors, perform Device -> Host when partial offload

* common : add command-line arg to disable KV cache offloading

* llama : update session save/load

* llama : support quantum K cache (ggml-org#4312)

* llama : support quantum K cache (wip)

* metal : add F32 -> Q8_0 copy kernel

* cuda : add F32 -> Q8_0 copy kernel

ggml-ci

* cuda : use mmv kernel for quantum cache ops

* llama : pass KV cache type through API

* llama : fix build

ggml-ci

* metal : add F32 -> Q4_0 copy kernel

* metal : add F32 -> Q4_1 copy kernel

* cuda : wip

* cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels

* llama-bench : support type_k/type_v

* metal : use mm kernel only for quantum KV cache

* cuda : add comment

* llama : remove memory_f16 and kv_f16 flags

---------

Co-authored-by: slaren <[email protected]>

* readme : add API change notice

---------

Co-authored-by: slaren <[email protected]>
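
These changes replaced the old memory_f16/kv_f16 switches with explicit cache types. A command-line sketch, assuming the flag spellings introduced around this release (-ctk/-ctv for llama-bench, --no-kv-offload in the common args; verify against --help on your build) and a placeholder model path:

```shell
# Benchmark with the K cache quantized to Q8_0 and V kept in F16
# (flag names assumed from this release; check ./llama-bench --help).
./llama-bench -m models/7B/model.gguf -ctk q8_0 -ctv f16

# Disable KV cache offloading via the new common argument
./main -m models/7B/model.gguf -p "Hello" --no-kv-offload
```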

b1618

train : fix ggml-org#4227 (double free in examples/train-text-from-scratch/train-text-from-scratch.cpp) (ggml-org#4351)

On commit b1108 (44c117f) xaedes added

    ggml_allocr * alloc = NULL;

    ... (many lines in between)

    if (alloc) {
        ggml_allocr_free(alloc);
    }

This is correct, but with so many lines in between it is easy to lose
track of whether alloc still needs freeing.

On commit b1287 (0e76a89) xaedes made a big change. From here on, alloc is freed eagerly.

    alloc = ggml_allocr_new(...)
    ... (short lines of code)
    ggml_allocr_free(alloc)

This happens a few times, but alloc is never set to NULL, and many lines below,
we still have

    if (alloc) {
        ggml_allocr_free(alloc);
    }

which causes a double-free.

b1617

server : recognize cache_prompt parameter in OAI API (ggml-org#4347)