
Tags: kriation/llama.cpp

b3375


Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
ggml : add NVPL BLAS support (ggml-org#8329) (ggml-org#8425)

* ggml : add NVPL BLAS support

* ggml : replace `<BLASLIB>_ENABLE_CBLAS` with `GGML_BLAS_USE_<BLASLIB>`

---------

Co-authored-by: ntukanov <[email protected]>
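The `GGML_BLAS_USE_<BLASLIB>` rename above selects a BLAS vendor at compile time. A minimal sketch of how such a vendor guard typically looks — the header names and the non-NVPL macro spellings here are assumptions for illustration, not ggml's actual source:

```cpp
// Hypothetical vendor-selection guard keyed on GGML_BLAS_USE_<BLASLIB>.
// Exactly one vendor macro is expected to be defined by the build system.
#if defined(GGML_BLAS_USE_NVPL)
#  include <nvpl_blas.h>   // NVIDIA Performance Libraries BLAS (assumed header name)
#elif defined(GGML_BLAS_USE_MKL)
#  include <mkl.h>         // Intel MKL (assumed)
#else
#  include <cblas.h>       // generic CBLAS fallback (assumed)
#endif
```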

b3374

cuda : suppress 'noreturn' warn in no_device_code (ggml-org#8414)

* cuda : suppress 'noreturn' warn in no_device_code

This commit adds a while(true) loop to the no_device_code function in
common.cuh. This is done to suppress the warning:

```console
/ggml/src/ggml-cuda/template-instances/../common.cuh:346:1: warning:
function declared 'noreturn' should not return [-Winvalid-noreturn]
  346 | }
      | ^
```

The motivation for this is to reduce the number of warnings when
compiling with GGML_HIPBLAS=ON.

Signed-off-by: Daniel Bevenius <[email protected]>

* squash! cuda : suppress 'noreturn' warn in no_device_code

Update __trap macro instead of using a while loop to suppress the
warning.

Signed-off-by: Daniel Bevenius <[email protected]>

---------

Signed-off-by: Daniel Bevenius <[email protected]>

b3373

CUDA: optimize and refactor MMQ (ggml-org#8416)

* CUDA: optimize and refactor MMQ

* explicit q8_1 memory layouts, add documentation

b3371

tokenize : add --no-parse-special option (ggml-org#8423)

This should make it easier to explain
how parse_special affects tokenization.

b3370

llama : use F32 precision in Qwen2 attention and no FA (ggml-org#8412)

b3369

Initialize default slot sampling parameters from the global context. (ggml-org#8418)

gguf-v0.9.1

gguf-py 0.9.1 release

b3368

Name Migration: Build the deprecation-warning 'main' binary every time (ggml-org#8404)

* Modify the deprecation-warning 'main' binary to build every time, instead of only when a legacy binary is present. This helps users of tutorials and other instruction sets know what to do when the 'main' binary is missing and they are trying to follow instructions.

* Adjusting 'server' name-deprecation binary to build all the time, similar to the 'main' legacy name binary.

b3367

[SYCL] Use multi_ptr to clean up deprecated warnings (ggml-org#8256)

b3366

ggml : move sgemm sources to llamafile subfolder (ggml-org#8394)

ggml-ci