Tags · kriation/llama.cpp

b3375

ggml : add NVPL BLAS support (ggml-org#8329) (ggml-org#8425)

* ggml : add NVPL BLAS support

* ggml : replace `<BLASLIB>_ENABLE_CBLAS` with `GGML_BLAS_USE_<BLASLIB>`

---------

Co-authored-by: ntukanov <[email protected]>

Jul 11, 2024
3686456
zip
tar.gz

b3374

cuda : suppress 'noreturn' warn in no_device_code (ggml-org#8414)

* cuda : suppress 'noreturn' warn in no_device_code

This commit adds a while(true) loop to the no_device_code function in
common.cuh. This is done to suppress the warning:

```console
/ggml/src/ggml-cuda/template-instances/../common.cuh:346:1: warning:
function declared 'noreturn' should not return [-Winvalid-noreturn]
  346 | }
      | ^
```

The motivation for this is to reduce the number of warnings when
compilng with GGML_HIPBLAS=ON.

Signed-off-by: Daniel Bevenius <[email protected]>

* squash! cuda : suppress 'noreturn' warn in no_device_code

Update __trap macro instead of using a while loop to suppress the
warning.

Signed-off-by: Daniel Bevenius <[email protected]>

---------

Signed-off-by: Daniel Bevenius <[email protected]>

Jul 11, 2024
b078c61
zip
tar.gz

b3373

CUDA: optimize and refactor MMQ (ggml-org#8416)

* CUDA: optimize and refactor MMQ

* explicit q8_1 memory layouts, add documentation

Jul 11, 2024
808aba3
zip
tar.gz

b3371

tokenize : add --no-parse-special option (ggml-org#8423)

This should allow more easily explaining
how parse_special affects tokenization.

Jul 11, 2024
9a55ffe
zip
tar.gz

b3370

llama : use F32 precision in Qwen2 attention and no FA (ggml-org#8412)

Jul 11, 2024
7a221b6
zip
tar.gz

b3369

Initialize default slot sampling parameters from the global context. (g…

…gml-org#8418)

Jul 11, 2024
278d0e1
zip
tar.gz

gguf-v0.9.1

gguf-py 0.9.1 release

Jul 10, 2024
ff137fb
zip
tar.gz

b3368

Name Migration: Build the deprecation-warning 'main' binary every time (

ggml-org#8404)

* Modify the deprecation-warning 'main' binary to build every time, instead of only when a legacy binary is present. This is to help users of tutorials and other instruction sets from knowing what to do when the 'main' binary is missing and they are trying to follow instructions.

* Adjusting 'server' name-deprecation binary to build all the time, similar to the 'main' legacy name binary.

Jul 10, 2024
dd07a12
zip
tar.gz

b3367

[SYCL] Use multi_ptr to clean up deprecated warnings (ggml-org#8256)

Jul 10, 2024
f4444d9
zip
tar.gz

b3366

ggml : move sgemm sources to llamafile subfolder (ggml-org#8394)

ggml-ci

Jul 10, 2024
6b2a849
zip
tar.gz

PreviousNext

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

b3375

b3374

b3373

b3371

b3370

b3369

gguf-v0.9.1

b3368

b3367

b3366

Tags: kriation/llama.cpp