Tags: kriation/llama.cpp
ggml : add NVPL BLAS support (ggml-org#8329) (ggml-org#8425)
* ggml : add NVPL BLAS support
* ggml : replace `<BLASLIB>_ENABLE_CBLAS` with `GGML_BLAS_USE_<BLASLIB>`
---------
Co-authored-by: ntukanov
cuda : suppress 'noreturn' warn in no_device_code (ggml-org#8414)
* cuda : suppress 'noreturn' warn in no_device_code
This commit adds a while(true) loop to the no_device_code function in common.cuh. This is done to suppress the warning:
```console
/ggml/src/ggml-cuda/template-instances/../common.cuh:346:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
  346 | }
      | ^
```
The motivation for this is to reduce the number of warnings when compiling with GGML_HIPBLAS=ON.
Signed-off-by: Daniel Bevenius
* squash! cuda : suppress 'noreturn' warn in no_device_code
Update the __trap macro instead of using a while loop to suppress the warning.
Signed-off-by: Daniel Bevenius
---------
Signed-off-by: Daniel Bevenius
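The warning in the entry above arises when the compiler cannot prove that a function marked as non-returning never reaches its closing brace. The following is a minimal host-side C++ sketch of the macro-based approach, not the actual change in common.cuh: `SKETCH_TRAP`, `no_device_code_sketch`, and `select_kernel` are hypothetical names, and `__builtin_unreachable()` is assumed to be available (GCC/Clang). In standard C++ `std::abort` is already declared `[[noreturn]]`, so the builtin is redundant here and is included only to illustrate how a trap primitive that the compiler does not recognize as non-returning could be wrapped.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for a platform trap macro. abort() terminates the
// process; __builtin_unreachable() (GCC/Clang builtin) tells the compiler
// that control flow cannot continue past this point.
#define SKETCH_TRAP() do { std::abort(); __builtin_unreachable(); } while (0)

// Hypothetical analogue of an error handler that must never return.
[[noreturn]] static void no_device_code_sketch(const char * file, int line) {
    std::fprintf(stderr, "%s:%d: no device code for this architecture\n", file, line);
    SKETCH_TRAP();
    // No path reaches the closing brace, so -Winvalid-noreturn has nothing to flag.
}

// Small caller: the handler is reached only on the error path.
static int select_kernel(int arch) {
    if (arch < 0) {
        no_device_code_sketch(__FILE__, __LINE__);
    }
    return arch * 2;
}
```

Calling `select_kernel` with a non-negative argument returns normally; a negative argument prints the message and aborts the process.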
CUDA: optimize and refactor MMQ (ggml-org#8416)
* CUDA: optimize and refactor MMQ
* explicit q8_1 memory layouts, add documentation
tokenize : add --no-parse-special option (ggml-org#8423)
This should make it easier to explain how parse_special affects tokenization.
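To illustrate the kind of difference a parse_special switch makes, here is a toy C++ sketch. It is not the llama.cpp tokenizer: `toy_tokenize`, the special token text `<|eot|>`, and its id 1000 are all hypothetical, and every non-special byte is emitted as its own token in place of real subword merging. With `parse_special` enabled, the special token text collapses to a single id; with it disabled, the same text is tokenized as ordinary characters.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy vocabulary (hypothetical): one special token with id 1000. Any other
// byte becomes its own token id, standing in for real subword tokenization.
static const std::string SPECIAL_TEXT = "<|eot|>";
static const int         SPECIAL_ID   = 1000;

static std::vector<int> toy_tokenize(const std::string & text, bool parse_special) {
    std::vector<int> out;
    std::size_t i = 0;
    while (i < text.size()) {
        // Only recognize the special token when parse_special is enabled.
        if (parse_special && text.compare(i, SPECIAL_TEXT.size(), SPECIAL_TEXT) == 0) {
            out.push_back(SPECIAL_ID);
            i += SPECIAL_TEXT.size();
        } else {
            // Otherwise treat the current byte as plain text.
            out.push_back(static_cast<unsigned char>(text[i]));
            i += 1;
        }
    }
    return out;
}
```

For the input `hi<|eot|>`, the sketch yields three tokens when `parse_special` is true (two bytes plus the special id) and nine tokens when it is false (every byte on its own).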
llama : use F32 precision in Qwen2 attention and no FA (ggml-org#8412)
Name Migration: Build the deprecation-warning 'main' binary every time (ggml-org#8404)
* Modify the deprecation-warning 'main' binary to build every time, instead of only when a legacy binary is present. This helps users who are following tutorials and other instructions know what to do when the 'main' binary is missing.
* Adjust the 'server' name-deprecation binary to build every time, like the 'main' legacy-name binary.
[SYCL] Use multi_ptr to clean up deprecated warnings (ggml-org#8256)
ggml : move sgemm sources to llamafile subfolder (ggml-org#8394) ggml-ci