Tags: bdx0/llama.cpp
server : add parameter -tb N, --threads-batch N (ggml-org#3584) (ggml-org#3768)
Co-authored-by: Michael Coppola <[email protected]>
server : do not block system prompt update (ggml-org#3767)
* server : do not block system prompt update
* server : update state machine logic to process system prompts
* server : minor
sync : ggml (conv ops + cuda MSVC fixes) (ggml-org#3765) ggml-ci
cuda : add batched cuBLAS GEMM for faster attention (ggml-org#3749)
* cmake : add helper for faster CUDA builds
* batched : add NGL arg
* ggml : skip nops in compute_forward
* cuda : minor indentation
* cuda : batched cuBLAS GEMMs for src0 F16 and src1 F32 (attention ops)
* Apply suggestions from code review

  These changes plus:
  ```c++
  #define cublasGemmBatchedEx hipblasGemmBatchedEx
  ```
  are needed to compile with ROCm. I haven't done performance testing, but it seems to work. I couldn't figure out how to propose a change for lines outside what the pull changed; also, this is the first time I've tried to create a multi-part review, so please forgive me if I mess something up.
* cuda : add ROCm / hipBLAS cublasGemmBatchedEx define
* cuda : add cublasGemmStridedBatchedEx for non-broadcasted cases
* cuda : reduce mallocs in cublasGemmBatchedEx branch
* cuda : add TODO for calling cublas from kernel + using mem pool

Co-authored-by: Kerfuffle <[email protected]>
Add more tokenizer tests (ggml-org#3742)
* Add more tokenizer tests
* Add starcoder
* Update test vocab files
* Restrict bpe tokenizer tests to unicode planes
* Update comment
* Comment cosmetics
* Remove bloom vocab/test
metal : handle ggml_scale for n%4 != 0 (close ggml-org#3754) ggml-ci