
Tags: FellowTraveler/llama.cpp


b6502


Verified: this commit was created on GitHub.com and signed with GitHub’s verified signature.
GGML WebGPU: Support for ADD, MUL, RMS_NORM, GET_ROWS operators (ggml-org#16018)

* Add parameter buffer pool, batching of submissions, refactor command building/submission

* Add header for linux builds

* Free staged parameter buffers at once

* Format with clang-format

* Fix thread-safe implementation

* Use device implicit synchronization

* Update workflow to use custom release

* Remove testing branch workflow

* some f32 tests passing

* Disable set_rows until it's implemented

* f32 add all tests passing

* Begin work on set_rows

* Work on set rows

* Add error buffers for reporting unsupported SET_ROWS indices

* Remove extra comments

* Add templated addition, clean up code

* Get addition and multiplication working

* Implement rms_norm

* Add get_rows implementation

* Add new get_rows files

* Refactor use of wg size entry

* Fix compilation

* Try manually unrolled q4_0 quant

* Revert "Try manually unrolled q4_0 quant"

This reverts commit 77f8b96.

* Move to constant max wg size

* Check for tensor size in supports_op

* Vectorize f32 and change default workgroup size

* Move f32 get_rows from < 4 to % 4 != 0

* fix linter errors

* Add in-place tests

---------

Co-authored-by: Neha Abbas <[email protected]>
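Among the operators this release adds, RMS_NORM has the simplest definition: each element is divided by the root-mean-square of the row (plus a small epsilon) and scaled by a weight. A minimal scalar reference in C++, purely illustrative — the function name and default epsilon are assumptions, not the WebGPU shader code:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Reference RMS normalization: y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i].
// The WebGPU shader computes the same result on the GPU; this scalar
// version only documents the math.
std::vector<float> rms_norm(const std::vector<float> & x,
                            const std::vector<float> & w,
                            float eps = 1e-6f) {
    float sum_sq = 0.0f;
    for (float v : x) sum_sq += v * v;
    const float scale = 1.0f / std::sqrt(sum_sq / x.size() + eps);
    std::vector<float> y(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale * w[i];
    }
    return y;
}
```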

b4508

Adding linenoise.cpp to llama-run (ggml-org#11252)

This is a fork of linenoise that is C++17 compatible. I intend to add
it to llama-run so we can do things like traverse prompt history via
the up and down arrows:

https://github.com/ericcurtin/linenoise.cpp

Signed-off-by: Eric Curtin <[email protected]>
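The prompt-history traversal the commit message describes can be sketched with a cursor over past entries. This is NOT the linenoise.cpp API — the class and method names below are hypothetical and only illustrate the up/down-arrow idea:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of up/down-arrow history traversal: a cursor walks
// over stored lines; stepping past the newest entry yields an empty prompt.
class PromptHistory {
    std::vector<std::string> entries;
    size_t cursor = 0; // one past the newest entry
public:
    void add(const std::string & line) {
        entries.push_back(line);
        cursor = entries.size();
    }
    // Up arrow: move toward older entries.
    std::string up() {
        if (cursor > 0) --cursor;
        return cursor < entries.size() ? entries[cursor] : "";
    }
    // Down arrow: move toward newer entries.
    std::string down() {
        if (cursor < entries.size()) ++cursor;
        return cursor < entries.size() ? entries[cursor] : "";
    }
};
```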

b4409

Verified: this commit was signed with the committer’s verified signature (ggerganov Georgi Gerganov).
metal : avoid uint (ggml-org#11019)

b3830

readme : update hot topics

b3816

server : add newline after chat example (ggml-org#9616)

b3805

Revert "[SYCL] fallback mmvq (ggml-org#9088)" (ggml-org#9579)

This reverts commit 50addec.

b3751

server : add loading html page while model is loading (ggml-org#9468)

* Adding loading page for '/' server requests

* set content when model is loading

* removed loading html file

* updated cmakelist

* updated makefile

* cleaned up whitespace

* cleanup for PR removed error

* updated server test to handle 503 HTML

* updated server test to handle 503 HTML

* catch 503 before parsing json

* revert test

* account for both api and web browser requests

* precommit corrections

* eol fix

* revert changes to pre-commit

* removed print statement

* made loading message more descriptive

* also support .html files

---------

Co-authored-by: VJHack <[email protected]>
Co-authored-by: Vinesh Janarthanan <[email protected]>
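The commit history above boils down to one behavior: while the model is loading, the server answers 503 with an HTML page for browsers but JSON for API clients, keyed off the request's Accept header. A hedged sketch of that content negotiation — the function and its strings are illustrative, not the actual llama-server code:

```cpp
#include <string>

// Illustrative content negotiation for the 503 "model loading" response:
// browsers (Accept: text/html) get a human-readable page, API clients get JSON.
std::string choose_503_body(const std::string & accept_header) {
    if (accept_header.find("text/html") != std::string::npos) {
        return "<html><body>The model is loading. Please wait.</body></html>";
    }
    return R"({"error":{"code":503,"message":"Loading model"}})";
}
```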

b3678

server : simplify state machine for slot (ggml-org#9283)

* server : simplify state machine for slot

* add SLOT_STATE_DONE_PROMPT

* pop_deferred_task

* add missing notify_one

* fix passkey test

* metrics : add n_busy_slots_per_decode

* fix test step

* add test

* maybe fix AddressSanitizer?

* fix deque ?

* missing lock

* pop_deferred_task: also notify

* Update examples/server/server.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
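The simplification above introduces an explicit SLOT_STATE_DONE_PROMPT state between prompt processing and generation. A hypothetical sketch of such a state machine — only the SLOT_STATE_DONE_PROMPT name comes from the commit message; the other states and the transition function are illustrative assumptions:

```cpp
// Hypothetical slot lifecycle: idle -> processing prompt -> done prompt
// -> generating -> idle. Not the actual server.cpp implementation.
enum slot_state {
    SLOT_STATE_IDLE,
    SLOT_STATE_PROCESSING_PROMPT,
    SLOT_STATE_DONE_PROMPT,
    SLOT_STATE_GENERATING,
};

slot_state next_state(slot_state s) {
    switch (s) {
        case SLOT_STATE_IDLE:              return SLOT_STATE_PROCESSING_PROMPT;
        case SLOT_STATE_PROCESSING_PROMPT: return SLOT_STATE_DONE_PROMPT;
        case SLOT_STATE_DONE_PROMPT:       return SLOT_STATE_GENERATING;
        case SLOT_STATE_GENERATING:        return SLOT_STATE_IDLE;
    }
    return SLOT_STATE_IDLE;
}
```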

b3620

CPU/CUDA: Gemma 2 FlashAttention support (ggml-org#8542)

* CPU/CUDA: Gemma 2 FlashAttention support

* apply logit_softcap to scale in kernel

* disable logit softcapping tests on Metal

* remove metal check

b3618

lora : fix llama conversion script with ROPE_FREQS (ggml-org#9117)