
Tags: FellowTraveler/llama.cpp


b6502


Verified: this commit was created on GitHub.com and signed with GitHub’s verified signature.
GGML WebGPU: Support for ADD, MUL, RMS_NORM, GET_ROWS operators (ggml-org#16018)

* Add parameter buffer pool, batching of submissions, refactor command building/submission

* Add header for linux builds

* Free staged parameter buffers at once

* Format with clang-format

* Fix thread-safe implementation

* Use device implicit synchronization

* Update workflow to use custom release

* Remove testing branch workflow

* some f32 tests passing

* Disable set_rows until it's implemented

* f32 add all tests passing

* Begin work on set_rows

* Work on set rows

* Add error buffers for reporting unsupported SET_ROWS indices

* Remove extra comments

* Add templated addition, clean up code

* Get addition and multiplication working

* Implement rms_norm

* Add get_rows implementation

* Add new get_rows files

* Refactor use of wg size entry

* Fix compilation

* Try manually unrolled q4_0 quant

* Revert "Try manually unrolled q4_0 quant"

This reverts commit 77f8b96.

* Move to constant max wg size

* Check for tensor size in supports_op

* Vectorize f32 and change default workgroup size

* Move f32 get_rows from < 4 to % 4 != 0

* fix linter errors

* Add in-place tests

---------

Co-authored-by: Neha Abbas <[email protected]>
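Among the operators this release adds, RMS_NORM has the simplest definition: each element is divided by the root-mean-square of the row (plus a small epsilon) and scaled by a weight. A minimal scalar reference in C++, purely illustrative — the function name and default epsilon are assumptions, not the WebGPU shader code:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Reference RMS normalization: y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i].
// The WebGPU shader computes the same result on the GPU; this scalar
// version only documents the math.
std::vector<float> rms_norm(const std::vector<float> & x,
                            const std::vector<float> & w,
                            float eps = 1e-6f) {
    float sum_sq = 0.0f;
    for (float v : x) sum_sq += v * v;
    const float scale = 1.0f / std::sqrt(sum_sq / x.size() + eps);
    std::vector<float> y(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale * w[i];
    }
    return y;
}
```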

b4508

Adding linenoise.cpp to llama-run (ggml-org#11252)

This is a fork of linenoise that is C++17 compatible. I intend to add
it to llama-run so we can do things like traverse prompt history via
the up and down arrows:

https://github.com/ericcurtin/linenoise.cpp

Signed-off-by: Eric Curtin <[email protected]>
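The prompt-history traversal the commit message describes can be sketched with a cursor over past entries. This is NOT the linenoise.cpp API — the class and method names below are hypothetical and only illustrate the up/down-arrow idea:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of up/down-arrow history traversal: a cursor walks
// over stored lines; stepping past the newest entry yields an empty prompt.
class PromptHistory {
    std::vector<std::string> entries;
    size_t cursor = 0; // one past the newest entry
public:
    void add(const std::string & line) {
        entries.push_back(line);
        cursor = entries.size();
    }
    // Up arrow: move toward older entries.
    std::string up() {
        if (cursor > 0) --cursor;
        return cursor < entries.size() ? entries[cursor] : "";
    }
    // Down arrow: move toward newer entries.
    std::string down() {
        if (cursor < entries.size()) ++cursor;
        return cursor < entries.size() ? entries[cursor] : "";
    }
};
```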

b4409

Verified: this commit was signed with the committer’s verified signature (ggerganov Georgi Gerganov).
metal : avoid uint (ggml-org#11019)

b3830

readme : update hot topics

b3816

server : add newline after chat example (ggml-org#9616)

b3805

Revert "[SYCL] fallback mmvq (ggml-org#9088)" (ggml-org#9579)

This reverts commit 50addec.

b3751

server : add loading html page while model is loading (ggml-org#9468)

* Adding loading page for '/' server requests

* set content when model is loading

* removed loading html file

* updated cmakelist

* updated makefile

* cleaned up whitespace

* cleanup for PR removed error

* updated server test to handle 503 HTML

* updated server test to handle 503 HTML

* catch 503 before parsing json

* revert test

* account for both api and web browser requests

* precommit corrections

* eol fix

* revert changes to pre-commit

* removed print statement

* made loading message more descriptive

* also support .html files

---------

Co-authored-by: VJHack <[email protected]>
Co-authored-by: Vinesh Janarthanan <[email protected]>
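The commit history above boils down to one behavior: while the model is loading, the server answers 503 with an HTML page for browsers but JSON for API clients, keyed off the request's Accept header. A hedged sketch of that content negotiation — the function and its strings are illustrative, not the actual llama-server code:

```cpp
#include <string>

// Illustrative content negotiation for the 503 "model loading" response:
// browsers (Accept: text/html) get a human-readable page, API clients get JSON.
std::string choose_503_body(const std::string & accept_header) {
    if (accept_header.find("text/html") != std::string::npos) {
        return "<html><body>The model is loading. Please wait.</body></html>";
    }
    return R"({"error":{"code":503,"message":"Loading model"}})";
}
```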

b3678

server : simplify state machine for slot (ggml-org#9283)

* server : simplify state machine for slot

* add SLOT_STATE_DONE_PROMPT

* pop_deferred_task

* add missing notify_one

* fix passkey test

* metrics : add n_busy_slots_per_decode

* fix test step

* add test

* maybe fix AddressSanitizer?

* fix deque ?

* missing lock

* pop_deferred_task: also notify

* Update examples/server/server.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
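The simplification above introduces an explicit SLOT_STATE_DONE_PROMPT state between prompt processing and generation. A hypothetical sketch of such a state machine — only the SLOT_STATE_DONE_PROMPT name comes from the commit message; the other states and the transition function are illustrative assumptions:

```cpp
// Hypothetical slot lifecycle: idle -> processing prompt -> done prompt
// -> generating -> idle. Not the actual server.cpp implementation.
enum slot_state {
    SLOT_STATE_IDLE,
    SLOT_STATE_PROCESSING_PROMPT,
    SLOT_STATE_DONE_PROMPT,
    SLOT_STATE_GENERATING,
};

slot_state next_state(slot_state s) {
    switch (s) {
        case SLOT_STATE_IDLE:              return SLOT_STATE_PROCESSING_PROMPT;
        case SLOT_STATE_PROCESSING_PROMPT: return SLOT_STATE_DONE_PROMPT;
        case SLOT_STATE_DONE_PROMPT:       return SLOT_STATE_GENERATING;
        case SLOT_STATE_GENERATING:        return SLOT_STATE_IDLE;
    }
    return SLOT_STATE_IDLE;
}
```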

b3620

CPU/CUDA: Gemma 2 FlashAttention support (ggml-org#8542)

* CPU/CUDA: Gemma 2 FlashAttention support

* apply logit_softcap to scale in kernel

* disable logit softcapping tests on Metal

* remove metal check

b3618

lora : fix llama conversion script with ROPE_FREQS (ggml-org#9117)