
Tags: jeffmaury/llama.cpp


b3837


Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
test-backend-ops : use flops for some performance tests (ggml-org#9657)

* test-backend-ops : use flops for some performance tests

- parallelize tensor quantization

- use a different set of cases for performance and correctness tests

- run each test for at least one second
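
The "at least one second" rule can be sketched as a repeat-until-elapsed loop that turns the run count into a throughput figure. This is an illustrative sketch only, not the actual test-backend-ops code; `measure_flops`, `perf_result` and `flops_per_run` are invented names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

struct perf_result {
    uint64_t runs;   // how many times the op executed
    double   gflops; // throughput derived from the known FLOP count per run
};

// Repeat op() until at least one second has elapsed, then report GFLOPS.
template <typename Op>
perf_result measure_flops(Op op, uint64_t flops_per_run) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    uint64_t runs    = 0;
    double   elapsed = 0.0;
    do {
        op();
        ++runs;
        elapsed = std::chrono::duration<double>(clock::now() - start).count();
    } while (elapsed < 1.0); // run each test for at least one second
    return { runs, (double) (runs * flops_per_run) / elapsed / 1e9 };
}
```

Timing the whole loop rather than a single run keeps short ops measurable and amortizes clock overhead.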

b3835

vocab : refactor tokenizer to reduce init overhead (ggml-org#9449)

* refactor tokenizer

* llama : make llm_tokenizer more private

ggml-ci

* refactor tokenizer

* refactor tokenizer

* llama : make llm_tokenizer more private

ggml-ci

* remove unused files

* remove unused fields to avoid an unused-field build error

* avoid symbol linking error

* Update src/llama.cpp

* Update src/llama.cpp

---------

Co-authored-by: Georgi Gerganov <[email protected]>
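
The init-overhead reduction amounts to doing one-time setup in the tokenizer's constructor and keeping that state private, so per-call tokenization only does lookups. A minimal sketch of that shape, with invented names rather than the actual `llm_tokenizer` API:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

class tokenizer {
public:
    // One-time init: build the vocab lookup table once, in the constructor,
    // instead of rebuilding it on every tokenize() call.
    explicit tokenizer(const std::vector<std::string> & vocab) {
        for (size_t i = 0; i < vocab.size(); ++i) {
            token_to_id[vocab[i]] = (int) i;
        }
    }

    // Greedy longest-match tokenization over the prebuilt table.
    std::vector<int> tokenize(const std::string & text) const {
        std::vector<int> out;
        size_t pos = 0;
        while (pos < text.size()) {
            size_t len = text.size() - pos;
            for (; len > 0; --len) {
                auto it = token_to_id.find(text.substr(pos, len));
                if (it != token_to_id.end()) {
                    out.push_back(it->second);
                    break;
                }
            }
            if (len == 0) { ++pos; continue; } // skip bytes with no match
            pos += len;
        }
        return out;
    }

private:
    std::unordered_map<std::string, int> token_to_id; // private, built once
};
```

Keeping `token_to_id` private mirrors the "make llm_tokenizer more private" bullets: callers can only tokenize, not poke at init state.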

b3834

llama : add support for Chameleon (ggml-org#8543)

* convert chameleon hf to gguf

* add chameleon tokenizer tests

* fix lint

* implement chameleon graph

* add swin norm param

* return qk norm weights and biases to original format

* implement swin norm

* suppress image token output

* rem tabs

* add comment to conversion

* fix ci

* check for k norm separately

* adapt to new lora implementation

* fix layer input for swin norm

* move swin_norm in gguf writer

* add comment regarding special token regex in chameleon pre-tokenizer

* Update src/llama.cpp

Co-authored-by: compilade <[email protected]>

* fix punctuation regex in chameleon pre-tokenizer (@compilade)

Co-authored-by: compilade <[email protected]>

* fix lint

* trigger ci

---------

Co-authored-by: compilade <[email protected]>
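
The "swin norm" bullets refer to where layer norm sits relative to the residual: standard pre-norm computes x + f(norm(x)), while the Swin-style variant normalizes the sub-block's output, x + norm(f(x)). A hedged sketch of the two orderings, with f() and plain float vectors standing in for the actual attention/FFN graph code:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using vec = std::vector<float>;

// Plain layer norm (no learned scale/shift, for brevity).
static vec layer_norm(const vec & x, float eps = 1e-5f) {
    float mean = 0.f, var = 0.f;
    for (float v : x) mean += v;
    mean /= (float) x.size();
    for (float v : x) var += (v - mean) * (v - mean);
    var /= (float) x.size();
    vec y(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = (x[i] - mean) / std::sqrt(var + eps);
    }
    return y;
}

// Standard pre-norm residual block: x + f(norm(x))
template <typename F>
vec pre_norm_block(const vec & x, F f) {
    vec y = f(layer_norm(x));
    for (size_t i = 0; i < x.size(); ++i) y[i] += x[i];
    return y;
}

// Swin-norm residual block: x + norm(f(x))
template <typename F>
vec swin_norm_block(const vec & x, F f) {
    vec y = layer_norm(f(x));
    for (size_t i = 0; i < x.size(); ++i) y[i] += x[i];
    return y;
}
```

The `swin_norm` flag carried in the GGUF writer would then select between these two orderings when building each layer.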

b3832

ggml : add run-time detection of neon, i8mm and sve (ggml-org#9331)

* ggml: Added run-time detection of neon, i8mm and sve

Adds run-time detection of the Arm instruction set features
neon, i8mm and sve for Linux and Apple build targets.

* ggml: Extend feature detection to include non-aarch64 Arm architectures

* ggml: Move definition of ggml_arm_arch_features to the global data section
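
Run-time detection of this kind typically reads `getauxval(AT_HWCAP)` on Linux and `sysctlbyname()` on Apple platforms. A self-contained sketch under those assumptions (not the actual ggml code; non-Arm builds simply report everything unsupported):

```cpp
#include <cassert>
#include <cstddef>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   // getauxval
#include <asm/hwcap.h>  // HWCAP_* bits
#elif defined(__APPLE__) && defined(__aarch64__)
#include <sys/sysctl.h> // sysctlbyname
#endif

struct arm_features {
    bool neon = false;
    bool i8mm = false;
    bool sve  = false;
};

// Query the running CPU instead of relying on compile-time macros alone.
static arm_features detect_arm_features() {
    arm_features f;
#if defined(__linux__) && defined(__aarch64__)
    const unsigned long hwcap = getauxval(AT_HWCAP);
#ifdef HWCAP_ASIMD
    f.neon = (hwcap & HWCAP_ASIMD) != 0;
#endif
#ifdef HWCAP_SVE
    f.sve = (hwcap & HWCAP_SVE) != 0;
#endif
#if defined(AT_HWCAP2) && defined(HWCAP2_I8MM)
    f.i8mm = (getauxval(AT_HWCAP2) & HWCAP2_I8MM) != 0;
#endif
#elif defined(__APPLE__) && defined(__aarch64__)
    f.neon = true; // AdvSIMD is mandatory on Apple silicon
    int v = 0;
    size_t sz = sizeof(v);
    if (sysctlbyname("hw.optional.arm.FEAT_I8MM", &v, &sz, nullptr, 0) == 0) {
        f.i8mm = v != 0;
    }
    // Apple silicon does not expose SVE
#endif
    return f;
}
```

Moving the detected-feature struct into global data (the third bullet) lets it be filled once at init and read cheaply from the hot kernels.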

b3831

Enable use of the rebar feature to upload buffers to the device. (ggml-org#9251)

b3829

cmake : add option for common library (ggml-org#9661)

b3828

[SYCL] add missed dll file in package (ggml-org#9577)

* update oneapi to 2024.2

* use 2024.1

---------

Co-authored-by: arthw <[email protected]>

b3827

mtgpu: enable VMM (ggml-org#9597)

Signed-off-by: Xiaodong Ye <[email protected]>

b3825

ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels (ggml-org#9217)

* ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels

* added fallback mechanism when the offline re-quantized model is not optimized for the underlying target

* fix for build errors

* remove prints from the low-level code

* Rebase to the latest upstream
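
The fallback bullet replaces a hard assert with kernel selection: use the optimized path only when the repacked weight layout matches what the running CPU supports, otherwise take the generic path. An illustrative sketch with invented names, not the actual ggml dispatch code:

```cpp
#include <cassert>

// Signature shared by the optimized and generic GEMV paths.
typedef void (*gemv_fn)(const void * weights, const float * x, float * y, int n);

static void gemv_generic(const void *, const float *, float *, int) {
    // portable reference path (body elided)
}

static void gemv_aarch64_q4(const void *, const float *, float *, int) {
    // AArch64-optimized path for repacked Q4 weights (body elided)
}

// Previously this situation hit an assert; now it degrades gracefully.
static gemv_fn select_gemv(bool weights_repacked, bool cpu_supports_layout) {
    if (weights_repacked && cpu_supports_layout) {
        return gemv_aarch64_q4; // fast path
    }
    return gemv_generic;        // fallback instead of a hard assert
}
```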

b3824

server : add more env vars, improve gen-docs (ggml-org#9635)

* server : add more env vars, improve gen-docs

* update server docs

* LLAMA_ARG_NO_CONTEXT_SHIFT
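
These env vars mirror the server's command-line arguments. A hedged usage sketch: LLAMA_ARG_NO_CONTEXT_SHIFT comes from the commit above, while the model path and the companion LLAMA_ARG_MODEL variable are assumptions for illustration.

```shell
# Configure llama-server via LLAMA_ARG_* environment variables
# instead of CLI flags (equivalent to --model / --no-context-shift).
LLAMA_ARG_MODEL=./models/model.gguf \
LLAMA_ARG_NO_CONTEXT_SHIFT=1 \
./llama-server
```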