Tags: jeffmaury/llama.cpp
test-backend-ops : use flops for some performance tests (ggml-org#9657)

* test-backend-ops : use flops for some performance tests
  - parallelize tensor quantization
  - use a different set of cases for performance and correctness tests
  - run each test for at least one second
vocab : refactor tokenizer to reduce init overhead (ggml-org#9449)

* refactor tokenizer
* llama : make llm_tokenizer more private
  ggml-ci
* refactor tokenizer
* refactor tokenizer
* llama : make llm_tokenizer more private
  ggml-ci
* remove unused files
* remove unused fields to avoid unused-field build error
* avoid symbol link error
* Update src/llama.cpp
* Update src/llama.cpp
---------
Co-authored-by: Georgi Gerganov <[email protected]>
llama : add support for Chameleon (ggml-org#8543)

* convert chameleon hf to gguf
* add chameleon tokenizer tests
* fix lint
* implement chameleon graph
* add swin norm param
* return qk norm weights and biases to original format
* implement swin norm
* suppress image token output
* rem tabs
* add comment to conversion
* fix ci
* check for k norm separately
* adapt to new lora implementation
* fix layer input for swin norm
* move swin_norm in gguf writer
* add comment regarding special token regex in chameleon pre-tokenizer
* Update src/llama.cpp
  Co-authored-by: compilade <[email protected]>
* fix punctuation regex in chameleon pre-tokenizer (@compilade)
  Co-authored-by: compilade <[email protected]>
* fix lint
* trigger ci
---------
Co-authored-by: compilade <[email protected]>
ggml : add run-time detection of neon, i8mm and sve (ggml-org#9331)

* ggml: Added run-time detection of neon, i8mm and sve
  Adds run-time detection of the Arm instruction set features neon, i8mm and sve
  for Linux and Apple build targets.
* ggml: Extend feature detection to include non-aarch64 Arm archs
* ggml: Move definition of ggml_arm_arch_features to the global data section
[SYCL] add missed dll file in package (ggml-org#9577)

* update oneapi to 2024.2
* use 2024.1
---------
Co-authored-by: arthw <[email protected]>
mtgpu: enable VMM (ggml-org#9597)

Signed-off-by: Xiaodong Ye <[email protected]>
ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels (ggml-org#9217)

* ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels
* added fallback mechanism when the offline re-quantized model is not optimized for the underlying target
* fix for build errors
* remove prints from the low-level code
* Rebase to the latest upstream
server : add more env vars, improve gen-docs (ggml-org#9635)

* server : add more env vars, improve gen-docs
* update server docs
* LLAMA_ARG_NO_CONTEXT_SHIFT