Tags: l29ah/llama.cpp
ggml : multi-thread ggml_rope() (~3-4 times faster on M1) (ggml-org#781)
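A minimal sketch of the general row-partitioning pattern such a multi-threaded op can use (this is illustrative, not the actual ggml_rope() code; the function name and rotation are hypothetical):

```cpp
#include <cmath>
#include <thread>
#include <vector>

// Apply an in-place 2D rotation to each pair of columns in every row of a
// row-major matrix, splitting rows across n_threads workers. Thread t
// handles rows t, t + n_threads, t + 2*n_threads, ...
void rotate_rows(std::vector<float>& data, int n_rows, int n_cols,
                 float theta, int n_threads) {
    auto worker = [&](int tid) {
        for (int r = tid; r < n_rows; r += n_threads) {
            for (int c = 0; c + 1 < n_cols; c += 2) {
                const float x0 = data[r*n_cols + c];
                const float x1 = data[r*n_cols + c + 1];
                data[r*n_cols + c]     = x0*std::cos(theta) - x1*std::sin(theta);
                data[r*n_cols + c + 1] = x0*std::sin(theta) + x1*std::cos(theta);
            }
        }
    };
    std::vector<std::thread> pool;
    for (int t = 0; t < n_threads; ++t) pool.emplace_back(worker, t);
    for (auto& th : pool) th.join();
}
```

Because rows are independent, no synchronization beyond the final join is needed, which is why per-row ops like RoPE parallelize almost linearly.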
ggml, llama : avoid heavy V transpose + improvements (ggml-org#775)
ggml:
- added ggml_view_3d()
- ggml_view_tensor() now inherits the stride too
- reimplement ggml_cpy() to account for dst stride
- no longer require tensor->data to be memory aligned
llama:
- compute RoPE on 32-bit tensors (should be more accurate)
- store RoPE-ed K in the KV cache
- store transposed V in the KV cache (significant speed-up)
- avoid unnecessary Q copy
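Why storing V transposed speeds things up can be seen in a small sketch (illustrative only, not ggml code): in the attention output, each output dimension sums over all cached tokens, so if V is stored transposed ([n_embd][n_tokens]), that inner loop walks a contiguous run of floats instead of striding by n_embd per token.

```cpp
#include <cmath>
#include <vector>

// out[d] = sum_t p[t] * V[t][d], computed from a transposed V.
// With v_t laid out as [n_embd][n_tokens], the inner loop over t reads
// one contiguous row per output dimension -- cache- and SIMD-friendly.
std::vector<float> attn_out_vt(const std::vector<float>& p,    // [n_tokens] weights
                               const std::vector<float>& v_t,  // [n_embd][n_tokens]
                               int n_tokens, int n_embd) {
    std::vector<float> out(n_embd, 0.0f);
    for (int d = 0; d < n_embd; ++d) {
        const float* row = &v_t[d*n_tokens];  // contiguous run of n_tokens floats
        float sum = 0.0f;
        for (int t = 0; t < n_tokens; ++t) {
            sum += p[t]*row[t];
        }
        out[d] = sum;
    }
    return out;
}
```

Keeping V in this layout in the KV cache means the transpose is paid once at store time instead of on every attention evaluation.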
llama : define non-positive top_k; top_k range check (ggml-org#779)
- Define non-positive top_k; top_k range check
- minor : brackets
Co-authored-by: Georgi Gerganov <[email protected]>
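A hedged sketch of the clamping behavior this change introduces: a non-positive top_k is taken to mean "consider the whole vocabulary", and top_k is range-checked against the number of candidates (the function name here is illustrative, not llama.cpp's):

```cpp
#include <algorithm>

// Resolve a user-supplied top_k against the vocabulary size.
// top_k <= 0  -> keep every token (full vocabulary)
// top_k > n   -> clamped to n (range check)
int effective_top_k(int top_k, int n_vocab) {
    if (top_k <= 0) {
        return n_vocab;
    }
    return std::min(top_k, n_vocab);
}
```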
make : missing host optimizations in CXXFLAGS (ggml-org#763)
Define non-positive temperature behavior (ggml-org#720)
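A minimal sketch of one common convention for this, assumed here (names illustrative): a temperature <= 0 disables random sampling and deterministically picks the most likely token.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// temp <= 0 -> greedy argmax over the logits (deterministic).
// temp > 0  -> logits would be scaled by 1/temp and then sampled
//              stochastically; that path is stubbed out in this sketch
//              and signalled with -1.
int sample_token(const std::vector<float>& logits, float temp) {
    if (temp <= 0.0f) {
        return (int)std::distance(
            logits.begin(),
            std::max_element(logits.begin(), logits.end()));
    }
    return -1;  // stochastic sampling omitted from this sketch
}
```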
10+% performance improvement of ggml_vec_dot_q4_0 on AVX2 (ggml-org#654)
- Performance improvement of AVX2 code
- Fixed problem with MSVC compiler
- Reviewer comments: removed double semicolon, deleted empty line 1962
Windows: reactivate SIGINT handler after each Ctrl-C (ggml-org#736)
Added API for getting/setting the kv_cache (ggml-org#685) The API provides access methods for retrieving the current memory buffer for the kv_cache and its token count. It also contains a method for setting the kv_cache from a memory buffer. This makes it possible to load/save history - maybe support a --cache-prompt parameter as well? Co-authored-by: Pavol Rusnak <[email protected]>
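A hypothetical sketch of the idea behind such an API (the real functions live in llama.cpp's C API; this struct and these names are illustrative only): expose the raw cache bytes plus the token count, so a caller can snapshot the state and restore it later to reload a conversation.

```cpp
#include <vector>

// Illustrative stand-in for the model's KV cache state.
struct KvCache {
    std::vector<unsigned char> buf;  // raw cache memory
    int n_tokens = 0;                // tokens currently stored
};

// "get": copy the cache out into a caller-owned snapshot.
KvCache get_kv_cache(const KvCache& cache) {
    return cache;
}

// "set": restore the cache from a previously taken snapshot.
void set_kv_cache(KvCache& cache, const KvCache& snapshot) {
    cache = snapshot;
}
```

Usage is the save/load-history pattern the commit describes: take a snapshot after processing a prompt, and restore it later instead of re-evaluating the prompt.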
make : use -march=native -mtune=native on x86 (ggml-org#609)