Tags: ochafik/llama.cpp
support permuted, remove check s0/s10 (ggml-org#19889)
Co-authored-by: Neo Zhang Jianyu <[email protected]>
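The title's "support permuted" refers to tensor views whose byte strides are reordered relative to the default contiguous layout. A minimal C++ sketch of the underlying idea, using a hypothetical `tensor` struct rather than the real ggml types: element access that goes through explicit strides handles contiguous and permuted views alike, which is what allows a contiguity check to be dropped.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for a ggml-style tensor view: ne[] holds the
// element count per dimension, nb[] holds the byte stride per
// dimension. A permuted view shares the data but reorders nb[].
struct tensor {
    int64_t ne[2];       // extent of each dimension
    int64_t nb[2];       // byte stride of each dimension
    const float * data;  // underlying buffer (not owned)
};

// Element access that honors the strides: correct for contiguous
// *and* permuted views, so no "is contiguous" check is needed.
static float get(const tensor & t, int64_t i0, int64_t i1) {
    const char * base = (const char *) t.data;
    return *(const float *) (base + i0*t.nb[0] + i1*t.nb[1]);
}

int main() {
    const float buf[6] = {0, 1, 2, 3, 4, 5};

    // 3x2 view: 4 bytes per step along dim 0, 12 bytes along dim 1
    tensor a  = {{3, 2}, {4, 12}, buf};
    // permuted (transposed) view of the same buffer: swapped strides
    tensor at = {{2, 3}, {12, 4}, buf};

    printf("a[2,1]  = %g\n", get(a, 2, 1));   // 5
    printf("at[1,2] = %g\n", get(at, 1, 2));  // same element: 5
}
```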
docker : add CUDA 13.1 image build (ggml-org#18441)
* add updated cuda-new.Dockerfile for Ubuntu 24.04 compatibility
* add cuda13 build
ggml-cuda: fix regex for arch list (ggml-org#18371)
* ggml-cuda: fix regex for arch list
* make regex exact
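The fix itself lives in the build scripts, but the "make regex exact" bullet names a general technique worth a sketch: anchor the pattern and match the whole entry (`std::regex_match`) instead of searching for a substring (`std::regex_search`), so a longer arch number is no longer accepted by a pattern meant for a shorter one. The arch entries and patterns below are illustrative, not the ones from the PR.

```cpp
#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main() {
    // Illustrative CUDA arch list entries; real lists may also carry
    // -real/-virtual suffixes from CMAKE_CUDA_ARCHITECTURES.
    const std::vector<std::string> entries = {"86", "86-real", "860"};

    // Loose: substring search, so "860" is wrongly accepted too.
    const std::regex loose("86");
    // Exact: the whole entry must match, optional suffix allowed.
    const std::regex exact("^86(-real|-virtual)?$");

    for (const auto & e : entries) {
        std::cout << e
                  << "  loose=" << std::regex_search(e, loose)
                  << "  exact=" << std::regex_match(e, exact) << "\n";
    }
}
```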
server : support unified cache across slots (ggml-org#16736)
* server : support unified context across slots
* cont : fix speculative decoding initialization
* context : fix n_ctx_per_seq computation
* server : purge slots one by one
* tests : add unified cache server tests
* llama : update per-seq context computation
* test-thread-safety : handle tiny training context of the input model
* server : fix server_tokens clear()
* server : use 4 slots + unified KV by default
* llama : add note about context size queries
* cont : update todos [no ci]
* context : do not cap the size of the context
* tests : adjust parameters to be CI friendlier
* context : add warning
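Several bullets above revolve around one computation: how much context each sequence slot gets. With a split KV cache every slot owns a fixed `n_ctx / n_seq_max` slice; with a unified cache all slots draw from the single shared window and can grow unevenly. A hedged sketch of that arithmetic, with illustrative names (`n_ctx_per_seq` here is not the server's actual code):

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative only: in the split (non-unified) scheme each sequence
// owns a fixed slice of the context; in the unified scheme all
// sequences share the whole window.
static uint32_t n_ctx_per_seq(uint32_t n_ctx, uint32_t n_seq_max, bool unified) {
    return unified ? n_ctx : n_ctx / n_seq_max;
}

int main() {
    const uint32_t n_ctx     = 8192;
    const uint32_t n_seq_max = 4;  // PR text: 4 slots + unified KV by default

    printf("split  : %u tokens per slot\n",
           n_ctx_per_seq(n_ctx, n_seq_max, false));  // 2048
    printf("unified: up to %u tokens for any one slot\n",
           n_ctx_per_seq(n_ctx, n_seq_max, true));   // 8192
}
```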
ggml webgpu: profiling, CI updates, reworking of command submission (ggml-org#16452)
* Add profiling
* More detailed profiling
* Rework command submission to avoid global locks
* Update wait handling
* try new method of waiting on futures
* Add serializing of command submission in some cases
* Add new pool for timestamp queries and clean up logging
* Serialize command submission in CI and leave a TODO note
* Update webgpu CI
* Add myself as WebGPU codeowner
* Deadlock avoidance
* Leave WebGPU/Vulkan CI serialized
* Fix divide by 0
* Fix logic in division by inflight_threads
* Update CODEOWNERS and remove serialize submit option
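The "Fix divide by 0" and "Fix logic in division by inflight_threads" bullets point at a common hazard when sizing work by a live thread counter: the counter can legitimately be zero. A hedged C++ sketch of the guard, with hypothetical names rather than the actual ggml-webgpu code:

```cpp
#include <algorithm>
#include <atomic>
#include <cstdio>

// Hypothetical: number of threads currently submitting GPU work.
std::atomic<unsigned> inflight_threads{0};

// Size a per-thread submission budget from a shared total. Clamping
// the divisor to >= 1 avoids the divide-by-zero when no thread has
// registered yet (e.g. during startup or after a drain).
unsigned per_thread_budget(unsigned total_slots) {
    const unsigned n = std::max(1u, inflight_threads.load(std::memory_order_relaxed));
    return total_slots / n;
}

int main() {
    printf("budget with 0 in-flight: %u\n", per_thread_budget(64));  // 64, not UB
    inflight_threads = 4;
    printf("budget with 4 in-flight: %u\n", per_thread_budget(64));  // 16
}
```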