Tags: justinsb/llama.cpp
CUDA: use async data loading for FlashAttention (ggml-org#11894)
Co-authored-by: Diego Devesa
server : fix divide-by-zero in metrics reporting (ggml-org#11915)
vulkan: implement several ops relevant for ggml_opt (ggml-org#11769)
* vulkan: support memset_tensor
* vulkan: support GGML_OP_SUM
* vulkan: implement GGML_OP_ARGMAX
* vulkan: implement GGML_OP_SUB
* vulkan: implement GGML_OP_COUNT_EQUAL
* vulkan: implement GGML_OP_OPT_STEP_ADAMW
* vulkan: fix check_results RWKV_WKV6 crash and memory leaks
* vulkan: implement GGML_OP_REPEAT_BACK
* tests: remove invalid test-backend-ops REPEAT_BACK tests
* vulkan: fix COUNT_EQUAL memset using a fillBuffer command
common : Fix a typo in help (ggml-org#11899)
This patch fixes a typo in command help: prefx -> prefix
Signed-off-by: Masanari Iida
metal : fix the crash caused by the lack of residency set support on Intel Macs. (ggml-org#11904)
repo : update links to new url (ggml-org#11886)
* repo : update links to new url ggml-ci
* cont : more urls ggml-ci