Tags: Dibakar/llama.cpp
Tags
Update llama-cli documentation (ggml-org#8315) * Update README.md * Update README.md * Update README.md fixed llama-cli/main, templates on some cmds added chat template sections and fixed typos in some areas * Update README.md * Update README.md * Update README.md
cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231) * cuda : fix bounds check for src0 rows in MMVQ kernel * Update ggml-cuda/mmvq.cu Co-authored-by: Johannes Gäßler <[email protected]> --------- Co-authored-by: Johannes Gäßler <[email protected]>
common : refactor cli arg parsing (ggml-org#7675) * common : gpt_params_parse do not print usage * common : rework usage print (wip) * common : valign * common : rework print_usage * infill : remove cfg support * common : reorder args * server : deduplicate parameters ggml-ci * common : add missing header ggml-ci * common : remote --random-prompt usages ggml-ci * examples : migrate to gpt_params ggml-ci * batched-bench : migrate to gpt_params * retrieval : migrate to gpt_params * common : change defaults for escape and n_ctx * common : remove chatml and instruct params ggml-ci * common : passkey use gpt_params
llama : offload to RPC in addition to other backends (ggml-org#7640) * llama : offload to RPC in addition to other backends * - fix copy_tensor being called on the src buffer instead of the dst buffer - always initialize views in the view_src buffer - add RPC backend to Makefile build - add endpoint to all RPC object names * add rpc-server to Makefile * Update llama.cpp Co-authored-by: slaren <[email protected]> --------- Co-authored-by: slaren <[email protected]>
[SYCL] Windows default build instructions without -DLLAMA_SYCL_F16 fl… …ag activated (ggml-org#6767) * Fix FP32/FP16 build instructions * Fix typo * Recommended build instruction Co-authored-by: Neo Zhang Jianyu <[email protected]> * Recommended build instruction Co-authored-by: Neo Zhang Jianyu <[email protected]> * Recommended build instruction Co-authored-by: Neo Zhang Jianyu <[email protected]> * Add comments in Intel GPU linux --------- Co-authored-by: Anas Ahouzi <[email protected]> Co-authored-by: Neo Zhang Jianyu <[email protected]>
`build`: generate hex dump of server assets during build (ggml-org#6661) * `build`: generate hex dumps of server assets on the fly * build: workaround lack of -n on gnu xxd * build: don't use xxd in cmake * build: don't call xxd from build.zig * build: more idiomatic hexing * build: don't use xxd in Makefile (od hackery instead) * build: avoid exceeding max cmd line limit in makefile hex dump * build: hex dump assets at cmake build time (not config time)
PreviousNext