Tags: nevrax/llama.cpp

b4400

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (ggml-org#11013)

In common/common.cpp:
* Replace the stat() call that checked whether a file exists with the standard library function std::filesystem::exists (fixes an error: unable to match the correct function signature)
* Add conditions so PATH_MAX is only defined in a WIN32 environment when not already present (fixes a warning: it is already defined in MSYS2)

In examples/run/run.cpp:
* Include the io.h header (fixes an error: cannot find function _get_osfhandle)
* Change the OVERLAPPED initialisers to an empty struct (fixes a warning about uninitialised members)
* Add an initialiser for hFile (fixes a warning that it may be uninitialised)
* Cast the curl_off_t percentage value to long int in the generate_progress_prefix function (fixes a warning that curl_off_t is long long int)

In ggml/src/ggml-opencl/ggml-opencl.cpp:
* Initialise certain declared cl_mem variables to nullptr for greater safety (fixes a warning that the B_d variable may be used unassigned)
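The filesystem and PATH_MAX changes above can be sketched as follows; the guard value and helper name are illustrative, not the exact patch:

```cpp
#include <climits>
#include <filesystem>
#include <string>

// MSYS2's headers already define PATH_MAX, so only provide a fallback when
// it is genuinely missing on WIN32 (hypothetical guard mirroring the fix).
#if defined(_WIN32) && !defined(PATH_MAX)
#define PATH_MAX 260  // Windows MAX_PATH
#endif

// Existence check via std::filesystem instead of stat(), whose signature
// could not be matched correctly under MSYS2 GCC.
static bool file_exists(const std::string & path) {
    return std::filesystem::exists(path);
}
```

std::filesystem::exists requires C++17 but sidesteps the platform-specific struct stat / function name collision entirely.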

b4399

vulkan: optimize mul_mat for small values of N (ggml-org#10991)

Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS), with the batch strides overloaded to hold the row strides. Put the loads from the B matrix in the innermost loop because they should cache better.

Share some code for reducing the result values to memory in mul_mat_vec_base.
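As a scalar C++ analogue of the shader structure (not the actual GLSL, with a template parameter standing in for the NUM_COLS spec constant):

```cpp
#include <array>
#include <vector>

// One matvec pass now handles NUM_COLS columns of B together: each A value
// is loaded once, and the B loads sit in the innermost loop so consecutive
// columns stay cache-friendly.
template <int NUM_COLS>
std::vector<float> mul_mat_vec(const std::vector<float> & A,  // M x K, row-major
                               const std::vector<float> & B,  // K x NUM_COLS, column-major
                               int M, int K) {
    std::vector<float> out(M * NUM_COLS, 0.0f);
    for (int row = 0; row < M; ++row) {
        std::array<float, NUM_COLS> acc{};       // one accumulator per column of B
        for (int k = 0; k < K; ++k) {
            const float a = A[row * K + k];      // A loaded once per k...
            for (int col = 0; col < NUM_COLS; ++col) {
                acc[col] += a * B[col * K + k];  // ...B loads in the innermost loop
            }
        }
        for (int col = 0; col < NUM_COLS; ++col) {
            out[col * M + row] = acc[col];
        }
    }
    return out;
}
```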

b4398

android : fix llama_batch free (ggml-org#11014)

b4397

vulkan: im2col and matmul optimizations for stable diffusion (ggml-org#10942)

* tests: Add im2col perf tests

* vulkan: optimize im2col, more elements per thread

* vulkan: increase small tile size for NV_coopmat2

* vulkan: change im2col to 512 elements per workgroup

b4396

vulkan: Use push constant offset to handle misaligned descriptors (ggml-org#10987)

b4394

server : fix token duplication when streaming with stop strings (ggml-org#10997)

b4393

vulkan: multi-row k quants (ggml-org#10846)

* multi row k quant shaders!

* better row selection

* more row choices

* readjust row selection

* rm_kq=2 by default

b4392

examples, ggml : fix GCC compiler warnings (ggml-org#10983)

Warning types fixed (observed under MSYS2 GCC 14.2.0):
* format '%ld' expects argument of type 'long int', but argument has type 'size_t'
* llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp:81:46: warning: missing initializer for member '_STARTUPINFOA::lpDesktop' [-Wmissing-field-initializers] (emitted for every struct field except the first)
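Both warning classes have standard fixes, sketched below with a stand-in struct (the real _STARTUPINFOA is Windows-only; names here are illustrative):

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Stand-in for a struct like _STARTUPINFOA (illustrative only).
struct startup_info {
    unsigned long cb;
    const char *  lpDesktop;
    const char *  lpTitle;
};

// Fix for "format '%ld' expects 'long int', but argument has type 'size_t'":
// use the dedicated %zu conversion for size_t instead of %ld.
std::string format_count(size_t n) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "count = %zu", n);
    return buf;
}

// Fix for -Wmissing-field-initializers, triggered by initialising only the
// first member (e.g. `startup_info si = { sizeof(si) };`): value-initialise
// everything with {}, then assign the fields you need.
startup_info make_startup_info() {
    startup_info si = {};
    si.cb = sizeof(si);
    return si;
}
```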

b4391

server : add support for "encoding_format": "base64" to the */embeddings endpoints (ggml-org#10967)

* add support for base64

* fix base64 test

* improve test
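With "encoding_format": "base64" the endpoint returns the raw float32 embedding bytes base64-encoded instead of a JSON number array, mirroring the OpenAI API. A client-side decode might look like this (illustrative helper, not part of the server code):

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Decode a base64 payload back into float32 values, as a client would do
// after requesting "encoding_format": "base64" from an embeddings endpoint.
std::vector<float> decode_base64_embedding(const std::string & b64) {
    static const std::string tbl =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<uint8_t> bytes;
    int val = 0, bits = 0;
    for (char c : b64) {
        if (c == '=') break;                  // padding ends the data
        size_t pos = tbl.find(c);
        if (pos == std::string::npos) continue;
        val  = (val << 6) | (int) pos;        // accumulate 6 bits per symbol
        bits += 6;
        if (bits >= 8) {                      // emit a byte once 8 bits are ready
            bits -= 8;
            bytes.push_back((uint8_t) ((val >> bits) & 0xFF));
        }
    }
    std::vector<float> out(bytes.size() / sizeof(float));
    std::memcpy(out.data(), bytes.data(), out.size() * sizeof(float));
    return out;
}
```

This assumes the server emits little-endian IEEE-754 float32, which is what the byte-for-byte reinterpretation relies on.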

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>

b4390

ggml : more performance with llamafile tinyblas on x86_64 (ggml-org#10714)

* more performance with llamafile tinyblas on x86_64.

- add bf16 support
- change dispatch strategy (thanks: ikawrakow/ik_llama.cpp#71)
- reduce memory bandwidth

simpler tinyblas dispatch, more cache friendly

* tinyblas dynamic dispatching

* sgemm: add M blocks.

* - git 2.47 uses short ids of length 9.
- show-progress is not part of GNU Wget2

* remove unstable test
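"Dynamic dispatching" here generally means selecting a kernel at runtime from detected CPU features. A minimal GCC/Clang-style sketch, not the actual tinyblas code:

```cpp
// Runtime kernel dispatch: pick a sgemm variant once, based on CPU features
// detected at startup (uses the GCC/Clang __builtin_cpu_supports builtin;
// kernel names are placeholders, not tinyblas symbols).
static const char * sgemm_generic() { return "generic"; }
static const char * sgemm_avx2()    { return "avx2"; }
static const char * sgemm_avx512()  { return "avx512"; }

using sgemm_fn = const char * (*)();

static sgemm_fn select_sgemm() {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512f")) return sgemm_avx512;
    if (__builtin_cpu_supports("avx2"))    return sgemm_avx2;
#endif
    return sgemm_generic;                  // portable fallback
}
```

Resolving the function pointer once and caching it keeps the feature check off the hot path of every matmul call.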