Tags: nevrax/llama.cpp

b4400

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (ggml-org#11013)

In common/common.cpp:
* Replace the stat() call that checked whether a file exists with the standard library function std::filesystem::exists (fixes an error: unable to match the correct function signature)
* Add conditions so PATH_MAX is only defined in a WIN32 environment when not already present (fixes a warning: it is already defined in MSYS2)

In examples/run/run.cpp:
* Include the io.h header (fixes an error: cannot find function _get_osfhandle)
* Change the OVERLAPPED initialisers to an empty struct (fixes a warning about uninitialised members)
* Add an initialiser for hFile (fixes a warning that it may be uninitialised)
* Cast the curl_off_t percentage value to long int in the generate_progress_prefix function (fixes a warning that curl_off_t is long long int)

In ggml/src/ggml-opencl/ggml-opencl.cpp:
* Initialise certain declared cl_mem variables to nullptr for greater safety (fixes a warning that the B_d variable may be used unassigned)
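The filesystem and PATH_MAX changes above can be sketched as follows; the guard value and helper name are illustrative, not the exact patch:

```cpp
#include <climits>
#include <filesystem>
#include <string>

// MSYS2's headers already define PATH_MAX, so only provide a fallback when
// it is genuinely missing on WIN32 (hypothetical guard mirroring the fix).
#if defined(_WIN32) && !defined(PATH_MAX)
#define PATH_MAX 260  // Windows MAX_PATH
#endif

// Existence check via std::filesystem instead of stat(), whose signature
// could not be matched correctly under MSYS2 GCC.
static bool file_exists(const std::string & path) {
    return std::filesystem::exists(path);
}
```

std::filesystem::exists requires C++17 but sidesteps the platform-specific struct stat / function name collision entirely.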

b4399

vulkan: optimize mul_mat for small values of N (ggml-org#10991)

Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS), with the batch strides overloaded to hold the row strides. Put the loads from the B matrix in the innermost loop because they should cache better.

Share some code for reducing the result values to memory in mul_mat_vec_base.
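As a scalar C++ analogue of the shader structure (not the actual GLSL, with a template parameter standing in for the NUM_COLS spec constant):

```cpp
#include <array>
#include <vector>

// One matvec pass now handles NUM_COLS columns of B together: each A value
// is loaded once, and the B loads sit in the innermost loop so consecutive
// columns stay cache-friendly.
template <int NUM_COLS>
std::vector<float> mul_mat_vec(const std::vector<float> & A,  // M x K, row-major
                               const std::vector<float> & B,  // K x NUM_COLS, column-major
                               int M, int K) {
    std::vector<float> out(M * NUM_COLS, 0.0f);
    for (int row = 0; row < M; ++row) {
        std::array<float, NUM_COLS> acc{};       // one accumulator per column of B
        for (int k = 0; k < K; ++k) {
            const float a = A[row * K + k];      // A loaded once per k...
            for (int col = 0; col < NUM_COLS; ++col) {
                acc[col] += a * B[col * K + k];  // ...B loads in the innermost loop
            }
        }
        for (int col = 0; col < NUM_COLS; ++col) {
            out[col * M + row] = acc[col];
        }
    }
    return out;
}
```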

b4398

android : fix llama_batch free (ggml-org#11014)

b4397

vulkan: im2col and matmul optimizations for stable diffusion (ggml-org#10942)

* tests: Add im2col perf tests

* vulkan: optimize im2col, more elements per thread

* vulkan: increase small tile size for NV_coopmat2

* vulkan: change im2col to 512 elements per workgroup

b4396

vulkan: Use push constant offset to handle misaligned descriptors (ggml-org#10987)

b4394

server : fix token duplication when streaming with stop strings (ggml-org#10997)

b4393

vulkan: multi-row k quants (ggml-org#10846)

* multi row k quant shaders!

* better row selection

* more row choices

* readjust row selection

* rm_kq=2 by default

b4392

examples, ggml : fix GCC compiler warnings (ggml-org#10983)

Warning types fixed (observed under MSYS2 GCC 14.2.0):
* format '%ld' expects argument of type 'long int', but argument has type 'size_t'
* llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp:81:46: warning: missing initializer for member '_STARTUPINFOA::lpDesktop' [-Wmissing-field-initializers] (emitted for every struct field except the first)
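Both warning classes have standard fixes, sketched below with a stand-in struct (the real _STARTUPINFOA is Windows-only; names here are illustrative):

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Stand-in for a struct like _STARTUPINFOA (illustrative only).
struct startup_info {
    unsigned long cb;
    const char *  lpDesktop;
    const char *  lpTitle;
};

// Fix for "format '%ld' expects 'long int', but argument has type 'size_t'":
// use the dedicated %zu conversion for size_t instead of %ld.
std::string format_count(size_t n) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "count = %zu", n);
    return buf;
}

// Fix for -Wmissing-field-initializers, triggered by initialising only the
// first member (e.g. `startup_info si = { sizeof(si) };`): value-initialise
// everything with {}, then assign the fields you need.
startup_info make_startup_info() {
    startup_info si = {};
    si.cb = sizeof(si);
    return si;
}
```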

b4391

server : add support for "encoding_format": "base64" to the */embeddings endpoints (ggml-org#10967)

* add support for base64

* fix base64 test

* improve test
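With "encoding_format": "base64" the endpoint returns the raw float32 embedding bytes base64-encoded instead of a JSON number array, mirroring the OpenAI API. A client-side decode might look like this (illustrative helper, not part of the server code):

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Decode a base64 payload back into float32 values, as a client would do
// after requesting "encoding_format": "base64" from an embeddings endpoint.
std::vector<float> decode_base64_embedding(const std::string & b64) {
    static const std::string tbl =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<uint8_t> bytes;
    int val = 0, bits = 0;
    for (char c : b64) {
        if (c == '=') break;                  // padding ends the data
        size_t pos = tbl.find(c);
        if (pos == std::string::npos) continue;
        val  = (val << 6) | (int) pos;        // accumulate 6 bits per symbol
        bits += 6;
        if (bits >= 8) {                      // emit a byte once 8 bits are ready
            bits -= 8;
            bytes.push_back((uint8_t) ((val >> bits) & 0xFF));
        }
    }
    std::vector<float> out(bytes.size() / sizeof(float));
    std::memcpy(out.data(), bytes.data(), out.size() * sizeof(float));
    return out;
}
```

This assumes the server emits little-endian IEEE-754 float32, which is what the byte-for-byte reinterpretation relies on.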

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>

b4390

ggml : more performance with llamafile tinyblas on x86_64 (ggml-org#10714)

* more performance with llamafile tinyblas on x86_64.

- add bf16 support
- change dispatch strategy (thanks: ikawrakow/ik_llama.cpp#71)
- reduce memory bandwidth

simpler tinyblas dispatch, more cache friendly

* tinyblas dynamic dispatching

* sgemm: add M blocks.

* - git 2.47 uses short ids of length 9.
- show-progress is not part of GNU Wget2

* remove unstable test
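"Dynamic dispatching" here generally means selecting a kernel at runtime from detected CPU features. A minimal GCC/Clang-style sketch, not the actual tinyblas code:

```cpp
// Runtime kernel dispatch: pick a sgemm variant once, based on CPU features
// detected at startup (uses the GCC/Clang __builtin_cpu_supports builtin;
// kernel names are placeholders, not tinyblas symbols).
static const char * sgemm_generic() { return "generic"; }
static const char * sgemm_avx2()    { return "avx2"; }
static const char * sgemm_avx512()  { return "avx512"; }

using sgemm_fn = const char * (*)();

static sgemm_fn select_sgemm() {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512f")) return sgemm_avx512;
    if (__builtin_cpu_supports("avx2"))    return sgemm_avx2;
#endif
    return sgemm_generic;                  // portable fallback
}
```

Resolving the function pointer once and caching it keeps the feature check off the hot path of every matmul call.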