Provide ability to dynamically allocate all available CPU threads without affecting prior functionality #1364
Merged
abetlen merged 1 commit into abetlen:main from baileytec-labs:main on Apr 23, 2024
Conversation
Owner
Thank you @sean-bailey, I'll take a look at adding this to the …
xhedit pushed a commit to xhedit/llama-cpp-conv that referenced this pull request on Apr 30, 2024:
…desired using `-1` (abetlen#1364)
I have been using the llama-cpp-python server in AWS Lambda functions for some time now. The default behavior of allocating only half the available CPU threads is wise, since a multi-user deployment could otherwise lock things up, but on single-tenant systems like Lambda that concern does not apply.
No original functionality is changed: a user who runs the llama-cpp-python server today with their current configuration gets the same experience as before. If the user instead sets `n_threads` or `n_threads_batch` to -1, then, similar to `n_gpu_layers`, it defaults to using all CPU threads reported by `multiprocessing.cpu_count()`. This is especially effective on AWS Lambda, where the CPU count scales with the memory allocation: if a model needs more memory to perform, it will likely benefit from a higher CPU count as well. As an example that leverages this effectively:
https://github.com/baileytec-labs/llama-on-lambda/tree/main/llama_lambda/llama-cpp-server-container
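A minimal sketch of the intended behavior (the helper name below is illustrative, not the PR's actual code): -1 acts as a sentinel that expands to every available CPU thread, while any other value passes through untouched, which is what preserves the existing defaults.

```python
import multiprocessing

def resolve_thread_count(n_threads: int) -> int:
    # Hypothetical helper mirroring the PR's behavior: -1 means
    # "use every CPU thread"; any other value is passed through so
    # existing configurations behave exactly as before.
    if n_threads == -1:
        return multiprocessing.cpu_count()
    return n_threads

# On a machine (or Lambda memory size) exposing 6 vCPUs, this prints 6:
print(resolve_thread_count(-1))
# Explicit values are untouched, preserving prior behavior:
print(resolve_thread_count(4))  # -> 4
```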
This update adds functionality without interfering with any existing functionality.
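In practice, assuming the server's auto-generated CLI flags for these settings, launching with all threads would look something like `python3 -m llama_cpp.server --model ./model.gguf --n_threads -1 --n_threads_batch -1`; omitting the flags keeps the previous half-of-cores default. Reusing -1 as the "use everything" sentinel also mirrors the existing `n_gpu_layers=-1` convention, so no new configuration vocabulary is introduced.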