Provide ability to dynamically allocate all available CPU threads without affecting prior functionality#1364

Merged
abetlen merged 1 commit into abetlen:main from baileytec-labs:main
Apr 23, 2024

Conversation

@sean-bailey
Contributor

I have been using the llama-cpp-python server in AWS Lambda functions for some time now. The default behavior of allocating only half the CPU threads is wise, especially for multi-user deployments where you don't want to risk locking things up, but on single-tenant systems like Lambda that concern is mitigated.

No original functionality is changed: a user who runs the llama-cpp-python server today with their current configuration gets the same experience as before. However, if the user sets n_threads or n_threads_batch to -1, then, similar to n_gpu_layers, it defaults to using all available threads as reported by multiprocessing.cpu_count(). This is especially effective on AWS Lambda, where the CPU count scales with the memory allocation: if a model needs more memory to perform, it will likely benefit from a higher CPU count as well.

As an example which leverages this effectively:
https://github.com/baileytec-labs/llama-on-lambda/tree/main/llama_lambda/llama-cpp-server-container

This update is designed to add functionality while not interfering with any current functionality.
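The described behavior can be sketched as a small helper. This is a minimal illustration of the -1 convention, not the PR's actual diff; `resolve_thread_count` is a hypothetical name chosen for this example.

```python
import multiprocessing


def resolve_thread_count(n_threads: int) -> int:
    """Resolve a thread-count setting (hypothetical helper, not the PR's code).

    A value of -1 means "use all available CPU threads", as reported by
    multiprocessing.cpu_count(); any other value is passed through unchanged,
    preserving the existing default behavior.
    """
    if n_threads == -1:
        return multiprocessing.cpu_count()
    return n_threads
```

On Lambda this means a single memory-size knob effectively tunes both RAM and the number of threads the server will use.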

@abetlen
Owner

abetlen commented Apr 23, 2024

Thank you @sean-bailey, I'll take a look at adding this to the Llama class as well.

@abetlen abetlen merged commit 53ebcc8 into abetlen:main Apr 23, 2024
xhedit pushed a commit to xhedit/llama-cpp-conv that referenced this pull request Apr 30, 2024

2 participants