# LLM Workspace For Experimenting

This workspace now includes a tested planner that picks an inference engine from:

- the local hardware profile
- the rules captured in `cheatsheet.md`
- the selected model's shape and workload
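The selection logic can be pictured as a small rule engine over those three inputs. The sketch below is illustrative only: the names (`HardwareProfile`, `ModelSpec`, `plan_engine`) and the specific rules are assumptions about how such a planner might work, not the repository's actual API.

```python
# Hypothetical sketch of a rule-based engine planner.
# Class names, fields, and thresholds are illustrative assumptions,
# not the actual llm-inference-plan implementation.
from dataclasses import dataclass


@dataclass
class HardwareProfile:
    vram_gib: float  # total GPU memory, in GiB
    gpu_name: str    # e.g. "RTX 5070 Ti"


@dataclass
class ModelSpec:
    params_b: float    # parameter count, in billions
    quantization: str  # e.g. "exl2", "awq", "gguf"
    context_len: int
    workload: str      # "interactive" or "api"
    concurrency: int = 1


def plan_engine(hw: HardwareProfile, model: ModelSpec) -> str:
    """Pick an inference engine from hardware, model shape, and workload."""
    # Concurrent API serving favors a continuous-batching server.
    if model.workload == "api" and model.concurrency > 1:
        return "vllm"
    # EXL2-quantized weights imply the ExLlamaV2 runtime.
    if model.quantization == "exl2":
        return "exllamav2"
    # Fall back to llama.cpp for offload-friendly formats.
    return "llama.cpp"


if __name__ == "__main__":
    hw = HardwareProfile(vram_gib=16, gpu_name="RTX 5070 Ti")
    interactive = ModelSpec(14, "exl2", 8192, "interactive")
    serving = ModelSpec(8, "awq", 8192, "api", concurrency=12)
    print(plan_engine(hw, interactive))  # exllamav2
    print(plan_engine(hw, serving))      # vllm
```

This mirrors the two worked examples below: an interactive EXL2 model resolves to `exllamav2`, while concurrent API serving resolves to `vllm`.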

## Setup

```shell
uv sync --dev
```

That creates `.venv` and installs the CLI entrypoint:

```shell
uv run llm-inference-plan --help
```

## Example: This Machine, Interactive 14B EXL2

```shell
uv run llm-inference-plan \
  --model-name Qwen3-14B-Instruct-EXL2 \
  --family qwen \
  --params-b 14 \
  --quantization exl2 \
  --context-len 8192 \
  --workload interactive
```

On the current RTX 5070 Ti 16 GiB + 7950X3D box, that resolves to `exllamav2`.

## Example: Same Machine, API Serving

```shell
uv run llm-inference-plan \
  --model-name Llama-3.1-8B-Instruct-AWQ \
  --family llama \
  --params-b 8 \
  --quantization awq \
  --context-len 8192 \
  --concurrency 12 \
  --workload api
```

That resolves to `vllm` with prefix caching enabled.

## JSON Output

```shell
uv run llm-inference-plan \
  --format json \
  --model-name Qwen3-14B-Instruct-EXL2 \
  --family qwen \
  --params-b 14 \
  --quantization exl2 \
  --context-len 8192 \
  --workload interactive
```
