KairoScale is an ML training optimization pipeline that runs:
- **Profile** your training script
- **Analyze** bottlenecks and generate optimization candidates
- **Validate** candidates against a control run
- **Report** speed/cost/stability results
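The four stages above can be sketched as a minimal driver loop. The stage functions, return shapes, and hotspot names below are hypothetical illustrations, not KairoScale's actual internals:

```python
from dataclasses import dataclass, field

@dataclass
class RunResult:
    stages: list = field(default_factory=list)

def profile(script, result):
    result.stages.append("profile")
    # pretend the profiler found two hotspots (illustrative only)
    return {"script": script, "hotspots": ["dataloader", "optimizer.step"]}

def analyze(prof, result):
    result.stages.append("analyze")
    # one candidate optimization per hotspot
    return [{"target": h} for h in prof["hotspots"]]

def validate(candidates, result):
    result.stages.append("validate")
    # a real run would compare each candidate against a control run
    return candidates

def report(validated, result):
    result.stages.append("report")
    return f"{len(validated)} candidates validated"

def run_pipeline(script):
    result = RunResult()
    summary = report(validate(analyze(profile(script, result), result), result), result)
    return result, summary
```

The real pipeline runs these stages against your repo (see the `KairoScale run` commands below); this sketch only shows the stage ordering.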
- Python 3.10+
- `pip`
- A training repo with an entry script (default: `train.py`)
For cloud GPU runs:
- Modal account + Modal CLI configured (`modal setup`)
For LLM-backed analysis providers:
- `ANTHROPIC_API_KEY` (provider `claude`) or `OPENAI_API_KEY` (provider `openai`)
```bash
python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install -r requirements.txt
```

`requirements.txt` installs this package with all integrations and dev tooling.
Create `.env` (or export the variables directly in your shell):

```bash
# For Modal sandbox runs
export MODAL_TOKEN_ID=your_modal_token_id
export MODAL_TOKEN_SECRET=your_modal_token_secret

# Pick one (or both) for LLM analysis
export ANTHROPIC_API_KEY=your_anthropic_key
export OPENAI_API_KEY=your_openai_key

# Optional: pre-existing Modal vLLM endpoint
export MODAL_VLLM_URL=https://<workspace>--KairoScale-vllm-serve.modal.run
```

`.env.example` includes Modal + OpenAI placeholders.
Load env vars before running:

```bash
set -a
source .env
set +a
```

Launch the Streamlit UI:

```bash
python -m streamlit run KairoScale/ui/app.py
```

Use this for live demos and the hackathon judging flow.
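Before kicking off a run, it can help to verify that the right variables actually made it into the environment. This is a hypothetical pre-flight check based on the requirements listed above, not part of the KairoScale CLI:

```python
import os

# Hypothetical pre-flight check: Modal credentials are needed for
# sandbox (non --local) runs; the claude/openai providers each need
# their API key; the heuristic provider needs neither.
def missing_env(provider, local=False, env=os.environ):
    needed = []
    if not local:
        needed += ["MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET"]
    if provider == "claude":
        needed.append("ANTHROPIC_API_KEY")
    elif provider == "openai":
        needed.append("OPENAI_API_KEY")
    return [k for k in needed if not env.get(k)]
```

For example, `missing_env("openai", local=True, env={})` returns `["OPENAI_API_KEY"]`, while the heuristic provider run locally needs nothing.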
```bash
KairoScale run /abs/path/to/your/repo \
  --local \
  --provider heuristic \
  --objective-profile latency \
  --entry train.py
```

Good for quick iteration when cloud GPUs or API keys are unavailable.
```bash
KairoScale run /abs/path/to/your/repo \
  --provider openai \
  --objective-profile balanced \
  --gpu a100-80gb \
  --entry train.py
```

Notes:
- Omit `--local` to run in Modal sandboxes.
- `--provider` can be `claude`, `openai`, `modal`, or `heuristic`.
- If `--provider modal` is used and `MODAL_VLLM_URL` is not set, KairoScale auto-deploys `modal_app.py`.
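The provider rules above can be condensed into a small decision helper. This is an illustrative sketch of the documented behavior, not the actual CLI code:

```python
VALID_PROVIDERS = {"claude", "openai", "modal", "heuristic"}

def plan_run(provider, env):
    """Sketch of the documented provider rules (hypothetical helper)."""
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    # --provider modal without MODAL_VLLM_URL triggers an auto-deploy
    # of modal_app.py, per the note above
    auto_deploy = provider == "modal" and not env.get("MODAL_VLLM_URL")
    return {"provider": provider, "deploy_vllm": auto_deploy}
```

So `plan_run("modal", {})` flags a vLLM auto-deploy, while setting `MODAL_VLLM_URL` reuses the existing endpoint.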
After a pipeline run, config JSONs are exported to `KairoScale_configs/`.

Deploy one config:

```bash
KairoScale deploy KairoScale_configs/opt-001.json \
  --repo /abs/path/to/your/repo \
  --gpu a100-80gb \
  --entry train.py
```

Run deploy locally instead of on Modal:

```bash
KairoScale deploy KairoScale_configs/opt-001.json \
  --repo /abs/path/to/your/repo \
  --local \
  --entry train.py
```

Optional deploy flags:
- `--steps 1000` (sets `TRAIN_STEPS` in the wrapper env)
- `--timeout 3600`
- `--python-bin /path/to/python` (local mode)
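Since `--steps` is surfaced to the training script as the `TRAIN_STEPS` environment variable, the script can pick it up with a small fallback helper. The default value here is made up for illustration:

```python
import os

def resolve_steps(default=100, env=os.environ):
    # TRAIN_STEPS is set by the deploy wrapper when --steps is passed;
    # fall back to the script's own default otherwise
    raw = env.get("TRAIN_STEPS")
    return int(raw) if raw else default
```

With `--steps 1000`, `resolve_steps()` sees `TRAIN_STEPS=1000` and returns `1000`; without the flag it returns the script's default.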
Pipeline outputs:
- `KairoScale_report.md` (final report)
- `KairoScale_configs/*.json` (candidate and combo configs)
- Local/Modal artifact directories with profiling and validation outputs
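The exported config JSONs can also be inspected programmatically. Their schema is KairoScale-internal, so this hypothetical helper only globs and parses them without assuming any particular fields:

```python
import glob
import json
import os

def load_configs(dirpath="KairoScale_configs"):
    """Load every exported config JSON as (path, parsed_dict) pairs."""
    configs = []
    for path in sorted(glob.glob(os.path.join(dirpath, "*.json"))):
        with open(path) as f:
            configs.append((path, json.load(f)))
    return configs
```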
```bash
KairoScale --help
KairoScale run --help
KairoScale deploy --help
```

Run the test suite:

```bash
pytest -q
```

Core tools/services used in this project:
- Python + Click (CLI)
- Streamlit (UI command center)
- Modal (GPU sandbox execution + optional hosted vLLM endpoint)
- PyTorch profiler + runtime instrumentation (profiling)
- Anthropic/OpenAI-compatible providers for optimization generation
- PyYAML (config loading)
- Pytest (test suite)