training

Training data and coefficient fitting pipeline for the inference-sim crossmodel latency prediction model.

Overview

This repo contains vLLM serving traces from 16 experiments (4 models x 4 workload profiles; 13 active, 3 excluded overload runs), collected with inference-perf, plus a pipeline that:

Validates trace data integrity (validate_traces.py)
Reconstructs per-step batch composition from journey events (reconstruct_steps.py)
Computes analytical basis functions for the latency model (basis_functions.py)
Fits 10 model coefficients via three-phase stacked prefill/decode NNLS (fit_coefficients.py)
Evaluates accuracy against held-out experiments (evaluate.py — planned)

Design: inference-sim/inference-sim#489 | Fitting spec: inference-sim/training#3
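
As an illustration of the non-negative least squares (NNLS) idea behind the fitting step, here is a minimal sketch. It is not the code in fit_coefficients.py: it fits two hypothetical coefficients (a per-kilotoken cost and a fixed per-step overhead) instead of the 10 real ones, ignores the three-phase stacking, and uses plain cyclic coordinate descent in place of a library solver such as scipy.optimize.nnls. The function name and the toy data are assumptions made for the example.

```python
def nnls_coordinate_descent(A, b, iters=500):
    """Minimize ||A x - b||^2 subject to x >= 0 by cyclic coordinate descent."""
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    residual = list(b)  # residual = b - A x, and x starts at zero
    for _ in range(iters):
        for j in range(n_cols):
            col = [A[i][j] for i in range(n_rows)]
            norm_sq = sum(v * v for v in col)
            if norm_sq == 0.0:
                continue
            # Best value of x[j] with the other coordinates held fixed,
            # clipped at zero so the coefficient stays non-negative.
            step = sum(c * r for c, r in zip(col, residual)) / norm_sq
            new_xj = max(0.0, x[j] + step)
            delta = new_xj - x[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                x[j] = new_xj
    return x


# Hypothetical toy data: each row is [batch tokens in thousands, 1.0].
kilotokens = [0.1, 0.5, 1.0, 2.0]
A = [[t, 1.0] for t in kilotokens]
b = [3.0 * t + 0.5 for t in kilotokens]  # noise-free synthetic step latencies
coefs = nnls_coordinate_descent(A, b)    # approximately [3.0, 0.5]
```

The non-negativity constraint keeps every fitted cost term physically meaningful: a basis function can contribute zero latency, but never negative latency.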

Dataset

Models: Llama-2-7b (TP=1), Llama-2-70b (TP=4), Mixtral-8x7B-v0.1 (TP=2), CodeLlama-34b (TP=2)

Profiles: general, codegen, roleplay, reasoning

Split: Request-level 70/15/15 (train/validate/test) via SHA-256 hash of request ID (see split.py)
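
A hash-based split of this kind can be sketched as follows. This is a minimal illustration, not the code in split.py: the function name, the use of the first 8 digest bytes, and the bucket boundaries are assumptions chosen to produce a 70/15/15 split.

```python
import hashlib


def assign_split(request_id: str) -> str:
    """Map a request ID to 'train', 'validate', or 'test' deterministically."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and reduce it
    # to a bucket in [0, 100). The bucket width is an assumption.
    bucket = int.from_bytes(digest[:8], "big") % 100
    if bucket < 70:
        return "train"
    if bucket < 85:
        return "validate"
    return "test"
```

Because the assignment depends only on the request ID, it is stable across runs and machines, and it does not change when experiments are added or removed.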

Directory layout

split.py                Single source of truth for experiment metadata and request-level splits
schemas.py              Pydantic schemas for all data formats
trace_parser.py         Shared OTEL trace parsing utilities
validate_traces.py      Journey trace validation (5 correctness checks)
reconstruct_steps.py    Step reconstruction from journey events
basis_functions.py      Analytical basis functions for the latency model
fit_coefficients.py     Three-phase NNLS coefficient fitting

tests/                  Behavioral unit tests
  conftest.py             JourneyBuilder fixture for synthetic trace data
  test_reconstruct_steps.py
  test_basis_functions.py
  test_fit_coefficients.py
  test_split.py
  test_trace_parser_api.py

model_configs/          HuggingFace config.json per model
datasheets/             GPU hardware specs (H100 SXM)
default_args/           Raw experiment data
  <experiment>/
    exp-config.yaml       vLLM server parameters
    traces.json           OTEL journey + step traces (gitignored, large)
    results/              inference-perf aggregate metrics

output/                 Generated pipeline outputs (gitignored)
  validate/               Per-experiment validation JSON + summary
  reconstruct/            Per-experiment step + request JSON + summary
  fit/                    Fitted coefficients, lambda tuning, residuals

Setup

python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Usage

# Validate all active experiments (writes to output/validate/)
python3 validate_traces.py

# Reconstruct step batches and request labels (writes to output/reconstruct/)
python3 reconstruct_steps.py

# Fit 10 latency model parameters (writes to output/fit/)
python3 fit_coefficients.py

# Print experiment summary
python3 split.py

# Run tests
pytest

Key concepts

Teacher-forced reconstruction: We use the batch compositions that vLLM actually executed (reconstructed from journey events), not simulated ones. This avoids a circular dependency in which predicted latencies would alter the batch compositions they are fitted against.

Greedy-fill prefill: When a prompt spans multiple scheduler steps (chunked prefill), tokens are distributed using max_num_batched_tokens as the budget cap, mirroring vLLM's scheduler. Each decode request gets its 1 token first; the remaining budget goes to prefill.
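
The per-step budget rule can be sketched as below. This is a simplified illustration, not the logic in reconstruct_steps.py: the function name is hypothetical, and it assumes prefill requests are served in list order.

```python
def greedy_fill(num_decode, prefill_remaining, max_num_batched_tokens):
    """Return the prefill tokens scheduled for each request in one step."""
    # Each decode request consumes exactly one token of the budget first.
    budget = max(max_num_batched_tokens - num_decode, 0)
    scheduled = []
    for remaining in prefill_remaining:
        # Give each prefill request as much as fits (assumed list order).
        take = min(remaining, budget)
        scheduled.append(take)
        budget -= take
    return scheduled


# 3 decode requests leave 1021 of 1024 tokens for prefill.
print(greedy_fill(3, [1000, 500], 1024))  # [1000, 21]
```

In this example the second prompt receives only 21 tokens in this step, so its remaining 479 tokens carry over to later steps.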

Preemption handling: Requests preempted during decode resume with the correct context length (accounting for the gap). Requests preempted during prefill re-enter the PREFILL phase with the correct remaining token count, derived from prefill.done_tokens.
