fix(sim): guard decode-only batch path for non-PD transparency by namasl · Pull Request #628 · inference-sim/inference-sim

namasl · 2026-03-12T14:58:40Z

Summary

Adds ProgressIndex > 0 guard to the PD decode-only batch formation path in VLLMBatchFormation.FormBatch(), ensuring it only fires for PD decode sub-requests (where AllocateTransferredKV explicitly set ProgressIndex = inputLen > 0), not for pathological zero-input requests in non-PD mode
Ensures the pd branch is a fully transparent drop-in replacement for existing BLIS users who don't use PD disaggregation

Context

The PD decode-only path used the condition ProgressIndex >= inputLen to detect decode sub-requests with pre-allocated KV. For hypothetical zero-input requests (len(InputTokens)==0), this condition is satisfied since ProgressIndex(0) >= inputLen(0), causing non-PD requests to incorrectly take the decode-only path instead of the normal prefill path.

Full backward compatibility audit

A comprehensive audit of all 37 changed files (~8700 lines) on the pd branch confirmed this was the only behavioral difference affecting non-PD users. All other PD additions are properly gated:

Area	Guard mechanism
CLI flags	All 10 new PD flags default to disabled (`--pd-decider=never`, instances=0)
Event pipeline	Bifurcation gated by `cs.poolsConfigured()` (nil when PD disabled)
Metrics/JSON output	`PDMetrics` is nil; `printPDMetrics(nil)` is no-op
Trace records	PD records never recorded; summary PD fields are 0
Anomaly counters	Extra `DroppedKVAllocations` line gated on `> 0`
Core simulator	`EnqueueDecodeSubRequest` only called by PD cluster logic
defaults.yaml	No pd-specific changes (all diffs came from main via merge)

Test plan

go test ./... passes (all 11 packages)
go vet ./... clean
Verify existing non-PD simulation produces identical output on both main and pd branches

🤖 Generated with Claude Code

The PD decode-only path in VLLMBatchFormation.FormBatch() used the condition `ProgressIndex >= inputLen` to detect decode sub-requests with pre-allocated KV. For zero-input requests (len(InputTokens)==0), this condition is satisfied since ProgressIndex(0) >= inputLen(0), causing non-PD requests to incorrectly take the decode-only path. Add `ProgressIndex > 0` guard so the path only fires for PD decode sub-requests where AllocateTransferredKV explicitly set ProgressIndex to inputLen (which is > 0 for real requests). This ensures the pd branch is a fully transparent drop-in replacement for non-PD users. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

namasl · 2026-03-12T15:04:31Z

Verification: non-PD output identity between `main` and `pd`

Built binaries from both branches (pd includes this fix) and ran 6 scenarios with identical arguments, comparing stdout and JSON output.

Method

# Build both branches via worktrees
git worktree add /tmp/blis-main main
go build -o /tmp/blis-main-bin main.go   # from main worktree
go build -o /tmp/blis-pd-bin main.go     # from pd branch (with fix)

# For each scenario: run both binaries with identical flags, diff stdout + JSON
diff <(main_bin run ...) <(pd_bin run ...)

Results

#	Scenario	Flags	stdout	JSON
1	Single-instance blackbox	`--model qwen/qwen3-14b --num-requests 50 --seed 42`	identical	identical
2	Cluster weighted routing	`--num-instances 4 --routing-policy weighted --admission-policy always-admit --num-requests 100 --seed 42`	identical	identical
3	Cluster with full tracing	`--num-instances 3 --routing-policy weighted --trace-level full --num-requests 50 --seed 42`	identical	—
4a	Cluster round-robin	`--num-instances 3 --routing-policy round-robin --num-requests 80 --seed 7`	identical	—
4b	Cluster least-loaded	`--num-instances 3 --routing-policy least-loaded --num-requests 80 --seed 7`	identical	—
5	Token bucket + priority scheduling	`--num-instances 2 --routing-policy weighted --admission-policy token-bucket --token-bucket-capacity 20 --token-bucket-refill-rate 5 --scheduler priority-fcfs --priority-policy slo-based --num-requests 60 --seed 99`	identical	—

All scenarios used --model qwen/qwen3-14b. Stdout was captured separately from stderr (which contains non-deterministic timing). "identical" means diff reported zero differences (byte-identical output).

Coverage

These scenarios exercise the key code paths that the pd branch modifies:

Scenario 1: Core single-instance simulator path (batch formation, step execution, completion)
Scenarios 2–4: Cluster event pipeline including the admission → routing bifurcation point (AdmissionDecisionEvent.Execute()) and notifyDisaggregationObserver call in RoutingDecisionEvent
Scenario 3: Trace summary output (verifies no extra PD summary lines when PD disabled)
Scenario 5: Admission rejection + priority reordering + scheduler interaction

What's not covered by runtime comparison

Zero-input request edge case (the actual fix in this PR) — no standard workload generator produces these; verified by code inspection that ProgressIndex > 0 guard prevents the decode-only path from firing
Roofline/crossmodel latency backends — require HF config files; verified by code audit that maxModelLen auto-derivation and all other analytical-backend code paths are unchanged between main and pd (zero diff)

namasl merged commit 1172a5a into inference-sim:pd Mar 12, 2026

namasl mentioned this pull request Mar 12, 2026

plan: transfer PD disaggregation phase 1 from pd branch to main #631

Closed

5 tasks

This was referenced Mar 20, 2026

[Epic] Prefill-Decode Disaggregation Feature - Merge Plan (10 PRs) #777

Closed

PR10: Bug Fixes and Edge Case Handling #787

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(sim): guard decode-only batch path for non-PD transparency#628

fix(sim): guard decode-only batch path for non-PD transparency#628
namasl merged 1 commit intoinference-sim:pdfrom
namasl:fix/pd-batch-formation-zero-input-guard

namasl commented Mar 12, 2026

Uh oh!

namasl commented Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

namasl commented Mar 12, 2026

Summary

Context

Full backward compatibility audit

Test plan

Uh oh!

namasl commented Mar 12, 2026

Verification: non-PD output identity between main and pd

Method

Results

Coverage

What's not covered by runtime comparison

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Verification: non-PD output identity between `main` and `pd`