fix(sim): guard decode-only batch path for non-PD transparency#628
Merged
namasl merged 1 commit intoinference-sim:pdfrom Mar 12, 2026
Merged
Conversation
The PD decode-only path in VLLMBatchFormation.FormBatch() used the condition `ProgressIndex >= inputLen` to detect decode sub-requests with pre-allocated KV. For zero-input requests (len(InputTokens)==0), this condition is satisfied since ProgressIndex(0) >= inputLen(0), causing non-PD requests to incorrectly take the decode-only path. Add `ProgressIndex > 0` guard so the path only fires for PD decode sub-requests where AllocateTransferredKV explicitly set ProgressIndex to inputLen (which is > 0 for real requests). This ensures the pd branch is a fully transparent drop-in replacement for non-PD users. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Contributor
Author
Verification: non-PD output identity between
|
| # | Scenario | Flags | stdout | JSON |
|---|---|---|---|---|
| 1 | Single-instance blackbox | --model qwen/qwen3-14b --num-requests 50 --seed 42 |
identical | identical |
| 2 | Cluster weighted routing | --num-instances 4 --routing-policy weighted --admission-policy always-admit --num-requests 100 --seed 42 |
identical | identical |
| 3 | Cluster with full tracing | --num-instances 3 --routing-policy weighted --trace-level full --num-requests 50 --seed 42 |
identical | — |
| 4a | Cluster round-robin | --num-instances 3 --routing-policy round-robin --num-requests 80 --seed 7 |
identical | — |
| 4b | Cluster least-loaded | --num-instances 3 --routing-policy least-loaded --num-requests 80 --seed 7 |
identical | — |
| 5 | Token bucket + priority scheduling | --num-instances 2 --routing-policy weighted --admission-policy token-bucket --token-bucket-capacity 20 --token-bucket-refill-rate 5 --scheduler priority-fcfs --priority-policy slo-based --num-requests 60 --seed 99 |
identical | — |
All scenarios used --model qwen/qwen3-14b. Stdout was captured separately from stderr (which contains non-deterministic timing). "identical" means diff reported zero differences (byte-identical output).
Coverage
These scenarios exercise the key code paths that the pd branch modifies:
- Scenario 1: Core single-instance simulator path (batch formation, step execution, completion)
- Scenarios 2–4: Cluster event pipeline including the admission → routing bifurcation point (
AdmissionDecisionEvent.Execute()) andnotifyDisaggregationObservercall inRoutingDecisionEvent - Scenario 3: Trace summary output (verifies no extra PD summary lines when PD disabled)
- Scenario 5: Admission rejection + priority reordering + scheduler interaction
What's not covered by runtime comparison
- Zero-input request edge case (the actual fix in this PR) — no standard workload generator produces these; verified by code inspection that
ProgressIndex > 0guard prevents the decode-only path from firing - Roofline/crossmodel latency backends — require HF config files; verified by code audit that
maxModelLenauto-derivation and all other analytical-backend code paths are unchanged between main and pd (zero diff)
5 tasks
This was referenced Mar 20, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ProgressIndex > 0guard to the PD decode-only batch formation path inVLLMBatchFormation.FormBatch(), ensuring it only fires for PD decode sub-requests (whereAllocateTransferredKVexplicitly setProgressIndex = inputLen > 0), not for pathological zero-input requests in non-PD modepdbranch is a fully transparent drop-in replacement for existing BLIS users who don't use PD disaggregationContext
The PD decode-only path used the condition
ProgressIndex >= inputLento detect decode sub-requests with pre-allocated KV. For hypothetical zero-input requests (len(InputTokens)==0), this condition is satisfied sinceProgressIndex(0) >= inputLen(0), causing non-PD requests to incorrectly take the decode-only path instead of the normal prefill path.Full backward compatibility audit
A comprehensive audit of all 37 changed files (~8700 lines) on the
pdbranch confirmed this was the only behavioral difference affecting non-PD users. All other PD additions are properly gated:--pd-decider=never, instances=0)cs.poolsConfigured()(nil when PD disabled)PDMetricsis nil;printPDMetrics(nil)is no-opDroppedKVAllocationsline gated on> 0EnqueueDecodeSubRequestonly called by PD cluster logicTest plan
go test ./...passes (all 11 packages)go vet ./...cleanmainandpdbranches🤖 Generated with Claude Code