Skip to content

fix(sim): guard decode-only batch path for non-PD transparency#628

Merged
namasl merged 1 commit intoinference-sim:pdfrom
namasl:fix/pd-batch-formation-zero-input-guard
Mar 12, 2026
Merged

fix(sim): guard decode-only batch path for non-PD transparency#628
namasl merged 1 commit intoinference-sim:pdfrom
namasl:fix/pd-batch-formation-zero-input-guard

Conversation

@namasl
Copy link
Copy Markdown
Contributor

@namasl namasl commented Mar 12, 2026

Summary

  • Adds ProgressIndex > 0 guard to the PD decode-only batch formation path in VLLMBatchFormation.FormBatch(), ensuring it only fires for PD decode sub-requests (where AllocateTransferredKV explicitly set ProgressIndex = inputLen > 0), not for pathological zero-input requests in non-PD mode
  • Ensures the pd branch is a fully transparent drop-in replacement for existing BLIS users who don't use PD disaggregation

Context

The PD decode-only path used the condition ProgressIndex >= inputLen to detect decode sub-requests with pre-allocated KV. For hypothetical zero-input requests (len(InputTokens)==0), this condition is satisfied since ProgressIndex(0) >= inputLen(0), causing non-PD requests to incorrectly take the decode-only path instead of the normal prefill path.

Full backward compatibility audit

A comprehensive audit of all 37 changed files (~8700 lines) on the pd branch confirmed this was the only behavioral difference affecting non-PD users. All other PD additions are properly gated:

Area Guard mechanism
CLI flags All 10 new PD flags default to disabled (--pd-decider=never, instances=0)
Event pipeline Bifurcation gated by cs.poolsConfigured() (nil when PD disabled)
Metrics/JSON output PDMetrics is nil; printPDMetrics(nil) is no-op
Trace records PD records never recorded; summary PD fields are 0
Anomaly counters Extra DroppedKVAllocations line gated on > 0
Core simulator EnqueueDecodeSubRequest only called by PD cluster logic
defaults.yaml No pd-specific changes (all diffs came from main via merge)

Test plan

  • go test ./... passes (all 11 packages)
  • go vet ./... clean
  • Verify existing non-PD simulation produces identical output on both main and pd branches

🤖 Generated with Claude Code

The PD decode-only path in VLLMBatchFormation.FormBatch() used the
condition `ProgressIndex >= inputLen` to detect decode sub-requests
with pre-allocated KV. For zero-input requests (len(InputTokens)==0),
this condition is satisfied since ProgressIndex(0) >= inputLen(0),
causing non-PD requests to incorrectly take the decode-only path.

Add `ProgressIndex > 0` guard so the path only fires for PD decode
sub-requests where AllocateTransferredKV explicitly set ProgressIndex
to inputLen (which is > 0 for real requests). This ensures the pd
branch is a fully transparent drop-in replacement for non-PD users.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Mar 12, 2026

Verification: non-PD output identity between main and pd

Built binaries from both branches (pd includes this fix) and ran 6 scenarios with identical arguments, comparing stdout and JSON output.

Method

# Build both branches via worktrees
git worktree add /tmp/blis-main main
go build -o /tmp/blis-main-bin main.go   # from main worktree
go build -o /tmp/blis-pd-bin main.go     # from pd branch (with fix)

# For each scenario: run both binaries with identical flags, diff stdout + JSON
diff <(main_bin run ...) <(pd_bin run ...)

Results

# Scenario Flags stdout JSON
1 Single-instance blackbox --model qwen/qwen3-14b --num-requests 50 --seed 42 identical identical
2 Cluster weighted routing --num-instances 4 --routing-policy weighted --admission-policy always-admit --num-requests 100 --seed 42 identical identical
3 Cluster with full tracing --num-instances 3 --routing-policy weighted --trace-level full --num-requests 50 --seed 42 identical
4a Cluster round-robin --num-instances 3 --routing-policy round-robin --num-requests 80 --seed 7 identical
4b Cluster least-loaded --num-instances 3 --routing-policy least-loaded --num-requests 80 --seed 7 identical
5 Token bucket + priority scheduling --num-instances 2 --routing-policy weighted --admission-policy token-bucket --token-bucket-capacity 20 --token-bucket-refill-rate 5 --scheduler priority-fcfs --priority-policy slo-based --num-requests 60 --seed 99 identical

All scenarios used --model qwen/qwen3-14b. Stdout was captured separately from stderr (which contains non-deterministic timing). "identical" means diff reported zero differences (byte-identical output).

Coverage

These scenarios exercise the key code paths that the pd branch modifies:

  • Scenario 1: Core single-instance simulator path (batch formation, step execution, completion)
  • Scenarios 2–4: Cluster event pipeline including the admission → routing bifurcation point (AdmissionDecisionEvent.Execute()) and notifyDisaggregationObserver call in RoutingDecisionEvent
  • Scenario 3: Trace summary output (verifies no extra PD summary lines when PD disabled)
  • Scenario 5: Admission rejection + priority reordering + scheduler interaction

What's not covered by runtime comparison

  • Zero-input request edge case (the actual fix in this PR) — no standard workload generator produces these; verified by code inspection that ProgressIndex > 0 guard prevents the decode-only path from firing
  • Roofline/crossmodel latency backends — require HF config files; verified by code audit that maxModelLen auto-derivation and all other analytical-backend code paths are unchanged between main and pd (zero diff)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant