feat(sim): add PrefixThresholdDecider for prefix-aware PD disaggregation (PR5)#620
Merged
namasl merged 13 commits intoinference-sim:pdfrom Mar 12, 2026
Merged
Conversation
Adds PrefixThresholdDecider: disaggregates when non-cached token count exceeds threshold. Maintains a router-side PrefixCacheIndex under a single globalVirtualInstance key to track cluster-wide prefix knowledge. Also adds DisaggregationObserver interface for stateful deciders that learn from routing decisions (ObserveRouting called synchronously by ClusterSimulator after each routing decision). Implements cachedHashes/cachedReqID pattern (mirrors routing_prefix_scorer.go) to avoid double-hashing between Decide() and ObserveRouting(). Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Adds "prefix-threshold" to the valid disaggregation decider names in bundle.go, enabling IsValidDisaggregationDecider() and ValidDisaggregationDeciderNames() to recognize the new decider. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Adds PDPrefixThreshold int to DeploymentConfig for the prefix-threshold decider's non-cached token threshold. Adds --pd-prefix-threshold CLI flag (default 512) with >= 0 validation. Updates --pd-decider description to include "prefix-threshold". Wires PDPrefixThreshold into the config construction in cmd/root.go. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…rver - cluster.go: branch on PDDecider=="prefix-threshold" to construct PrefixThresholdDecider(PDPrefixThreshold, BlockSizeTokens) directly; add notifyDisaggregationObserver helper - cluster_event.go: call notifyDisaggregationObserver after RoutingDecisionEvent injection (standard routing path, BC-PD-28) - pd_events.go: call notifyDisaggregationObserver after PrefillRoutingEvent injection (disaggregated path, BC-PD-28) - disaggregation_test.go: add integration tests verifying wiring, high/low threshold behavior, observer call, and transfer conservation Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…reshold Documents DisaggregationObserver interface, PrefixThresholdDecider, and PDPrefixThreshold in the file organization table. Adds --pd-prefix-threshold to the CLI flags list and the disaggregated data flow CLI flags summary. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Round 1 fixes: - Update DisaggregationObserver docstring with R13 note and R17 signal freshness guarantee - Add noopDisaggregationObserver to satisfy R13 (>=2 implementations) - Add explicit comment in DecodeRoutingEvent explaining why observer is intentionally not called on the decode sub-path - Add 'Adding New Disaggregation Deciders' section to extension-recipes.md covering both stateless and stateful patterns - Fix Decide() docstring: clarify hash-reuse fires only on non-disaggregated path (disaggregated path receives prefill sub-request with different ID) Round 2 fixes: - Remove structural TestPrefixThreshold_DeciderWiredCorrectly; replace with compile-time interface assertion (behavioral coverage exists in ZeroThresholdAlwaysDisaggregates and HighThresholdNoDisaggregation) - Add PDPrefixThreshold field to newTestDisaggDeploymentConfig (R4 construction site audit) - Add PD disaggregation row to CLI Flag Summary table in configuration.md including --pd-prefix-threshold and all PD flags introduced in PR1-PR4 Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…ound 3) - Add logrus.Warn when --pd-prefix-threshold is explicitly set but --pd-decider is not "prefix-threshold" (silent flag ignored, R3 spirit) - Add PD Disaggregation section to docs/reference/configuration.md explaining decider options, prefix-threshold semantics (non-cached tokens vs total tokens), default value meaning, and all PD flags Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…arming effect TestPrefixThreshold_ObserverWarmsCache replaces the previous TestPrefixThreshold_ObserverCalledAfterRouting which only checked causality invariants (TransferCompleteTime != 0) and did not verify that the DisaggregationObserver actually warmed the prefix cache. The new test uses two requests with a shared 192-token prefix (threshold=150): req1 disaggregates (192 > 150 with empty cache), observer records 12 blocks, req2 arrives later with the same prefix (58 non-cached tokens ≤ 150) and is routed locally — proving the observer was called and the cache was warmed. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…n.md cross-link - extension-recipes.md: add missing step 4 (update configuration.md table) and fix touch-point count from 3 to 4 for stateless disaggregation deciders - configuration.md: replace broken cross-link to non-existent architecture.md#pd-disaggregation with descriptive text documenting the pool topology constraint (prefill + decode <= num-instances) Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…tingEvent - Restore droppedKVAllocations counter and DroppedKVAllocations() accessor (R1: count dropped requests — never silent) - Move AssignedInstance/DecodeInstanceID/DecodeEnqueueTime assignment to after successful AllocateTransferredKV check to prevent inconsistent state on failure - Restore DroppedKVAllocations() in cmd/root.go anomaly counter output - Add INV-PD-1 through INV-PD-5 back to invariants.md for DRY compliance (these were in CLAUDE.md but not the canonical standards doc) Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Merge upstream pd branch after PR3 and PR4 landed. Conflicts resolved: - CLAUDE.md: combined PD metrics additions with PDPrefixThreshold - invariants.md: adopted upstream's detailed INV-PD-1..5 descriptions - bundle.go: kept prefix-threshold in validDisaggregationDeciders - pd_events.go: adopted upstream's droppedAtDecodeKV counter, trace recording, and InjectDecodeOnline(time) signature; preserved observer no-op comment from PR5 - cluster.go: unified duplicate KV drop counters (droppedKVAllocations → droppedAtDecodeKV) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…uctural tests Critical: Add CLI validation that --pd-decider other than "never" requires --prefill-instances and --decode-instances to be set (R1: no silent config failure). Important: - Remove superseded disaggregation decider recipe (incorrect field names, wrong flag names, stale purity contract) from extension-recipes.md - Add prefix-threshold to first PD flags table in configuration.md - Replace internal field accesses (cs.parentRequests, cs.transfersInitiated, cs.trace, cs.droppedAtDecodeKV) with public accessors (cs.ParentRequests(), cs.Trace(), cs.DroppedKVAllocations()) in disaggregation_test.go for refactor survival (BDD/TDD principle inference-sim#5) Also includes minor comment condensations from code-simplifier agent. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add the new PD-mode anomaly counter to the results guide so users understand what it means when decode KV allocation fails. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
This was referenced Mar 12, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes part of #577 — implements PR5 from #595
Summary
PrefixThresholdDecider— disaggregates requests to dedicated prefill pools when non-cached token count exceeds a configurable threshold (--pd-prefix-threshold, default 512)DisaggregationObserverinterface for stateful deciders that need routing feedback to update their internal state"prefix-threshold"invalidDisaggregationDeciders;NewDisaggregationDeciderpanics with a helpful message directing callers toNewPrefixThresholdDecider(threshold, blockSize)PDPrefixThreshold inttoDeploymentConfig; wired throughcmd/root.govia--pd-prefix-thresholddocs/contributing/standards/invariants.mddocs/contributing/extension-recipes.mdDesign
PrefixThresholdDecidermaintains a router-sidePrefixCacheIndexkeyed onglobalVirtualInstance = "__global__"to approximate cluster-wide prefix knowledge.Decide()computesnonCachedTokens = len(InputTokens) - cachedBlocks*blockSizeand returnsDisaggregate = nonCachedTokens > threshold.ObserveRouting()records the request's block hashes after routing so future decisions benefit from updated cache state (INV-7 / R17 signal freshness).Test plan
go test ./...passesgo build ./...passesTestPrefixThresholdDecider*— unit tests for Decide/ObserveRouting logicTestDisaggregationObserver*— interface compliance tests--pd-decider prefix-thresholdroutes high-prefill requests to prefill pool🤖 Generated with Claude Code