Skip to content

feat(sim): add PrefixThresholdDecider for prefix-aware PD disaggregation (PR5)#620

Merged
namasl merged 13 commits intoinference-sim:pdfrom
namasl:pr5-prefix-threshold-decider
Mar 12, 2026
Merged

feat(sim): add PrefixThresholdDecider for prefix-aware PD disaggregation (PR5)#620
namasl merged 13 commits intoinference-sim:pdfrom
namasl:pr5-prefix-threshold-decider

Conversation

@namasl
Copy link
Copy Markdown
Contributor

@namasl namasl commented Mar 11, 2026

Closes part of #577 — implements PR5 from #595

Summary

  • Adds PrefixThresholdDecider — disaggregates requests to dedicated prefill pools when non-cached token count exceeds a configurable threshold (--pd-prefix-threshold, default 512)
  • Adds DisaggregationObserver interface for stateful deciders that need routing feedback to update their internal state
  • Registers "prefix-threshold" in validDisaggregationDeciders; NewDisaggregationDecider panics with a helpful message directing callers to NewPrefixThresholdDecider(threshold, blockSize)
  • Adds PDPrefixThreshold int to DeploymentConfig; wired through cmd/root.go via --pd-prefix-threshold
  • Documents INV-PD-1 through INV-PD-5 in docs/contributing/standards/invariants.md
  • Adds stateful decider recipe to docs/contributing/extension-recipes.md

Design

PrefixThresholdDecider maintains a router-side PrefixCacheIndex keyed on globalVirtualInstance = "__global__" to approximate cluster-wide prefix knowledge. Decide() computes nonCachedTokens = len(InputTokens) - cachedBlocks*blockSize and returns Disaggregate = nonCachedTokens > threshold. ObserveRouting() records the request's block hashes after routing so future decisions benefit from updated cache state (INV-7 / R17 signal freshness).

Test plan

  • go test ./... passes
  • go build ./... passes
  • TestPrefixThresholdDecider* — unit tests for Decide/ObserveRouting logic
  • TestDisaggregationObserver* — interface compliance tests
  • Integration: cluster simulation with --pd-decider prefix-threshold routes high-prefill requests to prefill pool

🤖 Generated with Claude Code

namasl and others added 10 commits March 11, 2026 15:33
Adds PrefixThresholdDecider: disaggregates when non-cached token count
exceeds threshold. Maintains a router-side PrefixCacheIndex under a
single globalVirtualInstance key to track cluster-wide prefix knowledge.

Also adds DisaggregationObserver interface for stateful deciders that
learn from routing decisions (ObserveRouting called synchronously by
ClusterSimulator after each routing decision).

Implements cachedHashes/cachedReqID pattern (mirrors routing_prefix_scorer.go)
to avoid double-hashing between Decide() and ObserveRouting().

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Adds "prefix-threshold" to the valid disaggregation decider names in
bundle.go, enabling IsValidDisaggregationDecider() and
ValidDisaggregationDeciderNames() to recognize the new decider.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Adds PDPrefixThreshold int to DeploymentConfig for the prefix-threshold
decider's non-cached token threshold. Adds --pd-prefix-threshold CLI flag
(default 512) with >= 0 validation. Updates --pd-decider description to
include "prefix-threshold". Wires PDPrefixThreshold into the config
construction in cmd/root.go.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…rver

- cluster.go: branch on PDDecider=="prefix-threshold" to construct
  PrefixThresholdDecider(PDPrefixThreshold, BlockSizeTokens) directly;
  add notifyDisaggregationObserver helper
- cluster_event.go: call notifyDisaggregationObserver after RoutingDecisionEvent
  injection (standard routing path, BC-PD-28)
- pd_events.go: call notifyDisaggregationObserver after PrefillRoutingEvent
  injection (disaggregated path, BC-PD-28)
- disaggregation_test.go: add integration tests verifying wiring, high/low
  threshold behavior, observer call, and transfer conservation

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…reshold

Documents DisaggregationObserver interface, PrefixThresholdDecider, and
PDPrefixThreshold in the file organization table. Adds --pd-prefix-threshold
to the CLI flags list and the disaggregated data flow CLI flags summary.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Round 1 fixes:
- Update DisaggregationObserver docstring with R13 note and R17 signal
  freshness guarantee
- Add noopDisaggregationObserver to satisfy R13 (>=2 implementations)
- Add explicit comment in DecodeRoutingEvent explaining why observer
  is intentionally not called on the decode sub-path
- Add 'Adding New Disaggregation Deciders' section to extension-recipes.md
  covering both stateless and stateful patterns
- Fix Decide() docstring: clarify hash-reuse fires only on non-disaggregated
  path (disaggregated path receives prefill sub-request with different ID)

Round 2 fixes:
- Remove structural TestPrefixThreshold_DeciderWiredCorrectly; replace
  with compile-time interface assertion (behavioral coverage exists in
  ZeroThresholdAlwaysDisaggregates and HighThresholdNoDisaggregation)
- Add PDPrefixThreshold field to newTestDisaggDeploymentConfig (R4
  construction site audit)
- Add PD disaggregation row to CLI Flag Summary table in configuration.md
  including --pd-prefix-threshold and all PD flags introduced in PR1-PR4

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…ound 3)

- Add logrus.Warn when --pd-prefix-threshold is explicitly set but
  --pd-decider is not "prefix-threshold" (silent flag ignored, R3 spirit)
- Add PD Disaggregation section to docs/reference/configuration.md
  explaining decider options, prefix-threshold semantics (non-cached
  tokens vs total tokens), default value meaning, and all PD flags

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…arming effect

TestPrefixThreshold_ObserverWarmsCache replaces the previous
TestPrefixThreshold_ObserverCalledAfterRouting which only checked
causality invariants (TransferCompleteTime != 0) and did not verify
that the DisaggregationObserver actually warmed the prefix cache.

The new test uses two requests with a shared 192-token prefix
(threshold=150): req1 disaggregates (192 > 150 with empty cache),
observer records 12 blocks, req2 arrives later with the same prefix
(58 non-cached tokens ≤ 150) and is routed locally — proving the
observer was called and the cache was warmed.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…n.md cross-link

- extension-recipes.md: add missing step 4 (update configuration.md table) and
  fix touch-point count from 3 to 4 for stateless disaggregation deciders
- configuration.md: replace broken cross-link to non-existent
  architecture.md#pd-disaggregation with descriptive text documenting
  the pool topology constraint (prefill + decode <= num-instances)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…tingEvent

- Restore droppedKVAllocations counter and DroppedKVAllocations() accessor
  (R1: count dropped requests — never silent)
- Move AssignedInstance/DecodeInstanceID/DecodeEnqueueTime assignment to
  after successful AllocateTransferredKV check to prevent inconsistent state
  on failure
- Restore DroppedKVAllocations() in cmd/root.go anomaly counter output
- Add INV-PD-1 through INV-PD-5 back to invariants.md for DRY compliance
  (these were in CLAUDE.md but not the canonical standards doc)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@namasl namasl changed the title feat(sim): add PrefixThresholdDecider for prefix-aware PD disaggregation feat(sim): add PrefixThresholdDecider for prefix-aware PD disaggregation (PR5) Mar 11, 2026
namasl and others added 3 commits March 12, 2026 01:23
Merge upstream pd branch after PR3 and PR4 landed. Conflicts resolved:
- CLAUDE.md: combined PD metrics additions with PDPrefixThreshold
- invariants.md: adopted upstream's detailed INV-PD-1..5 descriptions
- bundle.go: kept prefix-threshold in validDisaggregationDeciders
- pd_events.go: adopted upstream's droppedAtDecodeKV counter, trace
  recording, and InjectDecodeOnline(time) signature; preserved observer
  no-op comment from PR5
- cluster.go: unified duplicate KV drop counters (droppedKVAllocations
  → droppedAtDecodeKV)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…uctural tests

Critical: Add CLI validation that --pd-decider other than "never" requires
--prefill-instances and --decode-instances to be set (R1: no silent config
failure).

Important:
- Remove superseded disaggregation decider recipe (incorrect field names,
  wrong flag names, stale purity contract) from extension-recipes.md
- Add prefix-threshold to first PD flags table in configuration.md
- Replace internal field accesses (cs.parentRequests, cs.transfersInitiated,
  cs.trace, cs.droppedAtDecodeKV) with public accessors (cs.ParentRequests(),
  cs.Trace(), cs.DroppedKVAllocations()) in disaggregation_test.go for
  refactor survival (BDD/TDD principle inference-sim#5)

Also includes minor comment condensations from code-simplifier agent.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add the new PD-mode anomaly counter to the results guide so users
understand what it means when decode KV allocation fails.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant