feat(sim): pool topology and disaggregation decision pipeline #599

Merged
sriumcp merged 2 commits into inference-sim:pd from namasl:pr1-pool-topology
Mar 11, 2026

@namasl namasl commented Mar 10, 2026

Summary

Establishes PD (Prefill-Decode) disaggregation foundation for issue #591:

  • DisaggregationDecider interface (sim/disaggregation.go): NeverDisaggregate and AlwaysDisaggregate implementations with factory and validation map
  • Pool topology (sim/cluster/pool.go): PoolRole type, ValidatePoolTopology(), BuildPoolMembership() — pool membership built once at init, never mutated (INV-PD-5)
  • DisaggregationDecisionEvent (sim/cluster/cluster_event.go): Priority 3 event inserted between admission and routing when pools are configured; conditional branch in AdmissionDecisionEvent.Execute()
  • CLI flags: --prefill-instances, --decode-instances, --pd-decider with validation
  • Zero-value backward compatibility: All 20+ existing DeploymentConfig construction sites are automatically backward-compatible (R4 audit confirmed)
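
As a rough illustration of the decider bullet above, the sketch below shows one way an interface with two trivial implementations, a validation map, and a factory could fit together. Only the names `DisaggregationDecider`, `NeverDisaggregate`, and `AlwaysDisaggregate` come from this PR; the method `ShouldDisaggregate`, the helpers `IsValidDecider` and `NewDisaggregationDecider`, the name `"never"`, and the choice to treat an empty name as "never" (to mirror the zero-value compatibility goal) are all assumptions.

```go
package main

import "fmt"

// DisaggregationDecider reports whether a request should be split into
// separate prefill and decode phases. The method name and signature are
// assumptions; the PR only names the interface.
type DisaggregationDecider interface {
	ShouldDisaggregate(requestID string) bool
}

// NeverDisaggregate keeps every request on the non-disaggregated path.
type NeverDisaggregate struct{}

func (NeverDisaggregate) ShouldDisaggregate(string) bool { return false }

// AlwaysDisaggregate sends every request down the disaggregated path.
type AlwaysDisaggregate struct{}

func (AlwaysDisaggregate) ShouldDisaggregate(string) bool { return true }

// validDeciders is a hypothetical validation map. The empty name is accepted
// so that a zero-value config behaves like "never" (an assumption).
var validDeciders = map[string]bool{"": true, "never": true, "always": true}

// IsValidDecider is a hypothetical helper for checking a decider name.
func IsValidDecider(name string) bool { return validDeciders[name] }

// NewDisaggregationDecider is a hypothetical factory keyed by decider name.
func NewDisaggregationDecider(name string) (DisaggregationDecider, error) {
	if !IsValidDecider(name) {
		return nil, fmt.Errorf("unknown PD decider %q", name)
	}
	if name == "always" {
		return AlwaysDisaggregate{}, nil
	}
	return NeverDisaggregate{}, nil
}

func main() {
	for _, name := range []string{"", "never", "always", "bogus"} {
		d, err := NewDisaggregationDecider(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %T, disaggregate=%v\n", name, d, d.ShouldDisaggregate("req-1"))
	}
}
```

Returning an error from the factory, rather than panicking, lets a CLI report a bad decider name cleanly at the boundary.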

Behavioral Contracts

| Contract | Verification |
|---|---|
| BC-PD-1: Byte-identical output when pools disabled | Verified at seeds 42, 123, 999 |
| BC-PD-2: Invalid pool config returns error | TestValidatePoolTopology (11 cases) |
| BC-PD-3: Pool membership unchanged from init | TestBuildPoolMembership_Immutability |
| BC-PD-4: No DisaggregationDecisionEvent when pools not configured | TestAdmissionDecisionEvent_NoPools_SchedulesRouting |
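
To make BC-PD-2 and BC-PD-3 concrete, here is a minimal sketch of how pool validation and a build-once membership map might look. The names `PoolRole`, `ValidatePoolTopology`, and `BuildPoolMembership` come from this PR, but their signatures, the role constants, the use of integer instance indices as keys, and the specific validation rules (non-negative sizes, both pools non-empty when enabled, combined size not exceeding the instance count) are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// PoolRole tags an instance as belonging to the prefill or decode pool.
// The constant names are hypothetical.
type PoolRole int

const (
	PoolRoleNone PoolRole = iota
	PoolRolePrefill
	PoolRoleDecode
)

// ValidatePoolTopology checks pool sizes against the total instance count.
// The rules here are assumed, not taken from the PR.
func ValidatePoolTopology(total, prefill, decode int) error {
	if prefill < 0 || decode < 0 {
		return errors.New("pool sizes must be non-negative")
	}
	if prefill == 0 && decode == 0 {
		return nil // pools disabled: nothing to validate
	}
	if prefill == 0 || decode == 0 {
		return errors.New("prefill and decode pools must both be non-empty")
	}
	if prefill+decode > total {
		return fmt.Errorf("prefill (%d) + decode (%d) exceeds %d instances",
			prefill, decode, total)
	}
	return nil
}

// BuildPoolMembership assigns the first `prefill` instance indices to the
// prefill pool and the next `decode` indices to the decode pool. It is meant
// to be called once at init; callers should treat the result as read-only.
func BuildPoolMembership(prefill, decode int) map[int]PoolRole {
	m := make(map[int]PoolRole, prefill+decode)
	for i := 0; i < prefill; i++ {
		m[i] = PoolRolePrefill
	}
	for i := prefill; i < prefill+decode; i++ {
		m[i] = PoolRoleDecode
	}
	return m
}

func main() {
	if err := ValidatePoolTopology(4, 2, 2); err != nil {
		panic(err)
	}
	fmt.Println(BuildPoolMembership(2, 2))
	fmt.Println(ValidatePoolTopology(4, 3, 2))
}
```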

Test plan

  • go build ./... — exit 0
  • go test ./... -count=1 — all 9 packages pass
  • go vet ./... — exit 0
  • BC-PD-1: byte-identical stdout at seeds 42, 123, 999 with --prefill-instances 0 --decode-instances 0
  • Integration: --num-instances 4 --prefill-instances 2 --decode-instances 2 --pd-decider always completes successfully
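
The flag handling exercised by the integration command above could be sketched roughly as follows. The flag names `--num-instances`, `--prefill-instances`, `--decode-instances`, and `--pd-decider` appear in this PR; everything else is an assumption, including the use of Go's standard `flag` package (the project may use a different CLI library), the `poolConfig` struct, the `parsePoolFlags` helper, the default values, and the validation rule.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// poolConfig is a hypothetical stand-in for the pool-related fields of
// DeploymentConfig. Zero values for the pool sizes mean "pools disabled".
type poolConfig struct {
	NumInstances     int
	PrefillInstances int
	DecodeInstances  int
	PDDecider        string
}

// parsePoolFlags parses the pool flags from args and validates them at the
// CLI boundary, so a bad combination yields an error instead of a panic.
func parsePoolFlags(args []string) (poolConfig, error) {
	var c poolConfig
	fs := flag.NewFlagSet("sim", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	fs.IntVar(&c.NumInstances, "num-instances", 1, "total number of instances")
	fs.IntVar(&c.PrefillInstances, "prefill-instances", 0, "prefill pool size (0 = disabled)")
	fs.IntVar(&c.DecodeInstances, "decode-instances", 0, "decode pool size (0 = disabled)")
	fs.StringVar(&c.PDDecider, "pd-decider", "never", "disaggregation decider name")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	// Assumed rule: pool sizes are non-negative and fit within the cluster.
	if c.PrefillInstances < 0 || c.DecodeInstances < 0 {
		return c, fmt.Errorf("pool sizes must be non-negative")
	}
	if c.PrefillInstances+c.DecodeInstances > c.NumInstances {
		return c, fmt.Errorf("pools (%d) exceed --num-instances (%d)",
			c.PrefillInstances+c.DecodeInstances, c.NumInstances)
	}
	return c, nil
}

func main() {
	cfg, err := parsePoolFlags([]string{
		"--num-instances", "4",
		"--prefill-instances", "2",
		"--decode-instances", "2",
		"--pd-decider", "always",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}
```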

🤖 Generated with Claude Code

namasl and others added 2 commits March 10, 2026 19:17
…ference-sim#591)

Establishes PD (Prefill-Decode) disaggregation foundation: pool topology
as a first-class cluster concept and the disaggregation decision point in
the event pipeline. When pools are not configured (--prefill-instances=0
--decode-instances=0), output is byte-identical to pre-PR1 behavior
(BC-PD-1 verified at seeds 42, 123, 999).

- Add DisaggregationDecider interface with NeverDisaggregate and
  AlwaysDisaggregate implementations (sim/disaggregation.go)
- Add PoolRole type, ValidatePoolTopology, BuildPoolMembership
  (sim/cluster/pool.go)
- Add DisaggregationDecisionEvent (priority 3) to cluster event pipeline
- Conditional branch in AdmissionDecisionEvent.Execute: pools configured
  → DisaggregationDecisionEvent, otherwise → RoutingDecisionEvent
- Add PrefillInstances, DecodeInstances, PDDecider fields to
  DeploymentConfig (zero-value = disabled, all existing construction
  sites backward-compatible per R4 audit)
- Add --prefill-instances, --decode-instances, --pd-decider CLI flags
- Update CLAUDE.md with new files and flags
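
The conditional branch in `AdmissionDecisionEvent.Execute` described in the list above might look something like the sketch below. The three event type names come from this PR; the `cluster` struct, its `poolsConfigured` field, the `schedule` helper, and the `Execute` signature are hypothetical simplifications, and event priorities and timestamps are omitted.

```go
package main

import "fmt"

// Event is a hypothetical minimal event interface for this sketch.
type Event interface {
	Request() string
}

// RoutingDecisionEvent is the next pipeline stage when pools are disabled.
type RoutingDecisionEvent struct{ RequestID string }

func (e RoutingDecisionEvent) Request() string { return e.RequestID }

// DisaggregationDecisionEvent is the extra stage used when pools are configured.
type DisaggregationDecisionEvent struct{ RequestID string }

func (e DisaggregationDecisionEvent) Request() string { return e.RequestID }

// cluster is a hypothetical stand-in for the simulator's cluster state.
type cluster struct {
	poolsConfigured bool
	scheduled       []Event
}

// schedule appends to a plain slice; a real simulator would use a
// priority queue ordered by time and event priority.
func (c *cluster) schedule(e Event) { c.scheduled = append(c.scheduled, e) }

// AdmissionDecisionEvent picks the next pipeline stage for an admitted request.
type AdmissionDecisionEvent struct{ RequestID string }

// Execute schedules a DisaggregationDecisionEvent only when pools are
// configured; otherwise it schedules routing directly, which keeps the
// pools-disabled event sequence unchanged.
func (e AdmissionDecisionEvent) Execute(c *cluster) {
	if c.poolsConfigured {
		c.schedule(DisaggregationDecisionEvent{RequestID: e.RequestID})
		return
	}
	c.schedule(RoutingDecisionEvent{RequestID: e.RequestID})
}

func main() {
	for _, pools := range []bool{false, true} {
		c := &cluster{poolsConfigured: pools}
		AdmissionDecisionEvent{RequestID: "req-1"}.Execute(c)
		fmt.Printf("pools=%v -> %T\n", pools, c.scheduled[0])
	}
}
```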

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Address 4 IMPORTANT findings from pr-code convergence review:

1. PoolMembership() now returns a defensive copy instead of the
   internal map, complying with R8 (no exported mutable maps)
2. CLI now calls ValidatePoolTopology at the boundary for clean
   error messages instead of relying on library panic
3. Factory test verifies concrete type via fmt.Sprintf("%T"),
   fixing unused wantType field
4. Integration tests verify INV-1 (request conservation) and
   INV-5 (causality) after simulation with disaggregation enabled

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
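
Finding 1 in the commit above (returning a defensive copy from `PoolMembership()`) can be illustrated with a small sketch. Only the method name `PoolMembership` comes from this PR; the `poolHolder` type, the string role values, and the integer instance keys are hypothetical.

```go
package main

import "fmt"

// poolHolder is a hypothetical owner of the membership map built at init.
type poolHolder struct {
	membership map[int]string // instance index -> role name (assumed shape)
}

// PoolMembership returns a copy of the internal map, so callers cannot
// mutate the membership that was fixed at initialization.
func (p *poolHolder) PoolMembership() map[int]string {
	out := make(map[int]string, len(p.membership))
	for id, role := range p.membership {
		out[id] = role
	}
	return out
}

func main() {
	p := &poolHolder{membership: map[int]string{0: "prefill", 1: "decode"}}

	view := p.PoolMembership()
	view[0] = "decode" // mutate the copy only
	delete(view, 1)

	// The internal map still holds the original two entries.
	fmt.Println(len(p.membership), p.membership[0], p.membership[1])
}
```

On Go 1.21 or later, `maps.Clone` from the standard library would do the same shallow copy in one call.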

jgchn commented Mar 10, 2026

@claude summarize this PR

claude bot commented Mar 10, 2026

Claude Code is working…

I'll analyze this and get back to you.

sriumcp commented Mar 10, 2026

@claude What does this PR do?

claude bot commented Mar 10, 2026

Claude Code is working…

I'll analyze this and get back to you.
