feat(autoscaler): Phase 1C CLI wiring + three end-to-end bug fixes + demo #995

vishakha-ramani wants to merge 8 commits into inference-sim:main from …
Conversation
Design document for wiring Phase 1A NodePools and Phase 1C-1b autoscaler pipeline to the CLI via --policy-config YAML extension. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Add AutoscalerAnalyzerConfig to DeploymentConfig and effectiveAnalyzerConfig() helper for WVA reference defaults. NewClusterSimulator now wires DefaultCollector+V2SaturationAnalyzer+UnlimitedEngine+DirectActuator when ModelAutoscalerIntervalUs > 0, so cmd/ only sets config fields. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
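A minimal sketch of how this wiring could look, with simplified stand-in types: the type and field names follow the commit message (and the `scale_up_threshold`/`scale_down_boundary`/`avg_input_tokens`/`kv_cache_threshold` keys validated later in this PR), but the exact fields, defaults, and signatures are assumptions, not the real simulator API.

```go
package main

import "fmt"

// Hypothetical stand-ins for the real types; default values are placeholders,
// not the actual WVA reference defaults.
type AutoscalerAnalyzerConfig struct {
	ScaleUpThreshold  float64
	ScaleDownBoundary float64
	AvgInputTokens    int
	KVCacheThreshold  float64
}

type DeploymentConfig struct {
	ModelAutoscalerIntervalUs int64
	Analyzer                  *AutoscalerAnalyzerConfig // nil => WVA reference defaults
}

// effectiveAnalyzerConfig mirrors the helper described above: an explicit
// analyzer config wins, otherwise reference defaults apply.
func effectiveAnalyzerConfig(c *DeploymentConfig) AutoscalerAnalyzerConfig {
	if c.Analyzer != nil {
		return *c.Analyzer
	}
	return AutoscalerAnalyzerConfig{
		ScaleUpThreshold:  0.8,
		ScaleDownBoundary: 0.3,
		AvgInputTokens:    512,
		KVCacheThreshold:  0.9,
	}
}

func main() {
	cfg := DeploymentConfig{ModelAutoscalerIntervalUs: 1_000_000}
	if cfg.ModelAutoscalerIntervalUs > 0 {
		// This is where NewClusterSimulator would construct and wire
		// DefaultCollector + V2SaturationAnalyzer + UnlimitedEngine + DirectActuator,
		// so cmd/ only has to set config fields.
		fmt.Printf("autoscaler enabled, analyzer config: %+v\n", effectiveAnalyzerConfig(&cfg))
	}
}
```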
… bundle conversion Reads node_pools and autoscaler sections from --policy-config YAML and converts them into DeploymentConfig. CLI flag --model-autoscaler-interval-us overrides the YAML value when explicitly set. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
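The override precedence ("CLI flag wins only when explicitly set") can be detected with the standard library's `flag.Visit`, which walks only flags actually passed on the command line. A sketch under that assumption; the real CLI may instead use a framework facility such as pflag's `Changed`.

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Hypothetical YAML-derived value; in the real code this would come from
	// the autoscaler section of the --policy-config bundle.
	yamlIntervalUs := int64(500_000)

	flagIntervalUs := flag.Int64("model-autoscaler-interval-us", 0,
		"autoscaler tick interval in microseconds (overrides policy-config YAML)")
	flag.Parse()

	// flag.Visit only visits flags that were explicitly set, so a default
	// value never silently clobbers the YAML setting.
	effective := yamlIntervalUs
	flag.Visit(func(f *flag.Flag) {
		if f.Name == "model-autoscaler-interval-us" {
			effective = *flagIntervalUs
		}
	})

	fmt.Println("effective interval (us):", effective)
}
```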
…wers

Bug 1 — DirectActuator.scaleUp was a stub (direct_actuator.go): only called PlaceInstance() (GPU reservation) but never constructed InstanceSimulator objects. Scale-up appeared to succeed but produced phantom GPU reservations — no instance ever entered the routing pool, loaded, or served requests. Rewrote scaleUp to mirror NodeReadyEvent.Execute(): construct instance, transition to Loading, register with CachedSnapshotProvider, schedule loaded event, append to cs.instances, init inFlightRequests, wire OnRequestDone callback.

Bug 2 — scheduleNextTick had no termination guard (autoscaler.go): ScalingTickEvent always rescheduled itself unconditionally. On request-bounded runs (Horizon == MaxInt64), the cluster event queue never empties → infinite event loop at 100%+ CPU. Added guard: stop self-scheduling when Horizon == MaxInt64 && pendingArrivals == 0 && inFlight == 0. Added a pendingArrivals int field to ClusterSimulator, incremented in Run() per arrival push, decremented in ClusterArrivalEvent.Execute().

Bug 3 — promoteDeferred corrupted pendingArrivals (cluster.go): deferred batch/background requests were re-injected via promoteDeferred() as new ClusterArrivalEvents without incrementing pendingArrivals. Each re-injected event decremented the counter on execution, driving it to -N for N deferred requests. The Bug 2 guard checks == 0 exactly — never true when negative → the hang persisted on workloads with batch cohorts. Fixed by incrementing pendingArrivals per re-injected arrival in promoteDeferred(), mirroring Run(). Invariant: pendingArrivals == count of ClusterArrivalEvents in cs.clusterEvents at all times.

Also adds:
- examples/autoscaler-demo.yaml: self-contained demo config with node_pools + autoscaler sections, tuned for regression_workload_load_spikes
- .gitignore: docs/superpowers/ (agent-local, not committed)
- cluster_test.go: remove orphaned merge conflict marker

Closes inference-sim#994

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@claude /pr-review-toolkit:review-pr Please perform a thorough review of this PR with respect to the original issue. Evaluate whether the PR fully and correctly addresses all requirements and reviewer concerns. In particular, assess:

- Correctness and preservation of invariants

Finally, provide a clear verdict: is this PR ready to merge? If not, what specific changes are required?
Claude finished @vishakha-ramani's task in 8m 44s

PR Review: feat(autoscaler): Phase 1C CLI wiring + three end-to-end bug fixes + demo
Summary

This PR wires the Phase 1C autoscaler pipeline into `blis run`. …

Critical Issues

1. …
…gression tests, validate analyzer config, eliminate double bundle load

- `git rm --cached` the two docs/superpowers/ files that were inadvertently committed; .gitignore already lists the directory as agent-local
- Add TestAutoscaler_RequestBoundedRun_Terminates and TestAutoscaler_WithBatchRequests_Terminates as regression tests for Bug 2 (infinite tick loop) and Bug 3 (pendingArrivals going negative from deferred batch re-injection); both must complete in ≤2s (see the sketch after this list)
- Add range validation for AnalyzerBundleConfig thresholds in bundle.Validate() — catches invalid scale_up_threshold, scale_down_boundary, avg_input_tokens, and kv_cache_threshold at the CLI boundary (R3) before they reach NewV2SaturationAnalyzer panics; add TestPolicyBundle_Validate_AnalyzerThresholds table test
- Refactor resolvePolicies() to return (*sim.PolicyBundle, []sim.ScorerConfig) so runCmd.RunE can reuse the loaded+validated bundle instead of calling LoadPolicyBundle a second time; replay.go discards the bundle (no autoscaler)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
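A sketch of the shape such a termination regression test could take. `runSimulation` is a hypothetical stand-in for building and running a request-bounded simulator (Horizon set to MaxInt64); the 2-second budget mirrors the ≤2s requirement above.

```go
package cluster_test

import (
	"testing"
	"time"
)

// Before the Bug 2/Bug 3 fixes, a request-bounded run never terminated on
// its own, so the test races the simulation against a timeout.
func TestAutoscaler_RequestBoundedRun_Terminates(t *testing.T) {
	done := make(chan struct{})
	go func() {
		runSimulation() // would block forever before the fixes
		close(done)
	}()
	select {
	case <-done:
		// terminated on its own: the guard fired once the queue drained
	case <-time.After(2 * time.Second):
		t.Fatal("simulation did not terminate within 2s; tick loop may be self-rescheduling forever")
	}
}

func runSimulation() { /* placeholder for the real simulator setup */ }
```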
@claude /pr-review-toolkit:review-pr Please perform a thorough review of this PR with respect to the original issue. Evaluate whether the PR fully and correctly addresses all requirements and reviewer concerns. In particular, assess:

- Correctness and preservation of invariants

Finally, provide a clear verdict: is this PR ready to merge? If not, what specific changes are required?
Claude finished @vishakha-ramani's task in 8m 56s

PR Review: feat(autoscaler): Phase 1C CLI wiring + three end-to-end bug fixes + demo
Status of Previous Review's Blocking Issues

The latest commit (…) addresses every item flagged in the previous review.

Previously Blocking — now resolved:
…
Fresh Review of All Changes

Correctness

Bug fixes verified:
- Analyzer threshold validation is consistent with the `NewV2SaturationAnalyzer` panics: …
- INV-1 preservation: …
- INV-6 preserved: …

Minor Findings (non-blocking)

Variable shadowing in the scale-up loop:

```go
for i := 0; i < d.Delta; i++ { // outer i
    ...
    for i := range cs.config.NodePools { // inner i shadows outer
```

The logic is correct (the inner `i` shadows the outer, but only within the inner loop's scope). Session follow-ups not tracked in …

Design Discipline

…
Test Coverage

New tests verify:
…
Coverage is thorough for the new surfaces. All tests in … pass.

Documentation

…
Verdict: Ready to Merge ✅

All four blocking issues from the prior review are resolved. CI passes (build + lint + all tests). The two minor new findings (variable shadowing, session+autoscaler edge case) are non-blocking — the logic is correct in both cases and the behavior is consistent with the rest of the codebase.
Per the review request, the three bug fixes have been broken out into sub-issues:
Each sub-issue contains a full description of the symptom, root cause, proposed fix (with before/after code), affected files, and the regression test that guards against recurrence. The intent is to restructure this PR into three smaller PRs — one per bug fix — to make each change individually reviewable. |
Summary
- Wires the Phase 1C autoscaler pipeline (`DefaultCollector` → `V2SaturationAnalyzer` → `UnlimitedEngine` → `DirectActuator`) into `blis run` via `--policy-config`.
- Adds `examples/autoscaler-demo.yaml` for a self-contained demo against the spike workload.
- Adds `docs/superpowers/` to `.gitignore` (agent-local artifacts, not committed).

Closes #994. Part of #696.
Commits
- `534a41c` …
- `3cfb9c9` …
- `9d20da9` … `node_pools` and `autoscaler` sections to `PolicyBundle`
- `b6dbe93` … `NewClusterSimulator`
- `8233f29` … `--model-autoscaler-interval-us` flag and policy-config bundle conversion
- `3375856` … (the bug fixes below)

Bugs Fixed (commit `3375856`)

Bug 1 — `DirectActuator.scaleUp` was a stub

`scaleUp` called `PlaceInstance()` (GPU reservation) but never constructed `InstanceSimulator` objects. Scale-up logged as successful but produced phantom GPU reservations — no instance ever entered the routing pool, loaded, or served requests.
Fix: Rewrote `scaleUp` to mirror `NodeReadyEvent.Execute()` exactly: construct instance, transition to Loading, register with `CachedSnapshotProvider`, schedule loaded event, append to `cs.instances`, init `inFlightRequests`, wire `OnRequestDone` callback.

Maintenance note: `DirectActuator.scaleUp` and `NodeReadyEvent.Execute()` are now parallel paths. Any new field added to the node-ready creation path must be mirrored here.
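A compressed, runnable sketch of the fixed control flow. All types here are simplified stand-ins; none of the field or method names are the real simulator API.

```go
package main

import "fmt"

// Stand-in types, illustrating the shape of the fix only.
type InstanceSimulator struct {
	id            int
	state         string
	OnRequestDone func(id int)
}

type ClusterSimulator struct {
	instances        []*InstanceSimulator
	inFlightRequests map[int]int
	nextID           int
	freeGPUs         int
}

func (cs *ClusterSimulator) placeInstance() bool { // GPU reservation: all the old stub did
	if cs.freeGPUs == 0 {
		return false
	}
	cs.freeGPUs--
	return true
}

// scaleUp now mirrors the node-ready creation path: reserve a GPU, then build
// and register a real instance so it eventually joins the routing pool.
func (cs *ClusterSimulator) scaleUp(delta int) {
	for i := 0; i < delta; i++ {
		if !cs.placeInstance() {
			return // no capacity: stop instead of leaving phantom reservations
		}
		inst := &InstanceSimulator{id: cs.nextID, state: "Loading"}
		cs.nextID++
		// In the real code: register with CachedSnapshotProvider and
		// schedule the "loaded" event here.
		cs.instances = append(cs.instances, inst)
		cs.inFlightRequests[inst.id] = 0
		inst.OnRequestDone = func(id int) { cs.inFlightRequests[id]-- }
	}
}

func main() {
	cs := &ClusterSimulator{inFlightRequests: map[int]int{}, freeGPUs: 2}
	cs.scaleUp(3)
	fmt.Println("instances created:", len(cs.instances)) // 2, bounded by free GPUs
}
```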
Bug 2 — `scheduleNextTick` had no termination guard

`ScalingTickEvent` always rescheduled itself unconditionally. On request-bounded runs (`Horizon == math.MaxInt64`), the cluster event queue never drains → infinite loop at 100%+ CPU.

Fix: Stop self-scheduling when `Horizon == math.MaxInt64 && pendingArrivals == 0 && inFlight == 0`. Added `pendingArrivals int` to `ClusterSimulator`, incremented in `Run()` per arrival push, decremented in `ClusterArrivalEvent.Execute()`.
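A minimal sketch of the guard, with simplified stand-ins for the event-queue machinery (the real code schedules a `ScalingTickEvent` rather than returning a bool).

```go
package main

import (
	"fmt"
	"math"
)

type ClusterSimulator struct {
	Horizon         int64
	pendingArrivals int
	inFlight        int
	now             int64
}

// scheduleNextTick reschedules the scaling tick unless the run is
// request-bounded and fully drained, which is the Bug 2 guard.
func (cs *ClusterSimulator) scheduleNextTick(intervalUs int64) bool {
	if cs.Horizon == math.MaxInt64 && cs.pendingArrivals == 0 && cs.inFlight == 0 {
		return false // nothing left to observe: let the event queue drain
	}
	cs.now += intervalUs // real code: push a ScalingTickEvent at now+interval
	return true
}

func main() {
	cs := &ClusterSimulator{Horizon: math.MaxInt64, pendingArrivals: 1}
	fmt.Println(cs.scheduleNextTick(1000)) // true: arrivals still pending
	cs.pendingArrivals = 0
	fmt.Println(cs.scheduleNextTick(1000)) // false: guard stops the loop
}
```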
Bug 3 — `promoteDeferred` corrupted `pendingArrivals`

Deferred batch/background requests were re-injected via `promoteDeferred()` as new `ClusterArrivalEvent`s without incrementing `pendingArrivals`. Each decremented the counter on execution, driving it to `-N` for N deferred requests. The Bug 2 guard checks `== 0` exactly — never true when negative → the hang persisted on any workload with a batch/background cohort.

Fix: Increment `pendingArrivals` in `promoteDeferred()` per re-injected arrival.
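A sketch of the counter discipline that restores the invariant stated just below, again with stand-in types rather than the real event queue.

```go
package main

import "fmt"

type arrivalEvent struct{ reqID int }

type ClusterSimulator struct {
	clusterEvents   []arrivalEvent // stand-in for the real event queue
	pendingArrivals int
}

func (cs *ClusterSimulator) pushArrival(e arrivalEvent) {
	cs.clusterEvents = append(cs.clusterEvents, e)
	cs.pendingArrivals++ // Run() already does this per arrival push
}

// promoteDeferred re-injects deferred batch/background requests. The fix:
// every re-injected arrival bumps pendingArrivals, exactly as Run() does,
// so the counter never goes negative and the Bug 2 guard can still fire.
func (cs *ClusterSimulator) promoteDeferred(deferred []arrivalEvent) {
	for _, e := range deferred {
		cs.pushArrival(e)
	}
}

func main() {
	cs := &ClusterSimulator{}
	cs.pushArrival(arrivalEvent{1})
	cs.promoteDeferred([]arrivalEvent{{2}, {3}})
	// Invariant: pendingArrivals == number of queued ClusterArrivalEvents.
	fmt.Println(cs.pendingArrivals == len(cs.clusterEvents)) // true
}
```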
Invariant: `pendingArrivals == count of ClusterArrivalEvents currently in cs.clusterEvents`.

Demo
Scale events observed: …

Results (3000 req, 30 req/s, Llama-2-7b, A100-80): …
Test plan

- `go test ./sim/cluster/...` passes (43s, all tests green)
- `go build ./...` passes
- Scale events verified in `--log info` output
- `docs/superpowers/` does not appear in `git status`

🤖 Generated with Claude Code