fix(cluster): PD TTFT includes KV transfer and first decode step by namasl · Pull Request #931 · inference-sim/inference-sim

namasl · 2026-04-02T20:24:37Z

Summary

PD disaggregated TTFT was understated: it only captured prefill sub-request completion time (arrival → prefill done)
In llm-d, the first token reaches the user from the decode pod, not prefill: prefill completes → KV transfers → decode pod recomputes last prompt token and samples first output token
Updated projectPDMetrics() to compute user-visible TTFT as prefillTTFT + transferDuration + firstDecodeStep
TTFTSum adjusted by the delta to maintain consistency (BC-3)
Defensive fallback to prefill-only TTFT if decode data is unavailable

Closes #930

Behavioral Contracts

BC-1: RequestTTFTs[parentID] = prefillTTFT + (TransferCompleteTime - TransferStartTime) + DecodeSubReq.ITL[0]
BC-2: New TTFT > prefill-only TTFT (transfer + decode add positive time)
BC-3: TTFTSum adjusted by delta to stay consistent with RequestTTFTs
BC-4: TTFT <= E2E (causality preserved)
BC-5: Non-PD clusters unaffected (projectPDMetrics is a no-op when parentRequests is empty)

Test plan

TestDisaggregation_TTFT_IncludesTransferAndDecode — verifies BC-1 (exact formula), BC-2 (exceeds prefill-only), BC-4 (causality)
Full go test ./... passes
go vet ./... clean

🤖 Generated with Claude Code

namasl · 2026-04-02T20:29:18Z

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

claude · 2026-04-02T20:31:40Z

Claude finished @namasl's task in 13m 12s —— View job

PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

Gather PR context and changed files
Read implementation changes (cluster.go)
Read test changes (disaggregation_test.go)
Run specialized review agents (code, tests, errors)
Aggregate findings and post final review

Verdict: Approve with suggestions

The fix is correct and addresses the issue accurately. The implementation is clean, the formula is sound, and the core behavioral contracts are tested. A few test-quality improvements are worth considering before merge.

Correctness Assessment

Does the PR tackle the parent issue correctly? Yes.

The root cause (issue #930) was that RequestTTFTs[parentID] was set to the prefill sub-request TTFT only — capturing arrival → prefill_done. In llm-d PD disaggregation, the first user-visible token comes from the decode pod, not prefill. The correct formula is:

TTFT = prefillTTFT + (TransferCompleteTime - TransferStartTime) + DecodeSubReq.ITL[0]

The implementation at cluster.go:1196 gets this exactly right. The three components are sequential and additive.

TTFTSum adjustment (BC-3): m.TTFTSum += int64(newTTFT - prefillTTFT) is correct. At this point aggregateMetrics() has already summed in the prefill TTFT; the delta transferDuration + firstDecodeStep is the exact additive correction needed. All constituent values are int64-backed so no float truncation occurs.

R1 compliance (no silent data loss): Satisfied. prefillTTFT is read into a local variable at line 1189 before the delete calls at lines 1190–1191 — an explicit improvement over the previous pattern where the read was interleaved inside the if completed gate.

Defensive fallback: Appropriate. The else if hasPrefillTTFT branch at line 1200 uses prefill-only TTFT with a Warnf log when decode data is unavailable for a completed parent. Reporting an underestimate is preferable to dropping the TTFT entry entirely.

Invariants: INV-5 (causality: TTFT ≤ E2E) is structurally preserved — TTFT covers arrival → first_decode_token while E2E covers arrival → last_decode_token. No invariant violations.

Issues Found

Important

1. BC-3 (TTFTSum consistency) is not directly tested [sim/cluster/disaggregation_test.go]

The new test verifies BC-1/BC-2/BC-4 but not BC-3. After mustRun, the invariant m.TTFTSum == sum(RequestTTFTs values) is not asserted for the PD disaggregation path. A future refactor that adjusts RequestTTFTs without touching TTFTSum (or vice versa) would be undetected. This is the exact pattern guarded by BC-MS-3 in metrics_substrate_test.go for non-PD clusters.

Suggested addition to TestDisaggregation_TTFT_IncludesTransferAndDecode:

// BC-3: TTFTSum must be consistent with RequestTTFTs.
var manualSum float64
for _, v := range m.RequestTTFTs {
    manualSum += v
}
if math.Abs(float64(m.TTFTSum)-manualSum) > 1.0 { // 1 tick tolerance for rounding
    t.Errorf("BC-3: TTFTSum (%d) != sum(RequestTTFTs) (%.1f)", m.TTFTSum, manualSum)
}

2. Defensive fallback path is untested [cluster.go:1200–1203]

The else if hasPrefillTTFT branch (fires when completed == true but TransferCompleteTime == 0 or ITL is empty) has no test. If upstream code regressed to not setting TransferCompleteTime, the TTFT would silently fall back to prefill-only and the test suite would not alert. A targeted test constructing a synthetic ParentRequest with TransferCompleteTime == 0 would close this gap. Fix this →

Suggestions

3. Use cs.ParentRequests() instead of cs.parentRequests in the new test [disaggregation_test.go:936]

// Current (structural — accesses unexported field):
for _, parent := range cs.parentRequests {

// Better (behavioral — uses exported API, follows newer test pattern in the same file):
for _, parent := range cs.ParentRequests() {

The test file is in package cluster so both compile, but newer tests in the same file (lines 1402, 1539, 1624) already use ParentRequests(). Using the internal field bypasses the hasRun guard and sort, and means if parentRequests were ever renamed the test would break silently even though behavior is identical. This is a minor maintainability issue. Fix this →

4. BC-4 check should assert E2E presence unconditionally [disaggregation_test.go:974–978]

// Current: silently skips causality check if E2E absent
e2e, hasE2E := m.RequestE2Es[pid]
if hasE2E && ttft > e2e {

// Better: missing E2E is itself a violation for a completed parent
e2e, hasE2E := m.RequestE2Es[pid]
if !hasE2E {
    t.Errorf("BC-4: parent %s: E2E missing from RequestE2Es for completed parent", pid)
} else if ttft > e2e {
    t.Errorf("BC-4: parent %s: TTFT (%.1f) > E2E (%.1f), causality violated", pid, ttft, e2e)
}

If projectPDMetrics ever failed to project E2E for a completed parent, the causality check would silently pass. Fix this →

Separation of Concerns / Architecture

No concerns. The change is confined to projectPDMetrics(), a post-simulation metrics projection function. It correctly reads from the ParentRequest lifecycle struct (which owns the transfer timestamps) rather than threading new parameters. The module boundary between the cluster coordinator and the instance-level metrics is respected.

Performance

No concerns. The hotpath is unchanged. projectPDMetrics() is called once per simulation run at teardown, iterating over O(parentRequests) entries — the two extra float64 additions are negligible.

Documentation

The inline comment at cluster.go:1181–1187 is accurate and well-written. It correctly describes the llm-d pipeline semantics and cites issue #930. The PR description's behavioral contracts map cleanly to the implementation.

namasl · 2026-04-02T20:57:30Z

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

claude · 2026-04-02T21:02:54Z

Claude finished @namasl's task in 4m 10s —— View job

PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

Gather PR context
Read implementation changes (cluster.go)
Read test changes (disaggregation_test.go)
Read standards, invariants, principles docs
Deep correctness and invariant analysis
Aggregate findings and post final review

Verdict: Approve — one test naming issue to fix

The fix is correct, well-scoped, and addresses issue #930 accurately. The previous round of review feedback (BC-3 test, BC-4 E2E assertion, ParentRequests() API usage) has all been addressed in the second commit. One remaining issue: the fallback test is misnamed and doesn't exercise the fallback branch.

Correctness

Does this PR tackle the parent issue correctly? Yes.

The root cause was that RequestTTFTs[parentID] was set to prefill_arrival → prefill_done only. In llm-d PD disaggregation, the first user-visible token comes from the decode pod. The correct formula:

TTFT = prefillTTFT + (TransferCompleteTime - TransferStartTime) + DecodeSubReq.ITL[0]

The implementation at cluster.go:1193–1199 is exact.

BC-3 (TTFTSum consistency): m.TTFTSum += int64(newTTFT - prefillTTFT) is correct. aggregateMetrics() already summed in the per-instance TTFTSum (which includes each prefill request's FirstTokenTime). The delta int64(transferDuration + firstDecodeStep) is the exact additive correction. This mirrors the identical pattern at cluster.go:1130–1132 used by the ITL/TTFT reclocking logic, so the pattern is consistent.

R1 (no silent data loss): Satisfied. prefillTTFT is read at line 1189 before the delete calls at lines 1190–1191. The old code had the read interleaved inside the if completed gate after the deletes — this was a latent R1 violation fixed here.

R6 (no logrus.Fatal in sim/): Clean. The new warning uses logrus.Warnf, not Fatalf.

INV-PD-6 (metric map parent granularity): Maintained. Both pfx and dec keys are unconditionally deleted; parent key added only for completed == true.

INV-5 (causality: TTFT ≤ E2E): Structurally preserved. TTFT covers arrival → first_decode_token; E2E covers arrival → last_decode_token. BC-4 is now tested with a hard failure (not a silent skip) if E2E is missing.

INV-PD-4 (phase causality): TransferCompleteTime > 0 guard ensures transfer_start <= transfer_complete (guaranteed by INV-PD-4); no underflow risk.

int64 truncation: int64(newTTFT - prefillTTFT) — since both transferDuration and firstDecodeStep are computed from integer-valued int64 timestamps cast to float64, the difference is always exact. No precision loss.

Issues Found

Important

TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing doesn't test the fallback branch [disaggregation_test.go:999–1022]

The test name promises to verify the else if hasPrefillTTFT fallback (lines 1200–1203 of cluster.go) — the path that fires when a completed parent has TransferCompleteTime == 0 or empty DecodeSubReq.ITL. But the test body runs a normal well-formed simulation and checks that completed parents have non-missing TTFT entries. The comment inside even states: "the fallback branch should never fire in a well-formed simulation."

The test is useful — it verifies R1 (no silent data loss) for the primary path — but it doesn't exercise the fallback code path at all. There are two options:

Option A (rename — minimal change): Rename to reflect what it actually tests:

// TestDisaggregation_TTFT_NoSilentDrops verifies R1: all completed PD parents
// have a TTFT entry after projection, regardless of which branch fired.
func TestDisaggregation_TTFT_NoSilentDrops(t *testing.T) {

Option B (inject fallback directly — stronger): Since the fallback cannot be triggered via a well-formed cluster simulation, test it by calling projectPDMetrics on a RawMetrics snapshot constructed with a parent whose TransferCompleteTime == 0:

func TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing(t *testing.T) {
    // Build a minimal merged metrics map with a prefill TTFT but no transfer data.
    parent := &ParentRequest{
        ID: "p1", PrefillSubReqID: "p1_prefill", DecodeSubReqID: "p1_decode",
        CompletionTime: 1000, ArrivalTime: 0,
        DecodeInstanceID: "inst-0",
        TransferStartTime: 0, TransferCompleteTime: 0, // missing transfer data
    }
    m := &sim.RawMetrics{...}
    m.RequestTTFTs["p1_prefill"] = 500.0
    // call projectPDMetrics with this parent...
    // assert RequestTTFTs["p1"] == 500.0 (fallback value)
}

Option A is the minimum fix to close the misleading name. Option B gives actual coverage of the branch. Fix this →

Standards Compliance Checklist (R1–R23)

Rule	Status	Notes
R1 (no silent data loss)	✅	Delete moved after read of `prefillTTFT`; fallback uses `Warnf` not silent drop
R2 (sort map keys before float accumulation)	⚠️	BC-3 test accumulates `RequestTTFTs` map without sorting. Functionally harmless (integer-valued floats, test-internal accumulation, not simulation output) but technically violates R2 letter
R4 (construction site audit)	✅	No new struct construction sites
R6 (no logrus.Fatal in sim/)	✅	Uses `logrus.Warnf` correctly
R7 (invariant tests alongside golden tests)	✅	Two invariant tests added
R14 (no multi-module methods)	✅	`projectPDMetrics` only touches metrics projection
Others	✅	Not applicable to this change

Behavioral Tests Assessment

What's covered:

BC-1 (exact formula) — tested with math.Abs tolerance
BC-2 (TTFT > prefill-only) — tested
BC-3 (TTFTSum consistency) — tested via map sum comparison
BC-4 (TTFT ≤ E2E) — tested, now fails hard on missing E2E (fixed from prior review)
BC-5 (non-PD clusters unaffected) — implicitly covered by existing TestDisaggregation_MetricProjection_NoOp

What's not covered:

else if hasPrefillTTFT fallback branch — not exercised (see issue above)
else branch (no prefill TTFT at all for a completed parent) — not exercised; this is likely unreachable in practice

Both uncovered branches are edge/error paths, not the happy path — the core fix is solid.

Separation of Concerns / Architecture

No concerns. The change is entirely within projectPDMetrics() — a post-simulation metrics projection step. It correctly reads from ParentRequest lifecycle fields (TransferStartTime, TransferCompleteTime, DecodeSubReq.ITL) which are the authoritative source for transfer and decode timing, rather than threading new parameters. Module boundary between the cluster coordinator and instance-level metrics is respected.

Performance

No concerns. projectPDMetrics() is called once at simulation teardown. The change adds two float64 additions and one int64 cast per parent request — negligible compared to event processing.

Documentation

The inline comment at cluster.go:1181–1187 is accurate and well-written. It correctly describes the llm-d pipeline semantics (prefill → KV transfer → decode → first user token), cites issue #930, and explains the R1 guard reasoning. PR behavioral contracts (BC-1 through BC-5) map cleanly to the implementation.

Summary: Fix is correct and complete. The second commit addressed all prior review feedback. The one remaining action item is fixing the fallback test name (Option A, one-line change) or adding actual fallback coverage (Option B).

namasl · 2026-04-02T21:10:17Z

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

claude · 2026-04-02T21:11:26Z

Claude finished @namasl's task in 5m 28s —— View job

PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

Gather PR context and changed files
Read implementation changes (cluster.go)
Read test changes (disaggregation_test.go)
Read standards docs (rules.md, invariants.md, principles.md)
Deep correctness and invariant analysis
Aggregate findings and post final review

Verdict: Approve — two minor hygiene items

This is the third review pass. The fix is correct and the tests are now solid. Both substantive issues from round 2 have been resolved:

BC-3 TTFTSum consistency is tested
TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing now directly exercises the fallback branch via synthetic ClusterSimulator construction rather than a well-formed simulation

Two small hygiene items remain.

Correctness

Does this PR tackle issue #930 correctly? Yes.

The root cause was that RequestTTFTs[parentID] captured arrival → prefill_done only. In llm-d PD disaggregation, the first user-visible token is produced by the decode pod. The formula:

TTFT = prefillTTFT + (TransferCompleteTime - TransferStartTime) + DecodeSubReq.ITL[0]

is implemented exactly at cluster.go:1193–1199.

TTFTSum adjustment (cluster.go:1199): m.TTFTSum += int64(newTTFT - prefillTTFT). Both transferDuration and firstDecodeStep are float64-cast from int64 timestamps, so the difference is always an exact integer-valued float. int64(...) is lossless. The pattern mirrors cluster.go:1132 (warm-up TTFT adjustment). Correct. ✓

R1 compliance: prefillTTFT is read at line 1189 before the delete calls at lines 1190–1191. Prior code had the read inside the if completed gate after the deletes — a latent R1 violation, now fixed. ✓

R6 compliance: New warning path uses logrus.Warnf, not Fatalf. ✓

INV-5 (causality TTFT ≤ E2E): Structurally preserved — TTFT covers arrival → first_decode_token, E2E covers arrival → last_decode_token. ✓

INV-6 (determinism): The outer loop for _, parent := range c.parentRequests iterates an unsorted map, but the operations are either independent map mutations (unique keys per parent — no conflicts) or int64 accumulation (associative/commutative — no floating-point non-determinism). INV-6 is not violated. ✓

TTFT distribution output: Computed from sorted RequestTTFTs map values via mapValues→sortedKeys at metrics.go:139–140. Not from TTFTSum. TTFTSum is an internal consistency check field only. ✓

Standards Compliance (R1–R23)

Rule	Status	Notes
R1 (no silent data loss)	✅	Read before delete; fallback warns not silently drops
R2 (sort map keys before float accumulation)	⚠️	BC-3 test accumulates `RequestTTFTs` unsorted (see Issue 1 below)
R4 (construction site audit)	✅	No new struct fields; `ParentRequest` struct uses existing fields
R6 (no logrus.Fatal in sim/)	✅	Uses `logrus.Warnf` correctly
R7 (invariant tests alongside golden tests)	✅	Three invariant tests added
R14 (no multi-module methods)	✅	`projectPDMetrics` is a metrics projection step only
R15 (stale PR references)	✅	`#930` references are live issue citations, not stale todos
Others	✅	Not applicable to this change

Issues Found

Hygiene

1. BC-3 test accumulates RequestTTFTs without sorted keys [disaggregation_test.go:984]

var manualSum float64
for _, v := range m.RequestTTFTs {  // ← unsorted map iteration (R2)
    manualSum += v
}

Technically violates R2 (unsorted map iteration feeding a float sum). In practice, all values are integer-valued float64s (derived from int64 timestamps), so float addition order doesn't matter and the result is deterministic. The previous round 2 review flagged this as "⚠️ Functionally harmless." Still present. A one-line fix:

for _, k := range sortedKeys(m.RequestTTFTs) {
    manualSum += m.RequestTTFTs[k]
}

Not blocking, but worth fixing for pattern consistency since the project's standards take R2 seriously. Fix this →

2. Decode sub-request key assertion in fallback test is vacuously true [disaggregation_test.go:1086–1089]

if _, exists := m.RequestTTFTs[tc.parent.DecodeSubReqID]; exists {
    t.Errorf("INV-PD-6: decode sub-request key %s still present", tc.parent.DecodeSubReqID)
}

The fallback test pre-populates only m.RequestTTFTs[tc.parent.PrefillSubReqID]; the decode key is never set. So this assertion is always trivially true (the key never existed). The assertion for the prefill key deletion is the meaningful one. Not a bug, just a weak assertion. Fix this →

Behavioral Tests Coverage

Contract	Test	Status
BC-1 (exact formula)	`TestDisaggregation_TTFT_IncludesTransferAndDecode`	✅
BC-2 (TTFT > prefill-only)	`TestDisaggregation_TTFT_IncludesTransferAndDecode`	✅
BC-3 (TTFTSum consistency)	`TestDisaggregation_TTFT_IncludesTransferAndDecode`	✅
BC-4 (TTFT ≤ E2E, hard failure on missing E2E)	`TestDisaggregation_TTFT_IncludesTransferAndDecode`	✅
BC-5 (non-PD clusters unaffected)	`TestDisaggregation_MetricProjection_NoOp` (pre-existing)	✅
R1 (no silent TTFT drops)	`TestDisaggregation_TTFT_NoSilentDrops`	✅
Fallback (TransferCompleteTime=0)	`TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing`	✅
Fallback (empty ITL)	`TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing`	✅
Fallback (nil DecodeSubReq)	`TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing`	✅
INV-PD-6 (sub-req keys deleted in fallback)	`TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing`	✅ (prefill deletion verified; decode deletion trivially true — see Issue 2)

Separation of Concerns / Architecture

No concerns. Change is entirely within projectPDMetrics() — a post-simulation metrics projection step. Reads from ParentRequest lifecycle timestamps (the authoritative source for transfer/decode timing) without threading new parameters through the execution path. Module boundary respected. ✓

Performance

No concerns. projectPDMetrics() is called once at teardown; two float64 additions and one int64 cast per parent — negligible. ✓

Documentation

Inline comment at cluster.go:1181–1187 accurately describes the llm-d pipeline (prefill → KV transfer → decode → first user token) and cites issue #930. PR behavioral contracts BC-1 through BC-5 map directly to the implementation. ✓

Summary: Fix is correct and complete. Both issues from round 2 are resolved. The two remaining items (R2 in BC-3 test, vacuous decode-key assertion in fallback test) are hygiene-level and non-blocking.

namasl · 2026-04-02T21:19:38Z

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

claude · 2026-04-02T21:20:03Z

Claude finished @namasl's task in 10s —— View job

PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

Gather PR context (round 4 review)
Read implementation changes (cluster.go, pd_events.go, pd_metrics.go)
Read test changes (disaggregation_test.go)
Read standards docs
Run specialized review agents (parallel)
Aggregate findings and post final review

Verdict: Approve — one documentation fix needed before merge

This is the fourth review pass. The fix is correct and the two round 3 hygiene items are addressed. One documentation accuracy issue discovered in a file outside the diff that this PR leaves stale.

Correctness

Does this PR tackle issue #930 correctly? Yes.

The formula TTFT = prefillTTFT + (TransferCompleteTime - TransferStartTime) + DecodeSubReq.ITL[0] is implemented exactly at cluster.go:1193–1199. All three components are sequential in the llm-d PD pipeline and additive.

R1: prefillTTFT is read at line 1189 before the delete calls at 1190–1191. Read-before-delete ordering is now explicit and documented with an R1 comment. ✓

R6: Both warning paths use logrus.Warnf, not Fatalf. ✓

int64 truncation: int64(newTTFT - prefillTTFT) = transferDuration + firstDecodeStep. Both are float64(int64) casts — the sum of two integer-valued float64s is exact. int64(...) is lossless. ✓

INV-5 (causality): newTTFT ≤ E2E holds structurally — TTFT covers arrival → first_decode_token, E2E covers arrival → last_decode_token, and the prefill sub-request uses the parent's ArrivalTime so the origin is identical. ✓

INV-6 (determinism): The outer loop iterates c.parentRequests (a map), but TTFTSum += delta is integer addition — commutative and associative. Final TTFTSum is deterministic regardless of iteration order. ✓

BC-3 (TTFTSum consistency): aggregateMetrics() sums all instance-level TTFTSum values, which include each prefill sub-request's FirstTokenTime as the base. m.TTFTSum += int64(newTTFT - prefillTTFT) adds the exact delta (transfer + decode). No double-counting. In the fallback path (else if hasPrefillTTFT), the TTFT value is unchanged so no adjustment is needed — TTFTSum already contains the prefill value. ✓

INV-PD-4 (transfer start ≤ complete): Verified in pd_events.go:291–294 — a defensive clamp-to-zero exists in the trace path. The guard parent.TransferCompleteTime > 0 in cluster.go:1193 is sufficient in practice since TransferCompleteTime is set from e.time (always positive) in pd_events.go:185, and TransferStartTime is always set before it at pd_events.go:106. See note below about guard consistency.

Issues Found

Important (fix before merge)

Stale comment in pd_metrics.go:113–116 — this file is not in the diff, but the PR changes its documented semantics.

// pd_metrics.go:113–116 (CURRENT — now incorrect)
// BC-1: parent TTFT from the aggregated TTFT map, keyed by parent ID.
// After projectPDMetrics(), the prefill sub-request TTFT has been rekeyed
// from PrefillSubReqID to the parent ID. The value is unchanged: first_token_time -
// arrival_time, recorded when ProgressIndex == inputLen (prefill completion).

"The value is unchanged" is now wrong. After this PR, the projected value is prefillTTFT + transferDuration + firstDecodeStep, not the raw prefill TTFT. The phrase "prefill completion" is also misleading — it implies the value only covers the prefill phase, which is no longer true. Any future reader of pd_metrics.go relying on this comment will have a wrong mental model.

Suggested replacement:

// BC-1: parent TTFT from the aggregated TTFT map, keyed by parent ID.
// After projectPDMetrics(), the user-visible TTFT is keyed by the parent ID.
// Value = prefillTTFT + KV transfer duration + first decode step — the full
// time from arrival to the first token reaching the user (issue #930).
// Missing key returns 0.0; exclude 0.0 values (BC-11).

Fix this →

Hygiene

Guard inconsistency between cluster.go and pd_metrics.go [cluster.go:1193]

cluster.go guards with parent.TransferCompleteTime > 0. pd_metrics.go:124 uses the stronger guard p.TransferStartTime > 0 && p.TransferCompleteTime >= p.TransferStartTime. The difference is only material if TransferCompleteTime < TransferStartTime (an INV-PD-4 violation), which pd_events.go:292–294 logs but does NOT correct on the parent struct. If INV-PD-4 is violated, cluster.go would compute a negative transferDuration, potentially producing a TTFT less than prefillTTFT, violating BC-2. Aligning to the stronger guard (matching pd_metrics.go) costs one comparison and eliminates this class of inconsistency:

// cluster.go:1193 — current
if hasPrefillTTFT && parent.TransferCompleteTime > 0 && parent.DecodeSubReq != nil && len(parent.DecodeSubReq.ITL) > 0 {

// Suggested — aligned with pd_metrics.go:124
if hasPrefillTTFT && parent.TransferStartTime > 0 && parent.TransferCompleteTime >= parent.TransferStartTime && parent.DecodeSubReq != nil && len(parent.DecodeSubReq.ITL) > 0 {

Fix this →

Standards Compliance (R1–R23)

Rule	Status	Notes
R1 (no silent data loss)	✅	Read before delete, fallback warns not silently drops
R2 (sort map keys before float accumulation)	✅	`TTFTSum` delta addition is commutative — no sort needed; BC-3 test uses `sortedKeys`
R4 (construction site audit)	✅	No new struct fields added
R6 (no logrus.Fatal in sim/)	✅	Both new paths use `logrus.Warnf`
R7 (invariant tests alongside golden tests)	✅	Three new invariant tests
R14 (no multi-module methods)	✅	Change is self-contained in `projectPDMetrics`
R15 (stale PR references)	⚠️	`pd_metrics.go:116` comment is stale after this PR's changes (see issue above)

Behavioral Test Coverage

Contract	Test	Status
BC-1 (exact formula)	`TestDisaggregation_TTFT_IncludesTransferAndDecode`	✅
BC-2 (TTFT > prefill-only)	`TestDisaggregation_TTFT_IncludesTransferAndDecode`	✅
BC-3 (TTFTSum consistency)	`TestDisaggregation_TTFT_IncludesTransferAndDecode` via `sortedKeys`	✅
BC-4 (TTFT ≤ E2E, hard fail on missing E2E)	`TestDisaggregation_TTFT_IncludesTransferAndDecode`	✅
BC-5 (non-PD clusters unaffected)	Pre-existing `TestDisaggregation_MetricProjection_NoOp`	✅
R1 (no silent drops)	`TestDisaggregation_TTFT_NoSilentDrops`	✅
Fallback: TransferCompleteTime=0	`TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing`	✅
Fallback: empty ITL	`TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing`	✅
Fallback: nil DecodeSubReq	`TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing`	✅
INV-PD-6 (sub-req keys deleted)	`TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing`	✅

Tests are behavioral (observable post-conditions via public APIs), meet the refactor survival criterion, and cover all branches including the defensive fallback. The fallback test correctly exercises the unreachable-from-well-formed-sim branch via synthetic struct injection. No critical gaps.

Separation of Concerns / Architecture

No concerns. The change is confined to projectPDMetrics(), a post-simulation metrics projection step. It reads from ParentRequest lifecycle timestamps (the authoritative source) rather than threading new parameters. Module boundaries respected. ✓

Performance

No concerns. projectPDMetrics() is called once at teardown. Two float64 additions and one int64 cast per parent — negligible. ✓

Documentation (PR-level)

Inline comment at cluster.go:1181–1187 is accurate and well-written: describes the llm-d pipeline semantics, cites issue #930, and explains the R1 guard. PR behavioral contracts BC-1 through BC-5 map directly to the implementation. The one exception is pd_metrics.go:113–116 which is now stale (see important issue above).

Summary: Fix is correct and test coverage is comprehensive. The only action item before merge is updating the stale comment in pd_metrics.go:113–116. The guard inconsistency is a hygiene note.

namasl · 2026-04-02T21:29:47Z

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

claude · 2026-04-02T21:30:15Z

Claude finished @namasl's task in 8m 21s —— View job

PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

Gather PR context (round 5 — addressing round 4 advisory items)
Read implementation (cluster.go)
Read test changes (disaggregation_test.go)
Read standards docs (rules.md)
Run specialized review agents (code, tests, errors)
Aggregate findings and post final review

Verdict: Approve — two advisory items

This is round 5. All prior round feedback (R2 sortedKeys, BC-2 non-triviality guards, decode-key pre-population in fallback test, BC-4 hard failure on missing E2E) has been addressed correctly. The implementation is correct and the test suite is solid. Two advisory items remain, neither blocking.

Correctness

Does this PR tackle issue #930 correctly? Yes. The formula prefillTTFT + transferDuration + firstDecodeStep is computed exactly at cluster.go:1193–1199. The TTFTSum adjustment in the fallback path (else if hasPrefillTTFT) is correctly zero — the prefill TTFT was already included in TTFTSum by instance-level aggregation; the map rekey preserves the sum without additional delta.

R1/R2/R6/R7/R14: All pass. See prior rounds; no regressions introduced.

Advisory Items

1. Guard inconsistency: `cluster.go:1193` uses a weaker predicate than the established pattern in `pd_metrics.go:124`

cluster.go:1193:

if hasPrefillTTFT && parent.TransferCompleteTime > 0 && parent.DecodeSubReq != nil && len(parent.DecodeSubReq.ITL) > 0 {

pd_metrics.go:124 (established pattern):

if p.TransferStartTime > 0 && p.TransferCompleteTime >= p.TransferStartTime

The difference: the cluster.go guard does not check TransferStartTime > 0 or TransferCompleteTime >= TransferStartTime. If INV-PD-4 were ever violated (TransferCompleteTime recorded without a matching KVTransferStartedEvent, or a priority inversion bug leaving TransferStartTime=0), transferDuration could be inflated or negative, silently corrupting TTFTSum at line 1199 (int64(newTTFT - prefillTTFT) would be a large negative or inflated number with no log). The existing event-ordering guarantee (INV-3 + causal scheduling) makes this unreachable today, but the divergence from the established guard pattern leaves a latent hazard.

Suggested fix: Align with pd_metrics.go:124:

if hasPrefillTTFT && parent.TransferStartTime > 0 && parent.TransferCompleteTime >= parent.TransferStartTime && parent.DecodeSubReq != nil && len(parent.DecodeSubReq.ITL) > 0 {

Fix this →

2. Fallback test has no case for `completed && !hasPrefillTTFT` (Branch C, `cluster.go:1204`)

The else branch at line 1204 fires when a completed parent has no prefill TTFT entry at all. All three sub-cases in TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing pre-populate m.RequestTTFTs[tc.parent.PrefillSubReqID], so they all land in the else if hasPrefillTTFT branch. Branch C is not exercised by any test. TestDisaggregation_TTFT_NoSilentDrops would catch a regression in an integration run, but Branch C is never deliberately triggered.

Suggested addition — a fourth sub-case to the table-driven fallback test:

{
    name: "no prefill TTFT key",
    parent: &ParentRequest{
        ID: "p4", PrefillSubReqID: "p4_prefill", DecodeSubReqID: "p4_decode",
        OriginalRequest: origReq,
        ArrivalTime: 0, CompletionTime: 5000, DecodeInstanceID: "inst-0",
        TransferStartTime: 100, TransferCompleteTime: 200,
        DecodeSubReq: &sim.Request{ITL: []int64{100}},
    },
    // intentionally do NOT pre-populate m.RequestTTFTs["p4_prefill"]
},

The assertion: m.RequestTTFTs["p4"] must be absent (correct omission, not silent zero insert), and sub-request keys must also be absent (INV-PD-6). Fix this →

Standards Compliance (R1–R23)

Rule	Status	Notes
R1 (no silent data loss)	✅	Read before delete; all branches warn or produce valid output
R2 (sort map keys before float accumulation)	✅	BC-3 test uses `sortedKeys(m.RequestTTFTs)`
R6 (no logrus.Fatal in sim/)	✅	All new log calls use `Warnf`
R7 (invariant tests alongside golden tests)	✅	Three new invariant tests
R14 (no multi-module methods)	✅	`projectPDMetrics` single-concern
Others	✅	No new config, interfaces, or YAML

Behavioral Tests Coverage

Contract	Test	Status
BC-1 (exact formula)	`TestDisaggregation_TTFT_IncludesTransferAndDecode`	✅
BC-2 (TTFT > prefill-only, with non-triviality guards)	`TestDisaggregation_TTFT_IncludesTransferAndDecode`	✅
BC-3 (TTFTSum consistency, sortedKeys)	`TestDisaggregation_TTFT_IncludesTransferAndDecode`	✅
BC-4 (TTFT ≤ E2E, hard failure on missing E2E)	`TestDisaggregation_TTFT_IncludesTransferAndDecode`	✅
BC-5 (non-PD unaffected)	`TestDisaggregation_MetricProjection_NoOp` (pre-existing)	✅
R1 (no silent drops)	`TestDisaggregation_TTFT_NoSilentDrops`	✅
Fallback: TransferCompleteTime=0	`TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing`	✅
Fallback: empty ITL	`TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing`	✅
Fallback: nil DecodeSubReq	`TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing`	✅
Fallback: no prefill TTFT key (Branch C)	—	⚠️ uncovered (advisory)

Performance / Architecture / Documentation

No concerns. projectPDMetrics() is a teardown-only projection step. Module boundaries respected. Inline comment at cluster.go:1181–1187 is accurate. PR behavioral contracts map cleanly to the implementation.

Summary: Fix is correct and complete. All prior round feedback has been addressed. The two remaining advisory items (guard consistency and Branch C coverage) are non-blocking hygiene. The PR is ready to merge.

namasl · 2026-04-03T00:34:42Z

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

claude · 2026-04-03T00:40:00Z

Claude finished @namasl's task in 2m 39s —— View job

PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

Gather PR context (round 6)
Read implementation (cluster.go, pd_metrics.go)
Read test changes (disaggregation_test.go)
Analyze correctness, standards compliance, and test coverage
Post final review

Verdict: Approve — one stale comment from round 4 still needs fixing

Round 5's two advisory items (guard inconsistency, Branch C coverage) have been addressed correctly. The only remaining item is a stale comment in pd_metrics.go:113–116 that was flagged as "Important (fix before merge)" in round 4 but was not resolved. The file has no diff in this PR.

What was fixed since round 5

Advisory 1 (guard inconsistency) — FIXED ✅ [cluster.go:1193]

The guard now matches the stronger predicate established in pd_metrics.go:124:

if hasPrefillTTFT && parent.TransferStartTime > 0 && parent.TransferCompleteTime >= parent.TransferStartTime && parent.DecodeSubReq != nil && len(parent.DecodeSubReq.ITL) > 0 {

If TransferCompleteTime < TransferStartTime (an INV-PD-4 violation), transferDuration cannot go negative. The alignment eliminates guard drift between the two projection functions.

Advisory 2 (Branch C coverage) — FIXED ✅ [disaggregation_test.go, 4th sub-case]

The skipPrefillTTFT: true case correctly exercises Branch C:

m.RequestTTFTs["p4_prefill"] is not set → hasPrefillTTFT = false
m.RequestTTFTs["p4_decode"] = 999.0 is pre-populated (non-trivial INV-PD-6 assertion)
Transfer data and ITL are valid, so the case would reach the happy path guard if hasPrefillTTFT were true — the only reason it reaches Branch C is the missing prefill key
Asserts: no parent TTFT entry created, both sub-request keys deleted

Outstanding Issue

Important (fix before merge)

Stale comment in pd_metrics.go:113–116 — first flagged in round 4, not addressed

pd_metrics.go has no diff in this PR. The comment still reads:

// BC-1: parent TTFT from the aggregated TTFT map, keyed by parent ID.
// After projectPDMetrics(), the prefill sub-request TTFT has been rekeyed
// from PrefillSubReqID to the parent ID. The value is unchanged: first_token_time -
// arrival_time, recorded when ProgressIndex == inputLen (prefill completion).
// Missing key returns 0.0 in Go maps; exclude 0.0 values (BC-11).

"The value is unchanged" and "prefill completion" are both incorrect after this PR. The projected value now includes KV transfer duration and the first decode step. Any future reader of pd_metrics.go will have a wrong mental model about what aggregated.RequestTTFTs[p.ID] contains at this call site.

Suggested replacement:

// BC-1: parent TTFT from the aggregated TTFT map, keyed by parent ID.
// After projectPDMetrics(), the user-visible TTFT is stored under the parent ID.
// Value = prefillTTFT + KV transfer duration + first decode step — the full
// time from arrival to the first token reaching the user (issue #930).
// Missing key returns 0.0 in Go maps; exclude 0.0 values (BC-11).

Fix this →

Correctness Summary

The fix is correct and unchanged from prior rounds:

Formula prefillTTFT + transferDuration + firstDecodeStep is exact at cluster.go:1193–1199
TTFTSum adjustment (int64(newTTFT - prefillTTFT)) is lossless — both addends are integer-valued float64
R1 (read before delete), R6 (Warnf not Fatalf), INV-5 (causality), INV-6 (determinism): all pass

Test Coverage Summary

Contract	Test	Status
BC-1 (exact formula)	`TestDisaggregation_TTFT_IncludesTransferAndDecode`	✅
BC-2 (TTFT > prefill-only, non-triviality guards)	same	✅
BC-3 (TTFTSum consistency, sortedKeys)	same	✅
BC-4 (TTFT ≤ E2E, hard fail on missing E2E)	same	✅
BC-5 (non-PD clusters unaffected)	pre-existing `TestDisaggregation_MetricProjection_NoOp`	✅
R1 (no silent drops)	`TestDisaggregation_TTFT_NoSilentDrops`	✅
Fallback: TransferCompleteTime=0	`TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing`	✅
Fallback: empty ITL	same	✅
Fallback: nil DecodeSubReq	same	✅
Branch C: no prefill TTFT key	same (4th sub-case)	✅
INV-PD-6 (sub-req keys deleted)	same	✅

All branches are exercised. The fallback test correctly pre-populates the decode sub-request key so the INV-PD-6 deletion assertions are non-trivial for all cases.

Summary: Fix is correct, tests are comprehensive, and all round 5 advisory items have been addressed. The only remaining action item is updating the stale comment in pd_metrics.go:113–116, which was marked "Important (fix before merge)" in round 4 and has not been addressed.
| Branch: fix/pd-ttft-user-visible

After PR inference-sim#931 the projected TTFT value includes KV transfer duration and first decode step, not just the rekeyed prefill TTFT. Update the comment at the call site to reflect the corrected semantics (issue inference-sim#930). Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

namasl · 2026-04-03T14:02:22Z

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

claude · 2026-04-03T14:02:46Z

Claude finished @namasl's task in 3m 16s —— View job

PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

Gather PR context (round 7)
Read changed files (cluster.go, pd_metrics.go, disaggregation_test.go)
Verify all prior round advisory items resolved
Final standards compliance check
Aggregate findings and post final review

Verdict: Approve — ready to merge

This is round 7. All prior round feedback has been addressed. The round 6 "Important (fix before merge)" item — the stale comment in pd_metrics.go:113–116 — is now fixed. No new issues found. The PR is clean.

What changed since round 6

Stale comment in pd_metrics.go:113–116 — FIXED ✅

Old (incorrect after this PR):

// After projectPDMetrics(), the prefill sub-request TTFT has been rekeyed
// from PrefillSubReqID to the parent ID. The value is unchanged: first_token_time -
// arrival_time, recorded when ProgressIndex == inputLen (prefill completion).

New (correct):

// After projectPDMetrics(), the user-visible TTFT is stored under the parent ID.
// Value = prefillTTFT + KV transfer duration + first decode step — the full
// time from arrival to the first token reaching the user (issue #930).

This was the final blocker identified in round 4 and re-flagged in round 6. It's resolved.

Correctness (unchanged from prior rounds, confirmed still holds)

Does this PR tackle issue #930 correctly? Yes.

Formula at cluster.go:1193–1199:

TTFT = prefillTTFT + (TransferCompleteTime - TransferStartTime) + DecodeSubReq.ITL[0]

All three components are sequential and additive in the llm-d PD pipeline.

TTFTSum adjustment (cluster.go:1199): m.TTFTSum += int64(newTTFT - prefillTTFT). Both transferDuration and firstDecodeStep are float64-cast from int64 timestamps — the sum is always an exact integer-valued float64 and int64(...) is lossless. For the fallback path, no adjustment is made (TTFTSum already contains the prefill value from instance-level aggregation). ✓

R1 (read before delete): prefillTTFT read at line 1189, deletes at 1190–1191. ✓

R6 (no logrus.Fatal in sim/): Both warn paths use Warnf. ✓

INV-5 (causality TTFT ≤ E2E): Structurally preserved — TTFT covers arrival → first_decode_token, E2E covers arrival → last_decode_token. ✓

Guard consistency (round 5 advisory #1): cluster.go:1193 now uses the stronger parent.TransferStartTime > 0 && parent.TransferCompleteTime >= parent.TransferStartTime, matching the established pattern in pd_metrics.go:124. If INV-PD-4 is ever violated, transferDuration cannot go negative or be inflated. ✓

Standards Compliance (R1–R23)

Rule	Status	Notes
R1 (no silent data loss)	✅	Read before delete; all branches warn or produce valid output
R2 (sort map keys before float accumulation)	✅	BC-3 test uses `sortedKeys(m.RequestTTFTs)`
R6 (no logrus.Fatal in sim/)	✅	Both new log calls use `Warnf`
R7 (invariant tests alongside golden tests)	✅	Three new invariant tests
R14 (no multi-module methods)	✅	`projectPDMetrics` single-concern
R15 (stale references)	✅	`pd_metrics.go` comment updated
Others	✅	No new config, interfaces, or YAML

Behavioral Test Coverage

Contract	Test	Status
BC-1 (exact formula)	`TestDisaggregation_TTFT_IncludesTransferAndDecode`	✅
BC-2 (TTFT > prefill-only, non-triviality guards)	same	✅
BC-3 (TTFTSum consistency, `sortedKeys`)	same	✅
BC-4 (TTFT ≤ E2E, hard fail on missing E2E)	same	✅
BC-5 (non-PD clusters unaffected)	`TestDisaggregation_MetricProjection_NoOp` (pre-existing)	✅
R1 (no silent drops)	`TestDisaggregation_TTFT_NoSilentDrops`	✅
Fallback: TransferCompleteTime=0	`TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing`	✅
Fallback: empty ITL	same	✅
Fallback: nil DecodeSubReq	same	✅
Branch C: no prefill TTFT key	same (4th sub-case, `skipPrefillTTFT: true`)	✅
INV-PD-6 (sub-req keys deleted)	same — decode key pre-populated with 999.0; deletion is non-trivial	✅

All branches are exercised. Tests are behavioral, use exported APIs (cs.ParentRequests()), and satisfy the refactor survival criterion.

Separation of Concerns / Architecture

No concerns. The change is entirely within projectPDMetrics() — a post-simulation metrics projection step, called once at teardown. It reads from ParentRequest lifecycle timestamps (the authoritative source) rather than threading new parameters through the execution path. Module boundary between the cluster coordinator and instance-level metrics is respected. ✓

Performance

No concerns. projectPDMetrics() is called once at simulation teardown. The change adds two float64 additions and one int64 cast per parent request — negligible. ✓

Documentation

cluster.go:1181–1187 accurately describes the llm-d pipeline semantics. pd_metrics.go:113–116 now correctly describes the post-projection value. PR behavioral contracts BC-1 through BC-5 map directly to the implementation. ✓

Summary: The fix is correct, complete, and well-tested. All seven rounds of review feedback have been addressed. No remaining action items.
| Branch: fix/pd-ttft-user-visible

…erence-sim#930) In PD disaggregated mode, user-visible TTFT was understated because it only captured prefill sub-request completion time. In llm-d, the first token reaches the user from the decode pod — after prefill, KV transfer, and one decode step. Update projectPDMetrics() to compute user-visible TTFT as prefillTTFT + transferDuration + firstDecodeStep. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

- Add BC-3 test: TTFTSum consistency with sum(RequestTTFTs) after projection - Add fallback path test: completed parents always have TTFT entries (R1) - Use cs.ParentRequests() instead of cs.parentRequests (exported API) - Assert E2E presence unconditionally in BC-4 causality check Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

The fallback test was renamed/split: TestDisaggregation_TTFT_NoSilentDrops verifies R1 (no dropped entries) via a normal simulation, while TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing now directly exercises the defensive fallback in projectPDMetrics by constructing synthetic ParentRequests with TransferCompleteTime=0, empty ITL, and nil DecodeSubReq. Table-driven with three cases. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

- BC-3 test: iterate RequestTTFTs with sortedKeys for R2 compliance - Fallback test: pre-populate decode sub-request key so INV-PD-6 deletion assertion is non-trivial Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

- BC-2: add explicit non-triviality guards for transferDuration and firstDecodeStep so a config change zeroing either addend causes a clear test failure rather than a vacuously-passing assertion - BC-3: add comment explaining why tolerance is 1.0 µs (TTFTSum is int64-truncated) rather than 1e-9 used elsewhere in the file Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Align happy-path guard with established pd_metrics.go pattern (TransferStartTime > 0 && TransferCompleteTime >= TransferStartTime) to prevent silent TTFTSum corruption on invariant violation. Add Branch C coverage to FallbackWhenDecodeDataMissing: fourth sub-case exercises the else path where a completed parent has no prefill TTFT key, asserting no entry is written and INV-PD-6 holds. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

After PR inference-sim#931 the projected TTFT value includes KV transfer duration and first decode step, not just the rekeyed prefill TTFT. Update the comment at the call site to reflect the corrected semantics (issue inference-sim#930). Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

namasl · 2026-04-03T14:36:49Z

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

claude · 2026-04-03T14:37:12Z

Claude finished @namasl's task in 3m 43s —— View job

PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

Gather PR context (round 8)
Read changed files (cluster.go, pd_metrics.go, disaggregation_test.go)
Standards compliance and correctness check
Aggregate findings and post final review

Verdict: Approve — ready to merge

This is round 8. The most recent commit (0601db8) addressed the final outstanding item from round 4/6 — the stale comment in pd_metrics.go:113–116. No new issues found. The PR is clean.

What changed since round 7

Stale pd_metrics.go comment — FIXED ✅ (0601db8)

pd_metrics.go:113–116 now correctly reads:

// BC-1: parent TTFT from the aggregated TTFT map, keyed by parent ID.
// After projectPDMetrics(), the user-visible TTFT is stored under the parent ID.
// Value = prefillTTFT + KV transfer duration + first decode step — the full
// time from arrival to the first token reaching the user (issue #930).
// Missing key returns 0.0 in Go maps; exclude 0.0 values (BC-11).

The previously incorrect "The value is unchanged: first_token_time - arrival_time, recorded when ProgressIndex == inputLen (prefill completion)" is gone.

Correctness (confirmed against current code)

Does this PR tackle issue #930 correctly? Yes.

Formula at cluster.go:1209–1215:

TTFT = prefillTTFT + (TransferCompleteTime - TransferStartTime) + DecodeSubReq.ITL[0]

All three components are sequential and additive in the llm-d PD pipeline. This is the first user-visible token path: prefill pod completes → KV transfer → decode pod emits first output token.

Check	Line	Status
R1: read `prefillTTFT` before `delete` calls	1205 (before 1206–1207)	✅
R6: no `logrus.Fatal` in sim/	1219, 1221	✅
Guard aligned with `pd_metrics.go:124`	1209 (`TransferStartTime > 0 && TransferCompleteTime >= TransferStartTime`)	✅
`TTFTSum` delta is lossless int64	1215 (`int64(newTTFT - prefillTTFT)`)	✅
INV-5 causality: TTFT ≤ E2E	structural	✅
INV-6 determinism	TTFTSum delta is integer addition (commutative)	✅
`pd_metrics.go` comment accurate	113–117	✅

Standards Compliance (R1–R23)

Rule	Status	Notes
R1 (no silent data loss)	✅	Read before delete; all branches warn or produce valid output
R2 (sort map keys before float accumulation)	✅	BC-3 test uses `sortedKeys(m.RequestTTFTs)` (line 1001)
R6 (no logrus.Fatal in sim/)	✅	Both log calls use `Warnf`
R7 (invariant tests alongside golden tests)	✅	Three new invariant tests
R14 (no multi-module methods)	✅	Change confined to `projectPDMetrics()`
R15 (no stale references)	✅	`pd_metrics.go` comment updated

Behavioral Test Coverage

Contract	Test	Status
BC-1 (exact formula)	`TestDisaggregation_TTFT_IncludesTransferAndDecode`	✅
BC-2 (TTFT > prefill-only, non-triviality guards)	same	✅
BC-3 (TTFTSum consistency, `sortedKeys`)	same	✅
BC-4 (TTFT ≤ E2E, hard failure on missing E2E)	same	✅
BC-5 (non-PD clusters unaffected)	`TestDisaggregation_MetricProjection_NoOp` (pre-existing)	✅
R1 (no silent drops)	`TestDisaggregation_TTFT_NoSilentDrops`	✅
Fallback: `TransferCompleteTime=0`	`TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing`	✅
Fallback: empty `ITL`	same	✅
Fallback: nil `DecodeSubReq`	same	✅
Branch C: no prefill TTFT key	same (`skipPrefillTTFT: true`, 4th sub-case)	✅
INV-PD-6 (sub-req keys deleted)	same — decode key pre-populated with 999.0 (non-trivial)	✅

Tests use cs.ParentRequests() (exported API, lines 936 and 1019). The fallback test directly injects synthetic state via the unexported struct fields — unavoidable for exercising an unreachable-from-normal-simulation branch, and acceptable in package cluster same-package tests. Refactor survival criterion holds for the integration tests; the synthetic-injection fallback test is structurally coupled by design.

Separation of Concerns / Architecture

No concerns. The change is entirely within projectPDMetrics() — a post-simulation metrics projection step called once at teardown. Transfer timestamps are read from ParentRequest lifecycle fields (the authoritative source), not threaded as new parameters. Module boundary between the cluster coordinator and instance-level metrics is respected. ✓

Performance

No concerns. Two float64 additions and one int64 cast per parent request at teardown. ✓

Documentation

cluster.go:1197–1203 accurately describes the llm-d pipeline semantics and cites issue #930. pd_metrics.go:113–117 now correctly describes the post-projection value. PR behavioral contracts BC-1 through BC-5 map directly to the implementation. ✓

Summary: Fix is correct, complete, and well-tested. All eight rounds of review feedback have been addressed across seven commits. No remaining action items. The PR is ready to merge.
| Branch: fix/pd-ttft-user-visible

namasl and others added 7 commits April 3, 2026 14:35

namasl force-pushed the fix/pd-ttft-user-visible branch from c47eb95 to 0601db8 Compare April 3, 2026 14:36

Conversation

namasl commented Apr 2, 2026

Summary

Behavioral Contracts

Test plan

Uh oh!

namasl commented Apr 2, 2026

Uh oh!

claude bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

Verdict: Approve with suggestions

Correctness Assessment

Issues Found

Important

Suggestions

Separation of Concerns / Architecture

Performance

Documentation

Uh oh!

namasl commented Apr 2, 2026

Uh oh!

claude bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

Verdict: Approve — one test naming issue to fix

Correctness

Issues Found

Important

Standards Compliance Checklist (R1–R23)

Behavioral Tests Assessment

Separation of Concerns / Architecture

Performance

Documentation

Uh oh!

namasl commented Apr 2, 2026

Uh oh!

claude bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

Verdict: Approve — two minor hygiene items

Correctness

Standards Compliance (R1–R23)

Issues Found

Hygiene

Behavioral Tests Coverage

Separation of Concerns / Architecture

Performance

Documentation

Uh oh!

namasl commented Apr 2, 2026

Uh oh!

claude bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

Verdict: Approve — one documentation fix needed before merge

Correctness

Issues Found

Important (fix before merge)

Hygiene

Standards Compliance (R1–R23)

Behavioral Test Coverage

Separation of Concerns / Architecture

Performance

Documentation (PR-level)

Uh oh!

namasl commented Apr 2, 2026

Uh oh!

claude bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

Verdict: Approve — two advisory items

Correctness

Advisory Items

1. Guard inconsistency: cluster.go:1193 uses a weaker predicate than the established pattern in pd_metrics.go:124

claude bot commented Apr 2, 2026 •

edited

Loading

claude bot commented Apr 2, 2026 •

edited

Loading

claude bot commented Apr 2, 2026 •

edited

Loading

claude bot commented Apr 2, 2026 •

edited

Loading

claude bot commented Apr 2, 2026 •

edited

Loading

1. Guard inconsistency: `cluster.go:1193` uses a weaker predicate than the established pattern in `pd_metrics.go:124`

2. Fallback test has no case for `completed && !hasPrefillTTFT` (Branch C, `cluster.go:1204`)

claude bot commented Apr 3, 2026 •

edited

Loading

claude bot commented Apr 3, 2026 •

edited

Loading

claude bot commented Apr 3, 2026 •

edited

Loading