Skip to content

fix(cluster): PD TTFT includes KV transfer and first decode step#931

Open
namasl wants to merge 7 commits intoinference-sim:mainfrom
namasl:fix/pd-ttft-user-visible
Open

fix(cluster): PD TTFT includes KV transfer and first decode step#931
namasl wants to merge 7 commits intoinference-sim:mainfrom
namasl:fix/pd-ttft-user-visible

Conversation

@namasl
Copy link
Copy Markdown
Contributor

@namasl namasl commented Apr 2, 2026

Summary

  • PD disaggregated TTFT was understated: it only captured prefill sub-request completion time (arrival → prefill done)
  • In llm-d, the first token reaches the user from the decode pod, not prefill: prefill completes → KV transfers → decode pod recomputes last prompt token and samples first output token
  • Updated projectPDMetrics() to compute user-visible TTFT as prefillTTFT + transferDuration + firstDecodeStep
  • TTFTSum adjusted by the delta to maintain consistency (BC-3)
  • Defensive fallback to prefill-only TTFT if decode data is unavailable

Closes #930

Behavioral Contracts

  • BC-1: RequestTTFTs[parentID] = prefillTTFT + (TransferCompleteTime - TransferStartTime) + DecodeSubReq.ITL[0]
  • BC-2: New TTFT > prefill-only TTFT (transfer + decode add positive time)
  • BC-3: TTFTSum adjusted by delta to stay consistent with RequestTTFTs
  • BC-4: TTFT <= E2E (causality preserved)
  • BC-5: Non-PD clusters unaffected (projectPDMetrics is a no-op when parentRequests is empty)

Test plan

  • TestDisaggregation_TTFT_IncludesTransferAndDecode — verifies BC-1 (exact formula), BC-2 (exceeds prefill-only), BC-4 (causality)
  • Full go test ./... passes
  • go vet ./... clean

🤖 Generated with Claude Code

@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Apr 2, 2026

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

@claude
Copy link
Copy Markdown

claude bot commented Apr 2, 2026

Claude finished @namasl's task in 13m 12s —— View job


PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

  • Gather PR context and changed files
  • Read implementation changes (cluster.go)
  • Read test changes (disaggregation_test.go)
  • Run specialized review agents (code, tests, errors)
  • Aggregate findings and post final review

Verdict: Approve with suggestions

The fix is correct and addresses the issue accurately. The implementation is clean, the formula is sound, and the core behavioral contracts are tested. A few test-quality improvements are worth considering before merge.


Correctness Assessment

Does the PR tackle the parent issue correctly? Yes.

The root cause (issue #930) was that RequestTTFTs[parentID] was set to the prefill sub-request TTFT only — capturing arrival → prefill_done. In llm-d PD disaggregation, the first user-visible token comes from the decode pod, not prefill. The correct formula is:

TTFT = prefillTTFT + (TransferCompleteTime - TransferStartTime) + DecodeSubReq.ITL[0]

The implementation at cluster.go:1196 gets this exactly right. The three components are sequential and additive.

TTFTSum adjustment (BC-3): m.TTFTSum += int64(newTTFT - prefillTTFT) is correct. At this point aggregateMetrics() has already summed in the prefill TTFT; the delta transferDuration + firstDecodeStep is the exact additive correction needed. All constituent values are int64-backed so no float truncation occurs.

R1 compliance (no silent data loss): Satisfied. prefillTTFT is read into a local variable at line 1189 before the delete calls at lines 1190–1191 — an explicit improvement over the previous pattern where the read was interleaved inside the if completed gate.

Defensive fallback: Appropriate. The else if hasPrefillTTFT branch at line 1200 uses prefill-only TTFT with a Warnf log when decode data is unavailable for a completed parent. Reporting an underestimate is preferable to dropping the TTFT entry entirely.

Invariants: INV-5 (causality: TTFT ≤ E2E) is structurally preserved — TTFT covers arrival → first_decode_token while E2E covers arrival → last_decode_token. No invariant violations.


Issues Found

Important

1. BC-3 (TTFTSum consistency) is not directly tested [sim/cluster/disaggregation_test.go]

The new test verifies BC-1/BC-2/BC-4 but not BC-3. After mustRun, the invariant m.TTFTSum == sum(RequestTTFTs values) is not asserted for the PD disaggregation path. A future refactor that adjusts RequestTTFTs without touching TTFTSum (or vice versa) would be undetected. This is the exact pattern guarded by BC-MS-3 in metrics_substrate_test.go for non-PD clusters.

Suggested addition to TestDisaggregation_TTFT_IncludesTransferAndDecode:

// BC-3: TTFTSum must be consistent with RequestTTFTs.
var manualSum float64
for _, v := range m.RequestTTFTs {
    manualSum += v
}
if math.Abs(float64(m.TTFTSum)-manualSum) > 1.0 { // 1 tick tolerance for rounding
    t.Errorf("BC-3: TTFTSum (%d) != sum(RequestTTFTs) (%.1f)", m.TTFTSum, manualSum)
}

2. Defensive fallback path is untested [cluster.go:1200–1203]

The else if hasPrefillTTFT branch (fires when completed == true but TransferCompleteTime == 0 or ITL is empty) has no test. If upstream code regressed to not setting TransferCompleteTime, the TTFT would silently fall back to prefill-only and the test suite would not alert. A targeted test constructing a synthetic ParentRequest with TransferCompleteTime == 0 would close this gap. Fix this →

Suggestions

3. Use cs.ParentRequests() instead of cs.parentRequests in the new test [disaggregation_test.go:936]

// Current (structural — accesses unexported field):
for _, parent := range cs.parentRequests {

// Better (behavioral — uses exported API, follows newer test pattern in the same file):
for _, parent := range cs.ParentRequests() {

The test file is in package cluster so both compile, but newer tests in the same file (lines 1402, 1539, 1624) already use ParentRequests(). Using the internal field bypasses the hasRun guard and sort, and means if parentRequests were ever renamed the test would break silently even though behavior is identical. This is a minor maintainability issue. Fix this →

4. BC-4 check should assert E2E presence unconditionally [disaggregation_test.go:974–978]

// Current: silently skips causality check if E2E absent
e2e, hasE2E := m.RequestE2Es[pid]
if hasE2E && ttft > e2e {

// Better: missing E2E is itself a violation for a completed parent
e2e, hasE2E := m.RequestE2Es[pid]
if !hasE2E {
    t.Errorf("BC-4: parent %s: E2E missing from RequestE2Es for completed parent", pid)
} else if ttft > e2e {
    t.Errorf("BC-4: parent %s: TTFT (%.1f) > E2E (%.1f), causality violated", pid, ttft, e2e)
}

If projectPDMetrics ever failed to project E2E for a completed parent, the causality check would silently pass. Fix this →


Separation of Concerns / Architecture

No concerns. The change is confined to projectPDMetrics(), a post-simulation metrics projection function. It correctly reads from the ParentRequest lifecycle struct (which owns the transfer timestamps) rather than threading new parameters. The module boundary between the cluster coordinator and the instance-level metrics is respected.

Performance

No concerns. The hotpath is unchanged. projectPDMetrics() is called once per simulation run at teardown, iterating over O(parentRequests) entries — the two extra float64 additions are negligible.

Documentation

The inline comment at cluster.go:1181–1187 is accurate and well-written. It correctly describes the llm-d pipeline semantics and cites issue #930. The PR description's behavioral contracts map cleanly to the implementation.


@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Apr 2, 2026

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

@claude
Copy link
Copy Markdown

claude bot commented Apr 2, 2026

Claude finished @namasl's task in 4m 10s —— View job


PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

  • Gather PR context
  • Read implementation changes (cluster.go)
  • Read test changes (disaggregation_test.go)
  • Read standards, invariants, principles docs
  • Deep correctness and invariant analysis
  • Aggregate findings and post final review

Verdict: Approve — one test naming issue to fix

The fix is correct, well-scoped, and addresses issue #930 accurately. The previous round of review feedback (BC-3 test, BC-4 E2E assertion, ParentRequests() API usage) has all been addressed in the second commit. One remaining issue: the fallback test is misnamed and doesn't exercise the fallback branch.


Correctness

Does this PR tackle the parent issue correctly? Yes.

The root cause was that RequestTTFTs[parentID] was set to prefill_arrival → prefill_done only. In llm-d PD disaggregation, the first user-visible token comes from the decode pod. The correct formula:

TTFT = prefillTTFT + (TransferCompleteTime - TransferStartTime) + DecodeSubReq.ITL[0]

The implementation at cluster.go:1193–1199 is exact.

BC-3 (TTFTSum consistency): m.TTFTSum += int64(newTTFT - prefillTTFT) is correct. aggregateMetrics() already summed in the per-instance TTFTSum (which includes each prefill request's FirstTokenTime). The delta int64(transferDuration + firstDecodeStep) is the exact additive correction. This mirrors the identical pattern at cluster.go:1130–1132 used by the ITL/TTFT reclocking logic, so the pattern is consistent.

R1 (no silent data loss): Satisfied. prefillTTFT is read at line 1189 before the delete calls at lines 1190–1191. The old code had the read interleaved inside the if completed gate after the deletes — this was a latent R1 violation fixed here.

R6 (no logrus.Fatal in sim/): Clean. The new warning uses logrus.Warnf, not Fatalf.

INV-PD-6 (metric map parent granularity): Maintained. Both pfx and dec keys are unconditionally deleted; parent key added only for completed == true.

INV-5 (causality: TTFT ≤ E2E): Structurally preserved. TTFT covers arrival → first_decode_token; E2E covers arrival → last_decode_token. BC-4 is now tested with a hard failure (not a silent skip) if E2E is missing.

INV-PD-4 (phase causality): TransferCompleteTime > 0 guard ensures transfer_start <= transfer_complete (guaranteed by INV-PD-4); no underflow risk.

int64 truncation: int64(newTTFT - prefillTTFT) — since both transferDuration and firstDecodeStep are computed from integer-valued int64 timestamps cast to float64, the difference is always exact. No precision loss.


Issues Found

Important

TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing doesn't test the fallback branch [disaggregation_test.go:999–1022]

The test name promises to verify the else if hasPrefillTTFT fallback (lines 1200–1203 of cluster.go) — the path that fires when a completed parent has TransferCompleteTime == 0 or empty DecodeSubReq.ITL. But the test body runs a normal well-formed simulation and checks that completed parents have non-missing TTFT entries. The comment inside even states: "the fallback branch should never fire in a well-formed simulation."

The test is useful — it verifies R1 (no silent data loss) for the primary path — but it doesn't exercise the fallback code path at all. There are two options:

Option A (rename — minimal change): Rename to reflect what it actually tests:

// TestDisaggregation_TTFT_NoSilentDrops verifies R1: all completed PD parents
// have a TTFT entry after projection, regardless of which branch fired.
func TestDisaggregation_TTFT_NoSilentDrops(t *testing.T) {

Option B (inject fallback directly — stronger): Since the fallback cannot be triggered via a well-formed cluster simulation, test it by calling projectPDMetrics on a RawMetrics snapshot constructed with a parent whose TransferCompleteTime == 0:

func TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing(t *testing.T) {
    // Build a minimal merged metrics map with a prefill TTFT but no transfer data.
    parent := &ParentRequest{
        ID: "p1", PrefillSubReqID: "p1_prefill", DecodeSubReqID: "p1_decode",
        CompletionTime: 1000, ArrivalTime: 0,
        DecodeInstanceID: "inst-0",
        TransferStartTime: 0, TransferCompleteTime: 0, // missing transfer data
    }
    m := &sim.RawMetrics{...}
    m.RequestTTFTs["p1_prefill"] = 500.0
    // call projectPDMetrics with this parent...
    // assert RequestTTFTs["p1"] == 500.0 (fallback value)
}

Option A is the minimum fix to close the misleading name. Option B gives actual coverage of the branch. Fix this →


Standards Compliance Checklist (R1–R23)

Rule Status Notes
R1 (no silent data loss) Delete moved after read of prefillTTFT; fallback uses Warnf not silent drop
R2 (sort map keys before float accumulation) ⚠️ BC-3 test accumulates RequestTTFTs map without sorting. Functionally harmless (integer-valued floats, test-internal accumulation, not simulation output) but technically violates R2 letter
R4 (construction site audit) No new struct construction sites
R6 (no logrus.Fatal in sim/) Uses logrus.Warnf correctly
R7 (invariant tests alongside golden tests) Two invariant tests added
R14 (no multi-module methods) projectPDMetrics only touches metrics projection
Others Not applicable to this change

Behavioral Tests Assessment

What's covered:

  • BC-1 (exact formula) — tested with math.Abs tolerance
  • BC-2 (TTFT > prefill-only) — tested
  • BC-3 (TTFTSum consistency) — tested via map sum comparison
  • BC-4 (TTFT ≤ E2E) — tested, now fails hard on missing E2E (fixed from prior review)
  • BC-5 (non-PD clusters unaffected) — implicitly covered by existing TestDisaggregation_MetricProjection_NoOp

What's not covered:

  • else if hasPrefillTTFT fallback branch — not exercised (see issue above)
  • else branch (no prefill TTFT at all for a completed parent) — not exercised; this is likely unreachable in practice

Both uncovered branches are edge/error paths, not the happy path — the core fix is solid.


Separation of Concerns / Architecture

No concerns. The change is entirely within projectPDMetrics() — a post-simulation metrics projection step. It correctly reads from ParentRequest lifecycle fields (TransferStartTime, TransferCompleteTime, DecodeSubReq.ITL) which are the authoritative source for transfer and decode timing, rather than threading new parameters. Module boundary between the cluster coordinator and instance-level metrics is respected.

Performance

No concerns. projectPDMetrics() is called once at simulation teardown. The change adds two float64 additions and one int64 cast per parent request — negligible compared to event processing.

Documentation

The inline comment at cluster.go:1181–1187 is accurate and well-written. It correctly describes the llm-d pipeline semantics (prefill → KV transfer → decode → first user token), cites issue #930, and explains the R1 guard reasoning. PR behavioral contracts (BC-1 through BC-5) map cleanly to the implementation.


Summary: Fix is correct and complete. The second commit addressed all prior review feedback. The one remaining action item is fixing the fallback test name (Option A, one-line change) or adding actual fallback coverage (Option B).

@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Apr 2, 2026

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

@claude
Copy link
Copy Markdown

claude bot commented Apr 2, 2026

Claude finished @namasl's task in 5m 28s —— View job


PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

  • Gather PR context and changed files
  • Read implementation changes (cluster.go)
  • Read test changes (disaggregation_test.go)
  • Read standards docs (rules.md, invariants.md, principles.md)
  • Deep correctness and invariant analysis
  • Aggregate findings and post final review

Verdict: Approve — two minor hygiene items

This is the third review pass. The fix is correct and the tests are now solid. Both substantive issues from round 2 have been resolved:

  • BC-3 TTFTSum consistency is tested
  • TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing now directly exercises the fallback branch via synthetic ClusterSimulator construction rather than a well-formed simulation

Two small hygiene items remain.


Correctness

Does this PR tackle issue #930 correctly? Yes.

The root cause was that RequestTTFTs[parentID] captured arrival → prefill_done only. In llm-d PD disaggregation, the first user-visible token is produced by the decode pod. The formula:

TTFT = prefillTTFT + (TransferCompleteTime - TransferStartTime) + DecodeSubReq.ITL[0]

is implemented exactly at cluster.go:1193–1199.

TTFTSum adjustment (cluster.go:1199): m.TTFTSum += int64(newTTFT - prefillTTFT). Both transferDuration and firstDecodeStep are float64-cast from int64 timestamps, so the difference is always an exact integer-valued float. int64(...) is lossless. The pattern mirrors cluster.go:1132 (warm-up TTFT adjustment). Correct. ✓

R1 compliance: prefillTTFT is read at line 1189 before the delete calls at lines 1190–1191. Prior code had the read inside the if completed gate after the deletes — a latent R1 violation, now fixed. ✓

R6 compliance: New warning path uses logrus.Warnf, not Fatalf. ✓

INV-5 (causality TTFT ≤ E2E): Structurally preserved — TTFT covers arrival → first_decode_token, E2E covers arrival → last_decode_token. ✓

INV-6 (determinism): The outer loop for _, parent := range c.parentRequests iterates an unsorted map, but the operations are either independent map mutations (unique keys per parent — no conflicts) or int64 accumulation (associative/commutative — no floating-point non-determinism). INV-6 is not violated. ✓

TTFT distribution output: Computed from sorted RequestTTFTs map values via mapValuessortedKeys at metrics.go:139–140. Not from TTFTSum. TTFTSum is an internal consistency check field only. ✓


Standards Compliance (R1–R23)

Rule Status Notes
R1 (no silent data loss) Read before delete; fallback warns not silently drops
R2 (sort map keys before float accumulation) ⚠️ BC-3 test accumulates RequestTTFTs unsorted (see Issue 1 below)
R4 (construction site audit) No new struct fields; ParentRequest struct uses existing fields
R6 (no logrus.Fatal in sim/) Uses logrus.Warnf correctly
R7 (invariant tests alongside golden tests) Three invariant tests added
R14 (no multi-module methods) projectPDMetrics is a metrics projection step only
R15 (stale PR references) #930 references are live issue citations, not stale todos
Others Not applicable to this change

Issues Found

Hygiene

1. BC-3 test accumulates RequestTTFTs without sorted keys [disaggregation_test.go:984]

var manualSum float64
for _, v := range m.RequestTTFTs {  // ← unsorted map iteration (R2)
    manualSum += v
}

Technically violates R2 (unsorted map iteration feeding a float sum). In practice, all values are integer-valued float64s (derived from int64 timestamps), so float addition order doesn't matter and the result is deterministic. The previous round 2 review flagged this as "⚠️ Functionally harmless." Still present. A one-line fix:

for _, k := range sortedKeys(m.RequestTTFTs) {
    manualSum += m.RequestTTFTs[k]
}

Not blocking, but worth fixing for pattern consistency since the project's standards take R2 seriously. Fix this →

2. Decode sub-request key assertion in fallback test is vacuously true [disaggregation_test.go:1086–1089]

if _, exists := m.RequestTTFTs[tc.parent.DecodeSubReqID]; exists {
    t.Errorf("INV-PD-6: decode sub-request key %s still present", tc.parent.DecodeSubReqID)
}

The fallback test pre-populates only m.RequestTTFTs[tc.parent.PrefillSubReqID]; the decode key is never set. So this assertion is always trivially true (the key never existed). The assertion for the prefill key deletion is the meaningful one. Not a bug, just a weak assertion. Fix this →


Behavioral Tests Coverage

Contract Test Status
BC-1 (exact formula) TestDisaggregation_TTFT_IncludesTransferAndDecode
BC-2 (TTFT > prefill-only) TestDisaggregation_TTFT_IncludesTransferAndDecode
BC-3 (TTFTSum consistency) TestDisaggregation_TTFT_IncludesTransferAndDecode
BC-4 (TTFT ≤ E2E, hard failure on missing E2E) TestDisaggregation_TTFT_IncludesTransferAndDecode
BC-5 (non-PD clusters unaffected) TestDisaggregation_MetricProjection_NoOp (pre-existing)
R1 (no silent TTFT drops) TestDisaggregation_TTFT_NoSilentDrops
Fallback (TransferCompleteTime=0) TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing
Fallback (empty ITL) TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing
Fallback (nil DecodeSubReq) TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing
INV-PD-6 (sub-req keys deleted in fallback) TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing ✅ (prefill deletion verified; decode deletion trivially true — see Issue 2)

Separation of Concerns / Architecture

No concerns. Change is entirely within projectPDMetrics() — a post-simulation metrics projection step. Reads from ParentRequest lifecycle timestamps (the authoritative source for transfer/decode timing) without threading new parameters through the execution path. Module boundary respected. ✓

Performance

No concerns. projectPDMetrics() is called once at teardown; two float64 additions and one int64 cast per parent — negligible. ✓

Documentation

Inline comment at cluster.go:1181–1187 accurately describes the llm-d pipeline (prefill → KV transfer → decode → first user token) and cites issue #930. PR behavioral contracts BC-1 through BC-5 map directly to the implementation. ✓


Summary: Fix is correct and complete. Both issues from round 2 are resolved. The two remaining items (R2 in BC-3 test, vacuous decode-key assertion in fallback test) are hygiene-level and non-blocking.

@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Apr 2, 2026

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

@claude
Copy link
Copy Markdown

claude bot commented Apr 2, 2026

Claude finished @namasl's task in 10s —— View job


PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

  • Gather PR context (round 4 review)
  • Read implementation changes (cluster.go, pd_events.go, pd_metrics.go)
  • Read test changes (disaggregation_test.go)
  • Read standards docs
  • Run specialized review agents (parallel)
  • Aggregate findings and post final review

Verdict: Approve — one documentation fix needed before merge

This is the fourth review pass. The fix is correct and the two round 3 hygiene items are addressed. One documentation accuracy issue discovered in a file outside the diff that this PR leaves stale.


Correctness

Does this PR tackle issue #930 correctly? Yes.

The formula TTFT = prefillTTFT + (TransferCompleteTime - TransferStartTime) + DecodeSubReq.ITL[0] is implemented exactly at cluster.go:1193–1199. All three components are sequential in the llm-d PD pipeline and additive.

R1: prefillTTFT is read at line 1189 before the delete calls at 1190–1191. Read-before-delete ordering is now explicit and documented with an R1 comment. ✓

R6: Both warning paths use logrus.Warnf, not Fatalf. ✓

int64 truncation: int64(newTTFT - prefillTTFT) = transferDuration + firstDecodeStep. Both are float64(int64) casts — the sum of two integer-valued float64s is exact. int64(...) is lossless. ✓

INV-5 (causality): newTTFT ≤ E2E holds structurally — TTFT covers arrival → first_decode_token, E2E covers arrival → last_decode_token, and the prefill sub-request uses the parent's ArrivalTime so the origin is identical. ✓

INV-6 (determinism): The outer loop iterates c.parentRequests (a map), but TTFTSum += delta is integer addition — commutative and associative. Final TTFTSum is deterministic regardless of iteration order. ✓

BC-3 (TTFTSum consistency): aggregateMetrics() sums all instance-level TTFTSum values, which include each prefill sub-request's FirstTokenTime as the base. m.TTFTSum += int64(newTTFT - prefillTTFT) adds the exact delta (transfer + decode). No double-counting. In the fallback path (else if hasPrefillTTFT), the TTFT value is unchanged so no adjustment is needed — TTFTSum already contains the prefill value. ✓

INV-PD-4 (transfer start ≤ complete): Verified in pd_events.go:291–294 — a defensive clamp-to-zero exists in the trace path. The guard parent.TransferCompleteTime > 0 in cluster.go:1193 is sufficient in practice since TransferCompleteTime is set from e.time (always positive) in pd_events.go:185, and TransferStartTime is always set before it at pd_events.go:106. See note below about guard consistency.


Issues Found

Important (fix before merge)

Stale comment in pd_metrics.go:113–116 — this file is not in the diff, but the PR changes its documented semantics.

// pd_metrics.go:113–116 (CURRENT — now incorrect)
// BC-1: parent TTFT from the aggregated TTFT map, keyed by parent ID.
// After projectPDMetrics(), the prefill sub-request TTFT has been rekeyed
// from PrefillSubReqID to the parent ID. The value is unchanged: first_token_time -
// arrival_time, recorded when ProgressIndex == inputLen (prefill completion).

"The value is unchanged" is now wrong. After this PR, the projected value is prefillTTFT + transferDuration + firstDecodeStep, not the raw prefill TTFT. The phrase "prefill completion" is also misleading — it implies the value only covers the prefill phase, which is no longer true. Any future reader of pd_metrics.go relying on this comment will have a wrong mental model.

Suggested replacement:

// BC-1: parent TTFT from the aggregated TTFT map, keyed by parent ID.
// After projectPDMetrics(), the user-visible TTFT is keyed by the parent ID.
// Value = prefillTTFT + KV transfer duration + first decode step — the full
// time from arrival to the first token reaching the user (issue #930).
// Missing key returns 0.0; exclude 0.0 values (BC-11).

Fix this →

Hygiene

Guard inconsistency between cluster.go and pd_metrics.go [cluster.go:1193]

cluster.go guards with parent.TransferCompleteTime > 0. pd_metrics.go:124 uses the stronger guard p.TransferStartTime > 0 && p.TransferCompleteTime >= p.TransferStartTime. The difference is only material if TransferCompleteTime < TransferStartTime (an INV-PD-4 violation), which pd_events.go:292–294 logs but does NOT correct on the parent struct. If INV-PD-4 is violated, cluster.go would compute a negative transferDuration, potentially producing a TTFT less than prefillTTFT, violating BC-2. Aligning to the stronger guard (matching pd_metrics.go) costs one comparison and eliminates this class of inconsistency:

// cluster.go:1193 — current
if hasPrefillTTFT && parent.TransferCompleteTime > 0 && parent.DecodeSubReq != nil && len(parent.DecodeSubReq.ITL) > 0 {

// Suggested — aligned with pd_metrics.go:124
if hasPrefillTTFT && parent.TransferStartTime > 0 && parent.TransferCompleteTime >= parent.TransferStartTime && parent.DecodeSubReq != nil && len(parent.DecodeSubReq.ITL) > 0 {

Fix this →


Standards Compliance (R1–R23)

Rule Status Notes
R1 (no silent data loss) Read before delete, fallback warns not silently drops
R2 (sort map keys before float accumulation) TTFTSum delta addition is commutative — no sort needed; BC-3 test uses sortedKeys
R4 (construction site audit) No new struct fields added
R6 (no logrus.Fatal in sim/) Both new paths use logrus.Warnf
R7 (invariant tests alongside golden tests) Three new invariant tests
R14 (no multi-module methods) Change is self-contained in projectPDMetrics
R15 (stale PR references) ⚠️ pd_metrics.go:116 comment is stale after this PR's changes (see issue above)

Behavioral Test Coverage

Contract Test Status
BC-1 (exact formula) TestDisaggregation_TTFT_IncludesTransferAndDecode
BC-2 (TTFT > prefill-only) TestDisaggregation_TTFT_IncludesTransferAndDecode
BC-3 (TTFTSum consistency) TestDisaggregation_TTFT_IncludesTransferAndDecode via sortedKeys
BC-4 (TTFT ≤ E2E, hard fail on missing E2E) TestDisaggregation_TTFT_IncludesTransferAndDecode
BC-5 (non-PD clusters unaffected) Pre-existing TestDisaggregation_MetricProjection_NoOp
R1 (no silent drops) TestDisaggregation_TTFT_NoSilentDrops
Fallback: TransferCompleteTime=0 TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing
Fallback: empty ITL TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing
Fallback: nil DecodeSubReq TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing
INV-PD-6 (sub-req keys deleted) TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing

Tests are behavioral (observable post-conditions via public APIs), meet the refactor survival criterion, and cover all branches including the defensive fallback. The fallback test correctly exercises the unreachable-from-well-formed-sim branch via synthetic struct injection. No critical gaps.


Separation of Concerns / Architecture

No concerns. The change is confined to projectPDMetrics(), a post-simulation metrics projection step. It reads from ParentRequest lifecycle timestamps (the authoritative source) rather than threading new parameters. Module boundaries respected. ✓

Performance

No concerns. projectPDMetrics() is called once at teardown. Two float64 additions and one int64 cast per parent — negligible. ✓

Documentation (PR-level)

Inline comment at cluster.go:1181–1187 is accurate and well-written: describes the llm-d pipeline semantics, cites issue #930, and explains the R1 guard. PR behavioral contracts BC-1 through BC-5 map directly to the implementation. The one exception is pd_metrics.go:113–116 which is now stale (see important issue above).


Summary: Fix is correct and test coverage is comprehensive. The only action item before merge is updating the stale comment in pd_metrics.go:113–116. The guard inconsistency is a hygiene note.

@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Apr 2, 2026

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

@claude
Copy link
Copy Markdown

claude bot commented Apr 2, 2026

Claude finished @namasl's task in 8m 21s —— View job


PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

  • Gather PR context (round 5 — addressing round 4 advisory items)
  • Read implementation (cluster.go)
  • Read test changes (disaggregation_test.go)
  • Read standards docs (rules.md)
  • Run specialized review agents (code, tests, errors)
  • Aggregate findings and post final review

Verdict: Approve — two advisory items

This is round 5. All prior round feedback (R2 sortedKeys, BC-2 non-triviality guards, decode-key pre-population in fallback test, BC-4 hard failure on missing E2E) has been addressed correctly. The implementation is correct and the test suite is solid. Two advisory items remain, neither blocking.


Correctness

Does this PR tackle issue #930 correctly? Yes. The formula prefillTTFT + transferDuration + firstDecodeStep is computed exactly at cluster.go:1193–1199. The TTFTSum adjustment in the fallback path (else if hasPrefillTTFT) is correctly zero — the prefill TTFT was already included in TTFTSum by instance-level aggregation; the map rekey preserves the sum without additional delta.

R1/R2/R6/R7/R14: All pass. See prior rounds; no regressions introduced.


Advisory Items

1. Guard inconsistency: cluster.go:1193 uses a weaker predicate than the established pattern in pd_metrics.go:124

cluster.go:1193:

if hasPrefillTTFT && parent.TransferCompleteTime > 0 && parent.DecodeSubReq != nil && len(parent.DecodeSubReq.ITL) > 0 {

pd_metrics.go:124 (established pattern):

if p.TransferStartTime > 0 && p.TransferCompleteTime >= p.TransferStartTime

The difference: the cluster.go guard does not check TransferStartTime > 0 or TransferCompleteTime >= TransferStartTime. If INV-PD-4 were ever violated (TransferCompleteTime recorded without a matching KVTransferStartedEvent, or a priority inversion bug leaving TransferStartTime=0), transferDuration could be inflated or negative, silently corrupting TTFTSum at line 1199 (int64(newTTFT - prefillTTFT) would be a large negative or inflated number with no log). The existing event-ordering guarantee (INV-3 + causal scheduling) makes this unreachable today, but the divergence from the established guard pattern leaves a latent hazard.

Suggested fix: Align with pd_metrics.go:124:

if hasPrefillTTFT && parent.TransferStartTime > 0 && parent.TransferCompleteTime >= parent.TransferStartTime && parent.DecodeSubReq != nil && len(parent.DecodeSubReq.ITL) > 0 {

Fix this →

2. Fallback test has no case for completed && !hasPrefillTTFT (Branch C, cluster.go:1204)

The else branch at line 1204 fires when a completed parent has no prefill TTFT entry at all. All three sub-cases in TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing pre-populate m.RequestTTFTs[tc.parent.PrefillSubReqID], so they all land in the else if hasPrefillTTFT branch. Branch C is not exercised by any test. TestDisaggregation_TTFT_NoSilentDrops would catch a regression in an integration run, but Branch C is never deliberately triggered.

Suggested addition — a fourth sub-case to the table-driven fallback test:

{
    name: "no prefill TTFT key",
    parent: &ParentRequest{
        ID: "p4", PrefillSubReqID: "p4_prefill", DecodeSubReqID: "p4_decode",
        OriginalRequest: origReq,
        ArrivalTime: 0, CompletionTime: 5000, DecodeInstanceID: "inst-0",
        TransferStartTime: 100, TransferCompleteTime: 200,
        DecodeSubReq: &sim.Request{ITL: []int64{100}},
    },
    // intentionally do NOT pre-populate m.RequestTTFTs["p4_prefill"]
},

The assertion: m.RequestTTFTs["p4"] must be absent (correct omission, not silent zero insert), and sub-request keys must also be absent (INV-PD-6). Fix this →


Standards Compliance (R1–R23)

Rule Status Notes
R1 (no silent data loss) Read before delete; all branches warn or produce valid output
R2 (sort map keys before float accumulation) BC-3 test uses sortedKeys(m.RequestTTFTs)
R6 (no logrus.Fatal in sim/) All new log calls use Warnf
R7 (invariant tests alongside golden tests) Three new invariant tests
R14 (no multi-module methods) projectPDMetrics single-concern
Others No new config, interfaces, or YAML

Behavioral Tests Coverage

Contract Test Status
BC-1 (exact formula) TestDisaggregation_TTFT_IncludesTransferAndDecode
BC-2 (TTFT > prefill-only, with non-triviality guards) TestDisaggregation_TTFT_IncludesTransferAndDecode
BC-3 (TTFTSum consistency, sortedKeys) TestDisaggregation_TTFT_IncludesTransferAndDecode
BC-4 (TTFT ≤ E2E, hard failure on missing E2E) TestDisaggregation_TTFT_IncludesTransferAndDecode
BC-5 (non-PD unaffected) TestDisaggregation_MetricProjection_NoOp (pre-existing)
R1 (no silent drops) TestDisaggregation_TTFT_NoSilentDrops
Fallback: TransferCompleteTime=0 TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing
Fallback: empty ITL TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing
Fallback: nil DecodeSubReq TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing
Fallback: no prefill TTFT key (Branch C) ⚠️ uncovered (advisory)

Performance / Architecture / Documentation

No concerns. projectPDMetrics() is a teardown-only projection step. Module boundaries respected. Inline comment at cluster.go:1181–1187 is accurate. PR behavioral contracts map cleanly to the implementation.


Summary: Fix is correct and complete. All prior round feedback has been addressed. The two remaining advisory items (guard consistency and Branch C coverage) are non-blocking hygiene. The PR is ready to merge.

@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Apr 3, 2026

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

@claude
Copy link
Copy Markdown

claude bot commented Apr 3, 2026

Claude finished @namasl's task in 2m 39s —— View job


PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

  • Gather PR context (round 6)
  • Read implementation (cluster.go, pd_metrics.go)
  • Read test changes (disaggregation_test.go)
  • Analyze correctness, standards compliance, and test coverage
  • Post final review

Verdict: Approve — one stale comment from round 4 still needs fixing

Round 5's two advisory items (guard inconsistency, Branch C coverage) have been addressed correctly. The only remaining item is a stale comment in pd_metrics.go:113–116 that was flagged as "Important (fix before merge)" in round 4 but was not resolved. The file has no diff in this PR.


What was fixed since round 5

Advisory 1 (guard inconsistency) — FIXED ✅ [cluster.go:1193]

The guard now matches the stronger predicate established in pd_metrics.go:124:

if hasPrefillTTFT && parent.TransferStartTime > 0 && parent.TransferCompleteTime >= parent.TransferStartTime && parent.DecodeSubReq != nil && len(parent.DecodeSubReq.ITL) > 0 {

If TransferCompleteTime < TransferStartTime (an INV-PD-4 violation), transferDuration cannot go negative. The alignment eliminates guard drift between the two projection functions.

Advisory 2 (Branch C coverage) — FIXED ✅ [disaggregation_test.go, 4th sub-case]

The skipPrefillTTFT: true case correctly exercises Branch C:

  • m.RequestTTFTs["p4_prefill"] is not set → hasPrefillTTFT = false
  • m.RequestTTFTs["p4_decode"] = 999.0 is pre-populated (non-trivial INV-PD-6 assertion)
  • Transfer data and ITL are valid, so the case would reach the happy path guard if hasPrefillTTFT were true — the only reason it reaches Branch C is the missing prefill key
  • Asserts: no parent TTFT entry created, both sub-request keys deleted

Outstanding Issue

Important (fix before merge)

Stale comment in pd_metrics.go:113–116 — first flagged in round 4, not addressed

pd_metrics.go has no diff in this PR. The comment still reads:

// BC-1: parent TTFT from the aggregated TTFT map, keyed by parent ID.
// After projectPDMetrics(), the prefill sub-request TTFT has been rekeyed
// from PrefillSubReqID to the parent ID. The value is unchanged: first_token_time -
// arrival_time, recorded when ProgressIndex == inputLen (prefill completion).
// Missing key returns 0.0 in Go maps; exclude 0.0 values (BC-11).

"The value is unchanged" and "prefill completion" are both incorrect after this PR. The projected value now includes KV transfer duration and the first decode step. Any future reader of pd_metrics.go will have a wrong mental model about what aggregated.RequestTTFTs[p.ID] contains at this call site.

Suggested replacement:

// BC-1: parent TTFT from the aggregated TTFT map, keyed by parent ID.
// After projectPDMetrics(), the user-visible TTFT is stored under the parent ID.
// Value = prefillTTFT + KV transfer duration + first decode step — the full
// time from arrival to the first token reaching the user (issue #930).
// Missing key returns 0.0 in Go maps; exclude 0.0 values (BC-11).

Fix this →


Correctness Summary

The fix is correct and unchanged from prior rounds:

  • Formula prefillTTFT + transferDuration + firstDecodeStep is exact at cluster.go:1193–1199
  • TTFTSum adjustment (int64(newTTFT - prefillTTFT)) is lossless — both addends are integer-valued float64
  • R1 (read before delete), R6 (Warnf not Fatalf), INV-5 (causality), INV-6 (determinism): all pass

Test Coverage Summary

Contract Test Status
BC-1 (exact formula) TestDisaggregation_TTFT_IncludesTransferAndDecode
BC-2 (TTFT > prefill-only, non-triviality guards) same
BC-3 (TTFTSum consistency, sortedKeys) same
BC-4 (TTFT ≤ E2E, hard fail on missing E2E) same
BC-5 (non-PD clusters unaffected) pre-existing TestDisaggregation_MetricProjection_NoOp
R1 (no silent drops) TestDisaggregation_TTFT_NoSilentDrops
Fallback: TransferCompleteTime=0 TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing
Fallback: empty ITL same
Fallback: nil DecodeSubReq same
Branch C: no prefill TTFT key same (4th sub-case)
INV-PD-6 (sub-req keys deleted) same

All branches are exercised. The fallback test correctly pre-populates the decode sub-request key so the INV-PD-6 deletion assertions are non-trivial for all cases.


Summary: Fix is correct, tests are comprehensive, and all round 5 advisory items have been addressed. The only remaining action item is updating the stale comment in pd_metrics.go:113–116, which was marked "Important (fix before merge)" in round 4 and has not been addressed.
| Branch: fix/pd-ttft-user-visible

namasl added a commit to namasl/inference-sim that referenced this pull request Apr 3, 2026
After PR inference-sim#931 the projected TTFT value includes KV transfer duration and
first decode step, not just the rekeyed prefill TTFT. Update the comment
at the call site to reflect the corrected semantics (issue inference-sim#930).

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Apr 3, 2026

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

@claude
Copy link
Copy Markdown

claude bot commented Apr 3, 2026

Claude finished @namasl's task in 3m 16s —— View job


PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

  • Gather PR context (round 7)
  • Read changed files (cluster.go, pd_metrics.go, disaggregation_test.go)
  • Verify all prior round advisory items resolved
  • Final standards compliance check
  • Aggregate findings and post final review

Verdict: Approve — ready to merge

This is round 7. All prior round feedback has been addressed. The round 6 "Important (fix before merge)" item — the stale comment in pd_metrics.go:113–116 — is now fixed. No new issues found. The PR is clean.


What changed since round 6

Stale comment in pd_metrics.go:113–116 — FIXED ✅

Old (incorrect after this PR):

// After projectPDMetrics(), the prefill sub-request TTFT has been rekeyed
// from PrefillSubReqID to the parent ID. The value is unchanged: first_token_time -
// arrival_time, recorded when ProgressIndex == inputLen (prefill completion).

New (correct):

// After projectPDMetrics(), the user-visible TTFT is stored under the parent ID.
// Value = prefillTTFT + KV transfer duration + first decode step — the full
// time from arrival to the first token reaching the user (issue #930).

This was the final blocker identified in round 4 and re-flagged in round 6. It's resolved.


Correctness (unchanged from prior rounds, confirmed still holds)

Does this PR tackle issue #930 correctly? Yes.

Formula at cluster.go:1193–1199:

TTFT = prefillTTFT + (TransferCompleteTime - TransferStartTime) + DecodeSubReq.ITL[0]

All three components are sequential and additive in the llm-d PD pipeline.

TTFTSum adjustment (cluster.go:1199): m.TTFTSum += int64(newTTFT - prefillTTFT). Both transferDuration and firstDecodeStep are float64-cast from int64 timestamps — the sum is always an exact integer-valued float64 and int64(...) is lossless. For the fallback path, no adjustment is made (TTFTSum already contains the prefill value from instance-level aggregation). ✓

R1 (read before delete): prefillTTFT read at line 1189, deletes at 1190–1191. ✓

R6 (no logrus.Fatal in sim/): Both warn paths use Warnf. ✓

INV-5 (causality TTFT ≤ E2E): Structurally preserved — TTFT covers arrival → first_decode_token, E2E covers arrival → last_decode_token. ✓

Guard consistency (round 5 advisory #1): cluster.go:1193 now uses the stronger parent.TransferStartTime > 0 && parent.TransferCompleteTime >= parent.TransferStartTime, matching the established pattern in pd_metrics.go:124. If INV-PD-4 is ever violated, transferDuration cannot go negative or be inflated. ✓


Standards Compliance (R1–R23)

Rule Status Notes
R1 (no silent data loss) Read before delete; all branches warn or produce valid output
R2 (sort map keys before float accumulation) BC-3 test uses sortedKeys(m.RequestTTFTs)
R6 (no logrus.Fatal in sim/) Both new log calls use Warnf
R7 (invariant tests alongside golden tests) Three new invariant tests
R14 (no multi-module methods) projectPDMetrics single-concern
R15 (stale references) pd_metrics.go comment updated
Others No new config, interfaces, or YAML

Behavioral Test Coverage

Contract Test Status
BC-1 (exact formula) TestDisaggregation_TTFT_IncludesTransferAndDecode
BC-2 (TTFT > prefill-only, non-triviality guards) same
BC-3 (TTFTSum consistency, sortedKeys) same
BC-4 (TTFT ≤ E2E, hard fail on missing E2E) same
BC-5 (non-PD clusters unaffected) TestDisaggregation_MetricProjection_NoOp (pre-existing)
R1 (no silent drops) TestDisaggregation_TTFT_NoSilentDrops
Fallback: TransferCompleteTime=0 TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing
Fallback: empty ITL same
Fallback: nil DecodeSubReq same
Branch C: no prefill TTFT key same (4th sub-case, skipPrefillTTFT: true)
INV-PD-6 (sub-req keys deleted) same — decode key pre-populated with 999.0; deletion is non-trivial

All branches are exercised. Tests are behavioral, use exported APIs (cs.ParentRequests()), and satisfy the refactor survival criterion.


Separation of Concerns / Architecture

No concerns. The change is entirely within projectPDMetrics() — a post-simulation metrics projection step, called once at teardown. It reads from ParentRequest lifecycle timestamps (the authoritative source) rather than threading new parameters through the execution path. Module boundary between the cluster coordinator and instance-level metrics is respected. ✓

Performance

No concerns. projectPDMetrics() is called once at simulation teardown. The change adds two float64 additions and one int64 cast per parent request — negligible. ✓

Documentation

cluster.go:1181–1187 accurately describes the llm-d pipeline semantics. pd_metrics.go:113–116 now correctly describes the post-projection value. PR behavioral contracts BC-1 through BC-5 map directly to the implementation. ✓


Summary: The fix is correct, complete, and well-tested. All seven rounds of review feedback have been addressed. No remaining action items.
| Branch: fix/pd-ttft-user-visible

namasl and others added 7 commits April 3, 2026 14:35
…erence-sim#930)

In PD disaggregated mode, user-visible TTFT was understated because it
only captured prefill sub-request completion time. In llm-d, the first
token reaches the user from the decode pod — after prefill, KV transfer,
and one decode step. Update projectPDMetrics() to compute user-visible
TTFT as prefillTTFT + transferDuration + firstDecodeStep.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Add BC-3 test: TTFTSum consistency with sum(RequestTTFTs) after projection
- Add fallback path test: completed parents always have TTFT entries (R1)
- Use cs.ParentRequests() instead of cs.parentRequests (exported API)
- Assert E2E presence unconditionally in BC-4 causality check

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The fallback test was renamed/split: TestDisaggregation_TTFT_NoSilentDrops
verifies R1 (no dropped entries) via a normal simulation, while
TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing now directly
exercises the defensive fallback in projectPDMetrics by constructing
synthetic ParentRequests with TransferCompleteTime=0, empty ITL, and nil
DecodeSubReq. Table-driven with three cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- BC-3 test: iterate RequestTTFTs with sortedKeys for R2 compliance
- Fallback test: pre-populate decode sub-request key so INV-PD-6
  deletion assertion is non-trivial

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- BC-2: add explicit non-triviality guards for transferDuration and
  firstDecodeStep so a config change zeroing either addend causes a
  clear test failure rather than a vacuously-passing assertion
- BC-3: add comment explaining why tolerance is 1.0 µs (TTFTSum is
  int64-truncated) rather than 1e-9 used elsewhere in the file

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Align happy-path guard with established pd_metrics.go pattern
(TransferStartTime > 0 && TransferCompleteTime >= TransferStartTime)
to prevent silent TTFTSum corruption on invariant violation.

Add Branch C coverage to FallbackWhenDecodeDataMissing: fourth
sub-case exercises the else path where a completed parent has no
prefill TTFT key, asserting no entry is written and INV-PD-6 holds.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
After PR inference-sim#931 the projected TTFT value includes KV transfer duration and
first decode step, not just the rekeyed prefill TTFT. Update the comment
at the call site to reflect the corrected semantics (issue inference-sim#930).

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@namasl namasl force-pushed the fix/pd-ttft-user-visible branch from c47eb95 to 0601db8 Compare April 3, 2026 14:36
@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Apr 3, 2026

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

@claude
Copy link
Copy Markdown

claude bot commented Apr 3, 2026

Claude finished @namasl's task in 3m 43s —— View job


PR Review: fix(cluster): PD TTFT includes KV transfer and first decode step

Tasks

  • Gather PR context (round 8)
  • Read changed files (cluster.go, pd_metrics.go, disaggregation_test.go)
  • Standards compliance and correctness check
  • Aggregate findings and post final review

Verdict: Approve — ready to merge

This is round 8. The most recent commit (0601db8) addressed the final outstanding item from round 4/6 — the stale comment in pd_metrics.go:113–116. No new issues found. The PR is clean.


What changed since round 7

Stale pd_metrics.go comment — FIXED ✅ (0601db8)

pd_metrics.go:113–116 now correctly reads:

// BC-1: parent TTFT from the aggregated TTFT map, keyed by parent ID.
// After projectPDMetrics(), the user-visible TTFT is stored under the parent ID.
// Value = prefillTTFT + KV transfer duration + first decode step — the full
// time from arrival to the first token reaching the user (issue #930).
// Missing key returns 0.0 in Go maps; exclude 0.0 values (BC-11).

The previously incorrect "The value is unchanged: first_token_time - arrival_time, recorded when ProgressIndex == inputLen (prefill completion)" is gone.


Correctness (confirmed against current code)

Does this PR tackle issue #930 correctly? Yes.

Formula at cluster.go:1209–1215:

TTFT = prefillTTFT + (TransferCompleteTime - TransferStartTime) + DecodeSubReq.ITL[0]

All three components are sequential and additive in the llm-d PD pipeline. This is the first user-visible token path: prefill pod completes → KV transfer → decode pod emits first output token.

Check Line Status
R1: read prefillTTFT before delete calls 1205 (before 1206–1207)
R6: no logrus.Fatal in sim/ 1219, 1221
Guard aligned with pd_metrics.go:124 1209 (TransferStartTime > 0 && TransferCompleteTime >= TransferStartTime)
TTFTSum delta is lossless int64 1215 (int64(newTTFT - prefillTTFT))
INV-5 causality: TTFT ≤ E2E structural
INV-6 determinism TTFTSum delta is integer addition (commutative)
pd_metrics.go comment accurate 113–117

Standards Compliance (R1–R23)

Rule Status Notes
R1 (no silent data loss) Read before delete; all branches warn or produce valid output
R2 (sort map keys before float accumulation) BC-3 test uses sortedKeys(m.RequestTTFTs) (line 1001)
R6 (no logrus.Fatal in sim/) Both log calls use Warnf
R7 (invariant tests alongside golden tests) Three new invariant tests
R14 (no multi-module methods) Change confined to projectPDMetrics()
R15 (no stale references) pd_metrics.go comment updated

Behavioral Test Coverage

Contract Test Status
BC-1 (exact formula) TestDisaggregation_TTFT_IncludesTransferAndDecode
BC-2 (TTFT > prefill-only, non-triviality guards) same
BC-3 (TTFTSum consistency, sortedKeys) same
BC-4 (TTFT ≤ E2E, hard failure on missing E2E) same
BC-5 (non-PD clusters unaffected) TestDisaggregation_MetricProjection_NoOp (pre-existing)
R1 (no silent drops) TestDisaggregation_TTFT_NoSilentDrops
Fallback: TransferCompleteTime=0 TestDisaggregation_TTFT_FallbackWhenDecodeDataMissing
Fallback: empty ITL same
Fallback: nil DecodeSubReq same
Branch C: no prefill TTFT key same (skipPrefillTTFT: true, 4th sub-case)
INV-PD-6 (sub-req keys deleted) same — decode key pre-populated with 999.0 (non-trivial)

Tests use cs.ParentRequests() (exported API, lines 936 and 1019). The fallback test directly injects synthetic state via the unexported struct fields — unavoidable for exercising an unreachable-from-normal-simulation branch, and acceptable in package cluster same-package tests. Refactor survival criterion holds for the integration tests; the synthetic-injection fallback test is structurally coupled by design.


Separation of Concerns / Architecture

No concerns. The change is entirely within projectPDMetrics() — a post-simulation metrics projection step called once at teardown. Transfer timestamps are read from ParentRequest lifecycle fields (the authoritative source), not threaded as new parameters. Module boundary between the cluster coordinator and instance-level metrics is respected. ✓

Performance

No concerns. Two float64 additions and one int64 cast per parent request at teardown. ✓

Documentation

cluster.go:1197–1203 accurately describes the llm-d pipeline semantics and cites issue #930. pd_metrics.go:113–117 now correctly describes the post-projection value. PR behavioral contracts BC-1 through BC-5 map directly to the implementation. ✓


Summary: Fix is correct, complete, and well-tested. All eight rounds of review feedback have been addressed across seven commits. No remaining action items. The PR is ready to merge.
| Branch: fix/pd-ttft-user-visible

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PD disaggregated TTFT understates user-visible latency (missing KV transfer + first decode step)

1 participant