Skip to content

feat(replay): closed-loop session sequencing for trace replay#896

Open
idan-friedman wants to merge 5 commits intoinference-sim:mainfrom
idan-friedman:pr880-replay-session-mode
Open

feat(replay): closed-loop session sequencing for trace replay#896
idan-friedman wants to merge 5 commits intoinference-sim:mainfrom
idan-friedman:pr880-replay-session-mode

Conversation

@idan-friedman
Copy link
Copy Markdown

Summary

  • Adds --session-mode closed-loop to blis replay: session follow-up rounds arrive at completion_tick + think_time instead of pre-baked trace timestamps, enabling load-adaptive multi-turn simulation
  • Adds --think-time-ms to override inter-round think time (0 = derive from trace inter-round arrival gaps)
  • Implements SequenceSampler — replays trace token counts per-round in order, wraps on exhaustion, safe for empty sequences
  • Adds ThinkTimeSampler LengthSampler field to SessionBlueprint for per-round think time variation; nil falls back to constant ThinkTimeUs
  • Implements LoadTraceV2SessionBlueprints which groups trace records by SessionID, builds deterministic blueprints with SequenceSampler for input/output token counts and think times, and injects only round-0 requests for initial DES injection
  • Wires SessionManager.OnComplete as the onRequestDone callback so follow-up rounds are generated load-adaptively

Test plan

  • TestSequenceSampler_ReplayInOrder — values returned in order
  • TestSequenceSampler_WrapsOnExhaustion — wraps to beginning after exhaustion
  • TestSequenceSampler_SingleValue — single-element sampler always returns same value
  • TestSequenceSampler_EmptyValues — returns 1 (minimum) without panic
  • TestLoadTraceV2SessionBlueprints_GroupsBySession — 2 sessions → 2 blueprints, 2 round-0 requests
  • TestLoadTraceV2SessionBlueprints_NonSessionPassThrough — non-session records included as-is
  • TestLoadTraceV2SessionBlueprints_ThinkTimeFromTrace — inter-round gaps derived correctly
  • TestLoadTraceV2SessionBlueprints_SingleRoundSession — single-round session has nil ThinkTimeSampler
  • TestLoadTraceV2SessionBlueprints_OverrideThinkTime — override suppresses sampler
  • TestLoadTraceV2SessionBlueprints_NonConsecutiveRoundIndex_Error — error on gap in round indices
  • TestSession_ThinkTimeSampler_UsedWhenPresent — sampler overrides constant think time
  • TestSession_ThinkTimeSampler_NilFallsBack — nil sampler falls back to ThinkTimeUs
  • go test ./... -count=1 — all packages pass
  • go build ./... — clean build

Closes #880

🤖 Generated with Claude Code

idan-friedman and others added 5 commits March 31, 2026 21:54
- SequenceSampler replays a pre-recorded sequence of values in order,
  wrapping to the beginning on exhaustion
- Safe for empty sequences (returns minimum token count 1)
- Implements LengthSampler interface for use in SessionBlueprint

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Add optional ThinkTimeSampler field (LengthSampler) to SessionBlueprint
- OnComplete samples from it when set; falls back to constant ThinkTimeUs
- Zero-value nil is safe: existing code paths unchanged

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…ay (BC-5, BC-6, BC-7)

- Groups trace records by SessionID, builds SessionBlueprint per session
- Round-0 requests injected upfront; SessionManager drives follow-up rounds
- SequenceSamplers replay exact trace token counts per round
- Think time derived from inter-round arrival gaps (or overridden)
- Validates consecutive RoundIndex; errors on gaps/duplicates
- Non-session records returned as regular requests (BC-7)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…losed-loop session replay

Add --session-mode (fixed|closed-loop) and --think-time-ms flags to blis replay.
Closed-loop mode calls LoadTraceV2SessionBlueprints to build per-session blueprints
from the trace, injects only round-0 requests, and wires SessionManager.OnComplete
as the onRequestDone callback so follow-up rounds arrive at completion + think time
rather than from pre-baked timestamps.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
… review

- Add explanatory comment for negative-gap clamp in LoadTraceV2SessionBlueprints
  (non-monotone timestamps clamped to 0 to prevent INV-3 violation in OnComplete)
- Update CLAUDE.md with closed-loop replay examples (--session-mode, --think-time-ms)
- Add implementation plan doc for PR 880

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@idan-friedman idan-friedman force-pushed the pr880-replay-session-mode branch from eb5ee7e to e7fc57e Compare March 31, 2026 18:57
@sriumcp
Copy link
Copy Markdown
Collaborator

sriumcp commented Mar 31, 2026

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness, including user and developer facing docs.

@claude
Copy link
Copy Markdown

claude bot commented Mar 31, 2026

Claude finished @sriumcp's task in 7m 41s —— View job


PR Review: feat(replay): closed-loop session sequencing for trace replay

Tasks

  • Initialize review
  • Read changed files and gather context
  • Review: correctness vs parent issue
  • Review: separation of concerns & modularity
  • Review: standards adherence (R1-R23, INV-1 through INV-11)
  • Review: behavioral tests and non-structural tests
  • Review: API boundaries and contracts
  • Review: performance implications
  • Review: documentation correctness and completeness
  • Aggregate findings and post summary

Overall Assessment

The PR correctly addresses the parent issue (#880). All four stated deliverables are implemented: SequenceSampler, ThinkTimeSampler on SessionBlueprint, LoadTraceV2SessionBlueprints, and the two CLI flags wired through SessionManager.OnComplete. The design respects separation of concerns, invariants INV-6/INV-10/INV-11 are preserved, and all behavioral contracts are tested. No critical correctness bugs were found. One semantic ambiguity in the think-time derivation requires documentation, and two structural concerns should be addressed before merge.


Important Issues

1. Think-time derived from arrival gaps encodes service time — undocumented semantic [Important]

File: sim/workload/replay.go:164–174

gap := rounds[i].ArrivalTimeUs - rounds[i-1].ArrivalTimeUs

When a trace is produced by blis observe --concurrency N --think-time-ms X, the observe orchestrator sets each follow-up's ArrivalTime to wallClock_at_completion + think_time. Therefore:

gap = ArrivalTimeUs[i] - ArrivalTimeUs[i-1]
    = (completion_time[i-1] + think_time) - send_time[i-1]
    = service_time[i-1] + think_time

The gap is not pure think time — it includes the previous round's service time. Replaying without --think-time-ms override will produce sessions spaced service_time + think_time apart instead of think_time apart. Users running the observe → replay pipeline without the override flag will get longer-than-expected inter-round spacing.

The docstring for LoadTraceV2SessionBlueprints (line 82–84) and the --think-time-ms flag help text (line 341) should explicitly state: "trace-derived think time = inter-arrival gap (includes prior-round service time); use --think-time-ms to supply the actual client-side think time." This is especially important since LastChunkTimeUs in the TraceRecord uses Unix timestamps while ArrivalTimeUs uses elapsed-since-start, making direct subtraction unsafe — the current approach is the only viable one with the current schema, but users need to know what they're getting.

Fix this →


2. Double invocation of LoadTraceV2SessionBlueprints in cmd/replay.go [Important]

File: cmd/replay.go:98–119

// First call — only used to check len(blueprints)==0 and compute horizon
r0Requests, blueprints, bErr := workload.LoadTraceV2SessionBlueprints(traceData, seed, 0, 0)
...
// Second call — actual work
r0Requests2, blueprints2, bErr2 := workload.LoadTraceV2SessionBlueprints(traceData, seed, thinkTimeUs, replayHorizonPrelim)

The function iterates all records, builds a prefix-token map, groups/sorts/validates per-session, and generates RNG-derived token IDs — twice. This O(n log n) work is unnecessary. The horizon chicken-and-egg problem can be resolved without a full second call:

Option A: Compute the max arrival time from trace records directly in O(n) before calling the function once:

maxR0Arrival := maxRound0Arrival(traceData)  // one-pass scan
horizon := computeHorizon(maxR0Arrival)
blueprints := LoadTraceV2SessionBlueprints(traceData, seed, thinkTimeUs, horizon)

Option B: Split into HasSessions(trace) bool (fast O(n) check) and the full load.

For large traces (tens of thousands of sessions), the current approach processes everything twice.

Fix this →


Minor Issues

3. Missing test: negative inter-round gap clamping [Minor]

File: sim/workload/replay.go:165–170

The code correctly clamps negative gaps to 0 with an INV-3 rationale comment, but there's no test exercising this path. A test with rounds[0].ArrivalTimeUs = 5000, rounds[1].ArrivalTimeUs = 3000 should verify the clamped value is 0 (not negative), preventing regression.

Fix this →

4. Missing test: BC-12 (closed-loop with no session records) [Minor]

File: cmd/replay.go:103–104

The warning path len(blueprints) == 0 is documented in the plan as BC-12 but has no unit test. Since this lives in CLI code the test would be integration-level; at minimum a note in the plan doc that this contract has no automated coverage would be honest.

5. Struct field alignment inconsistency in SessionBlueprint [Cosmetic]

File: sim/workload/session.go:35–46

The newly added ThinkTimeSampler field aligns differently from surrounding fields (ThinkTimeUs, Timeout, Horizon vs RNG, ThinkTimeSampler, Prefix). Go gofmt should normalize this; run gofmt -w sim/workload/session.go before merge.

6. ScheduledStepIdx/FinishedStepIdx omitted in new round-0 construction site [Minor]

File: sim/workload/replay.go:195–217

LoadTraceV2Requests at line 53–54 explicitly sets ScheduledStepIdx: 0 and FinishedStepIdx: 0. The new construction site in LoadTraceV2SessionBlueprints omits them (zero by default, functionally equivalent). Per R4, construction sites should be consistent. A comment // ScheduledStepIdx, FinishedStepIdx default to 0 or adding the explicit zero would reduce future reader confusion.


Standards Audit (R1–R23, INV-1–INV-11)

Rule Check Result
R1 (no silent data loss) Non-session continue path appends requests ✓; empty sequence returns 1 ✓; session-not-found logs Warn+returns nil ✓
R2 (sort map keys) sessionMap iteration in sort/validate loop doesn't produce output; output controlled by sessionOrder slice (insertion-ordered) ✓
R3 (validate numeric params) replayThinkTimeMs < 0 guarded; replaySessionMode validated ✓
R4 (construction site audit) SessionBlueprint{} zero-value for ThinkTimeSampler correctly falls back (nil check) ✓; ScheduledStepIdx omission noted above ⚠ minor
R6 (no logrus.Fatal in library) New code in sim/workload/replay.go returns errors ✓
R7 (invariant tests alongside golden) No new golden tests added ✓
R8 (no exported mutable maps) No new exported maps ✓
INV-3 (clock monotonicity) Negative gaps clamped to 0 with comment ✓
INV-6 (determinism) sessionOrder preserves insertion order; per-session RNG seeded from same rng.Int63() sequence ✓
INV-9 (oracle knowledge boundary) MaxOutputLen: len(outputTokens) set from round-0 record (not oracle path) ✓
INV-10 (session causality) arrivalTime = tick + thinkTime where thinkTime >= 0 always ✓
INV-11 (session completeness) MaxRounds = len(rounds) ensures session terminates; all existing terminal paths preserved ✓

Test Quality

Behavioral, non-structural tests: ✅ All 13 tests in the plan are implemented. They assert observable outputs (returned token counts, arrival times, blueprint counts) rather than internal struct layout. The TestSession_ThinkTimeSampler_* tests survive an implementation rewrite.

Test coverage gaps:

  • Negative inter-round gap clamping (see issue getting started instructions #3)
  • BC-12 warning path (no session records in trace)
  • SequenceSampler index overflow behavior (not critical given MaxRounds bounds calls, but int overflow after math.MaxInt calls would panic on s.index%len(s.values) if index wraps to math.MinInt). For production use with MaxRounds capped at trace length this is fine.

API Boundaries and Contracts

LoadTraceV2SessionBlueprints signature is clean and composable. The return of ([]*sim.Request, []SessionBlueprint, error) separates injection concerns cleanly. The SequenceSampler implementing LengthSampler is correct per R13 (the interface already has two implementations: ConstantSampler and now SequenceSampler). The optional ThinkTimeSampler field with nil-fallback is a clean extension point. No new interfaces were introduced unnecessarily.


Summary

Priority Count Items
Important 2 Think-time semantics doc gap, double LoadTraceV2SessionBlueprints call
Minor 3 Missing clamping test, missing BC-12 test, struct alignment
Cosmetic 1 ScheduledStepIdx omission

The PR correctly tackles the parent issue. Recommend merging after addressing the two Important items (especially the think-time documentation — users running the observe→replay pipeline need to know that --think-time-ms override is required to reproduce the actual client think time rather than service-time-inflated gaps).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(replay): closed-loop session sequencing for trace replay

2 participants