feat(replay): closed-loop session sequencing for trace replay by idan-friedman · Pull Request #896 · inference-sim/inference-sim

idan-friedman · 2026-03-31T17:46:14Z

Summary

Adds --session-mode closed-loop to blis replay: session follow-up rounds arrive at completion_tick + think_time instead of pre-baked trace timestamps, enabling load-adaptive multi-turn simulation
Adds --think-time-ms to override inter-round think time (0 = derive from trace inter-round arrival gaps)
Implements SequenceSampler — replays trace token counts per-round in order, wraps on exhaustion, safe for empty sequences
Adds ThinkTimeSampler LengthSampler field to SessionBlueprint for per-round think time variation; nil falls back to constant ThinkTimeUs
Implements LoadTraceV2SessionBlueprints which groups trace records by SessionID, builds deterministic blueprints with SequenceSampler for input/output token counts and think times, and injects only round-0 requests for initial DES injection
Wires SessionManager.OnComplete as the onRequestDone callback so follow-up rounds are generated load-adaptively

Test plan

Closes #880

🤖 Generated with Claude Code

- SequenceSampler replays a pre-recorded sequence of values in order, wrapping to the beginning on exhaustion - Safe for empty sequences (returns minimum token count 1) - Implements LengthSampler interface for use in SessionBlueprint Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Add optional ThinkTimeSampler field (LengthSampler) to SessionBlueprint - OnComplete samples from it when set; falls back to constant ThinkTimeUs - Zero-value nil is safe: existing code paths unchanged Co-Authored-By: Claude Opus 4.6 <[email protected]>

…ay (BC-5, BC-6, BC-7) - Groups trace records by SessionID, builds SessionBlueprint per session - Round-0 requests injected upfront; SessionManager drives follow-up rounds - SequenceSamplers replay exact trace token counts per round - Think time derived from inter-round arrival gaps (or overridden) - Validates consecutive RoundIndex; errors on gaps/duplicates - Non-session records returned as regular requests (BC-7) Co-Authored-By: Claude Opus 4.6 <[email protected]>

…losed-loop session replay Add --session-mode (fixed|closed-loop) and --think-time-ms flags to blis replay. Closed-loop mode calls LoadTraceV2SessionBlueprints to build per-session blueprints from the trace, injects only round-0 requests, and wires SessionManager.OnComplete as the onRequestDone callback so follow-up rounds arrive at completion + think time rather than from pre-baked timestamps. Co-Authored-By: Claude Opus 4.6 <[email protected]>

… review - Add explanatory comment for negative-gap clamp in LoadTraceV2SessionBlueprints (non-monotone timestamps clamped to 0 to prevent INV-3 violation in OnComplete) - Update CLAUDE.md with closed-loop replay examples (--session-mode, --think-time-ms) - Add implementation plan doc for PR 880 Co-Authored-By: Claude Opus 4.6 <[email protected]>

sriumcp · 2026-03-31T19:19:41Z

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness, including user and developer facing docs.

claude · 2026-03-31T19:25:00Z

Claude finished @sriumcp's task in 7m 41s —— View job

PR Review: feat(replay): closed-loop session sequencing for trace replay

Tasks

Overall Assessment

The PR correctly addresses the parent issue (#880). All four stated deliverables are implemented: SequenceSampler, ThinkTimeSampler on SessionBlueprint, LoadTraceV2SessionBlueprints, and the two CLI flags wired through SessionManager.OnComplete. The design respects separation of concerns, invariants INV-6/INV-10/INV-11 are preserved, and all behavioral contracts are tested. No critical correctness bugs were found. One semantic ambiguity in the think-time derivation requires documentation, and two structural concerns should be addressed before merge.

Important Issues

1. Think-time derived from arrival gaps encodes service time — undocumented semantic [Important]

File: sim/workload/replay.go:164–174

gap := rounds[i].ArrivalTimeUs - rounds[i-1].ArrivalTimeUs

When a trace is produced by blis observe --concurrency N --think-time-ms X, the observe orchestrator sets each follow-up's ArrivalTime to wallClock_at_completion + think_time. Therefore:

gap = ArrivalTimeUs[i] - ArrivalTimeUs[i-1]
    = (completion_time[i-1] + think_time) - send_time[i-1]
    = service_time[i-1] + think_time

The gap is not pure think time — it includes the previous round's service time. Replaying without --think-time-ms override will produce sessions spaced service_time + think_time apart instead of think_time apart. Users running the observe → replay pipeline without the override flag will get longer-than-expected inter-round spacing.

The docstring for LoadTraceV2SessionBlueprints (line 82–84) and the --think-time-ms flag help text (line 341) should explicitly state: "trace-derived think time = inter-arrival gap (includes prior-round service time); use --think-time-ms to supply the actual client-side think time." This is especially important since LastChunkTimeUs in the TraceRecord uses Unix timestamps while ArrivalTimeUs uses elapsed-since-start, making direct subtraction unsafe — the current approach is the only viable one with the current schema, but users need to know what they're getting.

Fix this →

2. Double invocation of `LoadTraceV2SessionBlueprints` in cmd/replay.go [Important]

File: cmd/replay.go:98–119

// First call — only used to check len(blueprints)==0 and compute horizon
r0Requests, blueprints, bErr := workload.LoadTraceV2SessionBlueprints(traceData, seed, 0, 0)
...
// Second call — actual work
r0Requests2, blueprints2, bErr2 := workload.LoadTraceV2SessionBlueprints(traceData, seed, thinkTimeUs, replayHorizonPrelim)

The function iterates all records, builds a prefix-token map, groups/sorts/validates per-session, and generates RNG-derived token IDs — twice. This O(n log n) work is unnecessary. The horizon chicken-and-egg problem can be resolved without a full second call:

Option A: Compute the max arrival time from trace records directly in O(n) before calling the function once:

maxR0Arrival := maxRound0Arrival(traceData)  // one-pass scan
horizon := computeHorizon(maxR0Arrival)
blueprints := LoadTraceV2SessionBlueprints(traceData, seed, thinkTimeUs, horizon)

Option B: Split into HasSessions(trace) bool (fast O(n) check) and the full load.

For large traces (tens of thousands of sessions), the current approach processes everything twice.

Fix this →

Minor Issues

3. Missing test: negative inter-round gap clamping [Minor]

File: sim/workload/replay.go:165–170

The code correctly clamps negative gaps to 0 with an INV-3 rationale comment, but there's no test exercising this path. A test with rounds[0].ArrivalTimeUs = 5000, rounds[1].ArrivalTimeUs = 3000 should verify the clamped value is 0 (not negative), preventing regression.

Fix this →

4. Missing test: BC-12 (closed-loop with no session records) [Minor]

File: cmd/replay.go:103–104

The warning path len(blueprints) == 0 is documented in the plan as BC-12 but has no unit test. Since this lives in CLI code the test would be integration-level; at minimum a note in the plan doc that this contract has no automated coverage would be honest.

5. Struct field alignment inconsistency in `SessionBlueprint` [Cosmetic]

File: sim/workload/session.go:35–46

The newly added ThinkTimeSampler field aligns differently from surrounding fields (ThinkTimeUs, Timeout, Horizon vs RNG, ThinkTimeSampler, Prefix). Go gofmt should normalize this; run gofmt -w sim/workload/session.go before merge.

6. `ScheduledStepIdx`/`FinishedStepIdx` omitted in new round-0 construction site [Minor]

File: sim/workload/replay.go:195–217

LoadTraceV2Requests at line 53–54 explicitly sets ScheduledStepIdx: 0 and FinishedStepIdx: 0. The new construction site in LoadTraceV2SessionBlueprints omits them (zero by default, functionally equivalent). Per R4, construction sites should be consistent. A comment // ScheduledStepIdx, FinishedStepIdx default to 0 or adding the explicit zero would reduce future reader confusion.

Standards Audit (R1–R23, INV-1–INV-11)

Rule	Check	Result
R1 (no silent data loss)	Non-session `continue` path appends requests ✓; empty sequence returns 1 ✓; session-not-found logs Warn+returns nil ✓	✓
R2 (sort map keys)	`sessionMap` iteration in sort/validate loop doesn't produce output; output controlled by `sessionOrder` slice (insertion-ordered) ✓	✓
R3 (validate numeric params)	`replayThinkTimeMs < 0` guarded; `replaySessionMode` validated ✓	✓
R4 (construction site audit)	`SessionBlueprint{}` zero-value for `ThinkTimeSampler` correctly falls back (nil check) ✓; `ScheduledStepIdx` omission noted above	⚠ minor
R6 (no logrus.Fatal in library)	New code in `sim/workload/replay.go` returns errors ✓	✓
R7 (invariant tests alongside golden)	No new golden tests added ✓	✓
R8 (no exported mutable maps)	No new exported maps ✓	✓
INV-3 (clock monotonicity)	Negative gaps clamped to 0 with comment ✓	✓
INV-6 (determinism)	`sessionOrder` preserves insertion order; per-session RNG seeded from same `rng.Int63()` sequence ✓	✓
INV-9 (oracle knowledge boundary)	`MaxOutputLen: len(outputTokens)` set from round-0 record (not oracle path) ✓	✓
INV-10 (session causality)	`arrivalTime = tick + thinkTime` where `thinkTime >= 0` always ✓	✓
INV-11 (session completeness)	`MaxRounds = len(rounds)` ensures session terminates; all existing terminal paths preserved ✓	✓

Test Quality

Behavioral, non-structural tests: ✅ All 13 tests in the plan are implemented. They assert observable outputs (returned token counts, arrival times, blueprint counts) rather than internal struct layout. The TestSession_ThinkTimeSampler_* tests survive an implementation rewrite.

Test coverage gaps:

Negative inter-round gap clamping (see issue getting started instructions #3)
BC-12 warning path (no session records in trace)
SequenceSampler index overflow behavior (not critical given MaxRounds bounds calls, but int overflow after math.MaxInt calls would panic on s.index%len(s.values) if index wraps to math.MinInt). For production use with MaxRounds capped at trace length this is fine.

API Boundaries and Contracts

LoadTraceV2SessionBlueprints signature is clean and composable. The return of ([]*sim.Request, []SessionBlueprint, error) separates injection concerns cleanly. The SequenceSampler implementing LengthSampler is correct per R13 (the interface already has two implementations: ConstantSampler and now SequenceSampler). The optional ThinkTimeSampler field with nil-fallback is a clean extension point. No new interfaces were introduced unnecessarily.

Summary

Priority	Count	Items
Important	2	Think-time semantics doc gap, double `LoadTraceV2SessionBlueprints` call
Minor	3	Missing clamping test, missing BC-12 test, struct alignment
Cosmetic	1	`ScheduledStepIdx` omission

The PR correctly tackles the parent issue. Recommend merging after addressing the two Important items (especially the think-time documentation — users running the observe→replay pipeline need to know that --think-time-ms override is required to reproduce the actual client think time rather than service-time-inflated gaps).

idan-friedman and others added 5 commits March 31, 2026 21:54

idan-friedman force-pushed the pr880-replay-session-mode branch from eb5ee7e to e7fc57e Compare March 31, 2026 18:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(replay): closed-loop session sequencing for trace replay#896

feat(replay): closed-loop session sequencing for trace replay#896
idan-friedman wants to merge 5 commits intoinference-sim:mainfrom
idan-friedman:pr880-replay-session-mode

idan-friedman commented Mar 31, 2026

Uh oh!

sriumcp commented Mar 31, 2026

Uh oh!

claude bot commented Mar 31, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

idan-friedman commented Mar 31, 2026

Summary

Test plan

Uh oh!

sriumcp commented Mar 31, 2026

Uh oh!

claude bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: feat(replay): closed-loop session sequencing for trace replay

Tasks

Overall Assessment

Important Issues

1. Think-time derived from arrival gaps encodes service time — undocumented semantic [Important]

2. Double invocation of LoadTraceV2SessionBlueprints in cmd/replay.go [Important]

Minor Issues

3. Missing test: negative inter-round gap clamping [Minor]

4. Missing test: BC-12 (closed-loop with no session records) [Minor]

5. Struct field alignment inconsistency in SessionBlueprint [Cosmetic]

6. ScheduledStepIdx/FinishedStepIdx omitted in new round-0 construction site [Minor]

Standards Audit (R1–R23, INV-1–INV-11)

Test Quality

API Boundaries and Contracts

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude bot commented Mar 31, 2026 •

edited

Loading

2. Double invocation of `LoadTraceV2SessionBlueprints` in cmd/replay.go [Important]

5. Struct field alignment inconsistency in `SessionBlueprint` [Cosmetic]

6. `ScheduledStepIdx`/`FinishedStepIdx` omitted in new round-0 construction site [Minor]