feat(workload): add --concurrency flag for closed-loop single-turn workload generation#861
Conversation
Add spec-level support for closed-loop concurrency workloads. When Concurrency > 0, the client operates in concurrency mode (mutually exclusive with rate_fraction). ThinkTimeUs controls inter-request delay in concurrency mode. Aggregate_rate is only required when at least one client is rate-based (Concurrency == 0). Part of inference-sim#859. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…nager Add UnlimitedRounds field to SessionBlueprint so concurrency sessions can run past MaxRounds until budget/horizon/timeout/drop. Add global follow-up budget tracking to SessionManager with SetFollowUpBudget() and sessionBudgetExhausted terminal state. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Extend DistributionParams with Concurrency and ThinkTimeMs fields and update SynthesizeFromDistribution to produce concurrency-mode specs (Concurrency > 0 sets client.Concurrency and ThinkTimeUs with zero AggregateRate/RateFraction) while preserving backward-compatible rate mode for existing callers. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…currency clients Add concurrency client handling to GenerateWorkload: for each client with Concurrency > 0, generate N seed requests (one per virtual user) with staggered arrival times and create N unlimited-round SessionBlueprints. Compute FollowUpBudget = maxRequests - numSeeds for the SessionManager. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Wire up closed-loop concurrency mode to the CLI: register new flags, validate mutual exclusion with --rate (BC-1), add concurrency synthesis path in the workload if/else chain, and pass FollowUpBudget from GeneratedWorkload to SessionManager. Refs: inference-sim#859 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Wire up closed-loop concurrency mode for the observe command, matching the flags already added to blis run. Includes mutual exclusion with --rate, auto-setting of --max-concurrency, and follow-up budget wiring. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness. |
|
Claude finished @namasl's task in 5m 7s —— View job PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generationSummaryThis PR correctly delivers the feature described in #859 — a Important Issues (must address)1. INV-11 documents three terminal states;
|
| Contract | Verified | Test |
|---|---|---|
| BC-1: --concurrency and --rate mutually exclusive | ✓ | TestValidate_ConcurrencyAndRateFraction_Rejects, CLI validation |
| BC-2: Exactly N seeds | ✓ | TestGenerateWorkload_ConcurrencyClient_ProducesSeedsAndBlueprints |
| BC-3: Seeds staggered in [0, think_time) | ✓ | TestGenerateWorkload_ConcurrencyClient_StaggeredArrivals, ZeroThinkTime_AllAtZero |
| BC-4: Follow-up arrival = completion + think_time | ✓ | TestConcurrencyMode_EndToEnd_SessionFollowUps |
| BC-5: Session ID preserved | ✓ | End-to-end test |
| BC-6: Fresh tokens, no accumulation | ✓ | TestSession_NoContextAccumulation_FreshTokens |
| BC-7: --num-requests caps total | ✓ | Budget exhaustion test |
| BC-8: Deterministic | ✓ | TestGenerateWorkload_ConcurrencyClient_Deterministic |
| BC-10: Timeout/drop cancels session | ✓ | TestConcurrencyMode_TimeoutCancelsSession |
| BC-11: Backward compat | ✓ | TestGenerateWorkload_NonConcurrencySpec_Unchanged |
INV compliance:
- INV-6 (determinism): RNG isolation via separate seed offsets (
+7919closed-loop,+10007concurrency) is correct. Determinism test passes. - INV-10 (session causality):
arrivalTime = tick + bp.ThinkTimeUsis exact. ✓ - INV-11 (session completeness): All paths reach a terminal state — but
budget_exhaustedneeds to be documented (issue Init commit #1 above).
Budget counting logic (session.go:136-140, 199-201): The counter increments AFTER generating the follow-up and checks BEFORE generating. This is correct — followUpCount >= followUpBudget gates generation cleanly without off-by-one errors. The end-to-end test traces through 4 follow-ups against a budget of 4 correctly.
Stagger formula (generator.go:531-532):
arrivalTime = int64(u) * client.ThinkTimeUs / int64(client.Concurrency)For u=0..N-1, arrivals land in [0, ThinkTimeUs) with integer division. Correct per BC-3.
simulationHorizon defaults to math.MaxInt64 (confirmed at cmd/root.go:234), so the session horizon guard arrivalTime > bp.Horizon will not fire prematurely when --horizon is not set. The concurrency sessions will run until budget exhaustion or explicit horizon. ✓
Separation of Concerns & Modularity
Clean. The workload-layer changes are self-contained:
spec.go: validation rules relaxed correctly for concurrency clientssynthesis.go: refactored to aclientbuilder pattern — clean and readablegenerator.go: concurrency logic appended after closed-loop logic with clear section headersession.go: two orthogonal additions (UnlimitedRoundsflag, follow-up budget) that don't touch existing logic
The DES itself is untouched. The SessionManager contract (single-threaded, DES-invoked) is preserved.
CLI layer (cmd/root.go, cmd/observe_cmd.go): The --concurrency > 0 guard cleanly forks into a new spec-synthesis branch before rejoining the existing pipeline. Auto-setting --max-concurrency in observe mode is a correct UX affordance with an informational log message. ✓
Test Quality Assessment
Non-structural: Tests assert observable behavior (request counts, arrival times, session IDs, token lengths), not internal field types. ✓
Refactor-survivable: Would these tests still pass if SessionBlueprint were renamed or restructured? Yes — they test observable outcomes from GenerateWorkload and sm.OnComplete, not internal field access patterns. ✓
Coverage gaps: No test for mixed closed-loop + concurrency in the same spec. This is the scenario where the budget over-allocation concern (issue #2) would manifest.
Performance Implications
No DES hot path changes. The new code runs at workload-generation time (startup, not per-step). The double call to generatePrefixTokens for mixed workloads is O(prefix_tokens × clients) — negligible. ✓
Documentation Correctness
CLAUDE.mdmentions "Observe/replay/calibrate pipeline" and recent work — no update needed for this featurecmd/flag descriptions are accurate- INV-11 gap (issue Init commit #1 above) is the only documentation correctness issue
Verdict
Approve after addressing issues #1 and #3 (documentation correctness). Issue #2 (mixed workload budget) should be addressed or explicitly documented as out-of-scope in a follow-up. The core implementation is correct, well-tested, and properly isolated.
… RNG comment - Update INV-11 in invariants.md and CLAUDE.md to document budget_exhausted as a fourth terminal state for concurrency-mode sessions - Add spec validation rejecting mixed concurrency + multi-turn clients (follow-up budget accounting doesn't support this combination) - Fix misleading comment in generator.go about prefix token RNG reuse (it re-derives from the same seed, not continuing the original stream) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
All three review issues addressed in 909b879: Issue #1 — INV-11 Issue #2 — Mixed concurrency + multi-turn budget over-allocation: Added a spec validation guard that rejects mixing concurrency clients with multi-turn clients in the same spec. The follow-up budget accounting doesn't support this combination, and a clear error message explains why. Test added: Issue #3 — Misleading RNG comment: Replaced "Reuse prefix tokens from GenerateRequests' RNG stream for consistency" with "Re-derive prefix tokens by initializing a fresh RNG from spec.Seed — same seed produces same prefix tokens as GenerateRequests produced." No merge conflict — branch was already up to date with main. Full test suite passes ( |
…ohort parity) Merge upstream/main (inference-sim#845 normalized exponential arrival sampler, inference-sim#848 cohort field parity) into feat/concurrency-flag. Conflicts in cohort_test.go (excluded fields map — both sides added entries) and generator_test.go (both sides appended new tests at EOF). Resolution: keep both sets of additions. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
Merge conflict resolved in 9de1664. Upstream
All tests pass ( |
|
@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness. |
|
Claude finished @namasl's task in 4m 25s —— View job PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generationTasks
SummaryThis is a second-pass review after the first review's three issues (INV-11 docs, misleading RNG comment, mixed workload budget guard) were all resolved in commit 909b879. The core implementation is correct and complete. One new gap in the mixed-workload guard was found; everything else is well done. Important Issue (should address)Mixed-workload guard misses cohorts with
|
| Contract | Status | Evidence |
|---|---|---|
BC-1: --concurrency and --rate mutually exclusive |
✓ | CLI validation + TestValidate_ConcurrencyAndRateFraction_Rejects |
| BC-2: Exactly N seeds | ✓ | TestGenerateWorkload_ConcurrencyClient_ProducesSeedsAndBlueprints |
BC-3: Stagger in [0, think_time) |
✓ | TestGenerateWorkload_ConcurrencyClient_StaggeredArrivals, ZeroThinkTime test |
| BC-4: Follow-up arrival = completion + think_time | ✓ | End-to-end test |
| BC-5: Session ID preserved | ✓ | End-to-end test |
| BC-6: Fresh tokens per round | ✓ | TestSession_NoContextAccumulation_FreshTokens |
BC-7: --num-requests caps total |
✓ | Budget exhaustion test — followUpBudget = maxRequests - len(allReqs) |
| BC-8: Deterministic (INV-6) | ✓ | TestGenerateWorkload_ConcurrencyClient_Deterministic |
| BC-10: Timeout cancels session | ✓ | TestConcurrencyMode_TimeoutCancelsSession |
| BC-11: Non-concurrency backward compat | ✓ | TestGenerateWorkload_NonConcurrencySpec_Unchanged |
INV compliance:
- INV-6 (determinism): Concurrency RNG uses
spec.Seed + 10007to avoid colliding with GenerateRequests (spec.Seed) and closed-loop blueprints (spec.Seed + 7919). ✓ - INV-10 (session causality):
arrivalTime = tick + bp.ThinkTimeUsexactly. ✓ - INV-11 (session completeness): All four terminal states are now documented. Budget counter guard (
followUpCount >= followUpBudget) fires before generation, sosessionBudgetExhaustedis reached cleanly. ✓ - INV-1 (request conservation): Budget formula
maxRequests - len(allReqs)ensures seeds + follow-ups ≤ maxRequests when maxRequests > 0. When maxRequests=0, budget=0 means unlimited — correct for unbounded case. ✓
Budget counting (session.go:136-140, 199-201): Pre-check with >=, post-increment — exactly followUpBudget follow-ups generated. Off-by-one is correct and the end-to-end test traces this completely.
Stagger formula (generator.go:531-532):
arrivalTime = int64(u) * client.ThinkTimeUs / int64(client.Concurrency)For u=0..N-1: arrivals ∈ [0, ThinkTimeUs) with integer division. Correct per BC-3. Single-user (N=1) gets t=0, also correct.
Merge sort determinism: sort.SliceStable on allReqs := append(round0Only, concurrencySeeds...) — equal-time elements preserve insertion order (closed-loop before concurrency), making the merged result deterministic for any tie in arrival time. ✓
Test Quality
Non-structural: All tests assert observable behavior — request counts, arrival times, session IDs, token lengths, budget exhaustion. No internal field access patterns tested.
Refactor-survivable: Tests call GenerateWorkload, NewSessionManager, sm.OnComplete via their public API. A full rename/restructure of internals would not break them.
Coverage is thorough: negative validation, mutual exclusion, mixed-workload rejection, stagger, determinism, metadata fields, blueprint properties, budget exhaustion step-by-step, timeout cancellation, backward compat, synthesis modes, prefix propagation.
One minor coverage gap: No explicit test for SetFollowUpBudget(0) (or omitting it) meaning unlimited. TestSession_UnlimitedRounds_ContinuesPastMaxRounds implicitly covers it (no budget set → no limit), but the invariant isn't stated explicitly. Not blocking.
Separation of Concerns & Modularity
Clean layering maintained throughout:
spec.go: validation rules only — closed under the new fieldssynthesis.go: refactored to build aclientfirst, then fork on mode — readablegenerator.go: concurrency logic is a self-contained section appended after closed-loop logic, with a clear section commentsession.go: two orthogonal additions (UnlimitedRounds, follow-up budget) that don't disturb existing session lifecycle logiccmd/: concurrency mode forks before the existing pipeline and rejoins atGenerateWorkload— clean
The DES hot path is untouched.
API Boundaries & Contracts
SessionBlueprint.UnlimitedRounds bool— additive, non-breaking, well-namedGeneratedWorkload.FollowUpBudget int64— zero means unlimited, consistent with the zero-value conventionSessionManager.SetFollowUpBudget(budget int64)— post-construction setter; acceptable given the DES single-threaded guarantee. Would be slightly cleaner as a constructor option, but not worth changingNewSessionManagerpanic forMaxRounds < 1 && !UnlimitedRounds— maintains existing invariant while accommodating the new mode correctly
Standards Adherence (R1-R23, Principles)
- R4 (struct literals in one place):
SessionBlueprintis constructed in two code paths (closed-loop ~lines 456-472, concurrency ~lines 568-584). Fields likeTenantID,SLOClass,Model,Horizon,InputSampler,OutputSampler,RNGare duplicated. No factory mediates them. The blueprints are structurally different (MaxRoundsvsUnlimitedRounds) which justifies two sites, but it's the type of duplication R4 flags. Acceptable as-is with a comment; not blocking. - R1 (no silent continue): All error paths return errors. ✓
- R3 (input validation): Negative concurrency and negative think_time_us rejected at both CLI and spec layers. ✓
- R8-R10 (exported maps, YAML types, strict parsing): No new exported mutable maps. New fields use value types, not pointers. ✓
- Output channel separation: All diagnostics go to
logrus(stderr). ✓ - Interface design (R13/R14):
SetFollowUpBudgetis a single-concern method. ✓
Documentation
CLAUDE.mdINV-11: Updated to includebudget-exhaustedterminal state. ✓invariants.mdINV-11: Updated with state description, semantics, and verification pointer. ✓- Flag descriptions (
--concurrency,--think-time-ms): accurate in bothblis runandblis observe. ✓ SynthesizeFromDistributiondoc comment explains both modes. ✓GenerateWorkloaddoc comment explains concurrency client handling. ✓
Performance Implications
No DES hot-path changes. All new code runs at workload-generation time (startup). The re-derivation of prefix tokens (generatePrefixTokens called twice for mixed specs — once in GenerateRequests, once in the concurrency section) is O(prefix_tokens × clients) and negligible. ✓
Parent Issue Alignment
The PR delivers exactly what #859 describes: N virtual users, each maintaining exactly one in-flight request, with optional think time, capped by --num-requests, mutually exclusive with --rate, available on both blis run and blis observe. ✓
Verdict
Approve after addressing the cohort guard gap. The implementation is correct, well-tested, properly isolated, and the prior review's issues are all resolved. The one remaining gap (concurrency + cohort multi-turn not guarded) is the only substantive issue.
…-turn The concurrency + multi-turn validation only iterated s.Clients, missing CohortSpec entries that also carry Reasoning.MultiTurn. A cohort with multi-turn combined with concurrency clients would bypass the guard. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
Addressed in beb3132: the mixed-workload guard now also iterates |
|
@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness. |
|
Claude finished @namasl's task in 3m 34s —— View job PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generationTasks
Third-Pass Review SummaryThis is a third-pass review. Both previous rounds of issues have been fully resolved. No new issues found. The PR is ready to merge. All Prior Issues Resolved
Correctness Verification (Final Pass)
Budget calculation is sound for all allowed combinations:
Budget guard in
Stagger formula ( Merge sort determinism ( Auto-set BC Contracts (all verified)
Test QualityNon-structural: Tests assert observable behavior throughout — request counts, arrival times, session IDs, token lengths, budget exhaustion state. No assertions on internal field names or concrete type structure. ✓ Refactor-survivable: Tests exercise public API ( Coverage: Spec validation (6 negative cases + mutual exclusion), session manager (unlimited rounds, budget exhaustion step-by-step, zero MaxRounds, fresh tokens), synthesis (both modes, prefix propagation), generator (seed count, stagger formula, determinism, metadata, blueprint properties, backward compat), end-to-end (full follow-up flow with budget exhaustion, timeout cancellation). Cohort multi-turn guard now has its own test. ✓ Separation of Concerns & ModularityThe DES hot path is untouched. All changes are confined to workload generation (startup time only). The Documentation
VerdictApprove. All issues from the prior two review rounds are fully addressed. Implementation is correct, test coverage is thorough and behavioral, invariants are maintained and documented, and the feature delivers exactly what #859 describes. |
Bug 1 (Critical):
|
Bug 2 (Critical): Seeds bypass
|
| Command | Seeds | FollowUpBudget | SetFollowUpBudget called? | Total |
|---|---|---|---|---|
--concurrency 50 --num-requests 500 |
50 | 450 | Yes, 450 | 500 ✓ |
--concurrency 50 --num-requests 50 |
50 (cap hit after 50) | 0 | Yes, 0 → immediate exhaustion | 50 ✓ |
--concurrency 200 --num-requests 100 |
100 (cap hit after 100) | 0 | Yes, 0 → immediate exhaustion | 100 ✓ |
--concurrency 50 (no --num-requests, with --horizon) |
50 | -1 | No | horizon-bounded ✓ |
Bug 3 (Significant):
|
Bug 4 (Significant):
|
| Command | Behavior |
|---|---|
--concurrency 50 --num-requests 500 |
500 total requests (explicit cap) ✓ |
--concurrency 50 --horizon 60_000_000 |
Runs until 60 s horizon, unlimited requests ✓ |
--concurrency 50 (no --num-requests, no --horizon) |
Existing guard fires: "Workload requires either num_requests or --horizon to bound generation" — clear, correct error ✓ |
--concurrency 50 --num-requests 500 --horizon 60_000_000 |
Stops at whichever limit is hit first ✓ |
Note: observe_cmd.go already behaves correctly here because observeNumRequests defaults to 0 (not 100), so --concurrency + --horizon on blis observe already works as expected.
Bug 5 (Minor):
|
Minor: Magic seed offset
|
Bug 1 (Critical): ComposeSpecs now handles all-concurrency specs where totalRate==0 by skipping rate fraction renormalization and concatenating clients directly. Test: TestComposeSpecs_AllConcurrency_MergesClients. Bug 2 (Critical): Two coupled fixes: - Part A: Seed generation loop now caps at maxRequests, preventing --concurrency 200 --num-requests 100 from producing 200 seeds. - Part B: FollowUpBudget uses -1 as "no cap" sentinel (was 0). SessionManager gains budgetEnabled flag so SetFollowUpBudget(0) correctly means "no follow-ups" rather than "unlimited". Callers updated to use >= 0 check. Tests: SeedsCappedByMaxRequests, ZeroBudgetMeansNoFollowUps. Bug 3 (Significant): Arrival process validation skipped for concurrency clients (they use staggered fixed times, not arrival samplers). Users can now omit the arrival field in YAML specs for concurrency clients. Test: TestValidate_ConcurrencyClient_NoArrivalField_Accepted. Bug 4 (Significant): In concurrency mode on blis run, --num-requests default of 100 is now ignored unless the flag was explicitly set. Without it, the run requires --horizon or explicit --num-requests. Bug 5 (Minor): --concurrency + --workload-spec now produces a fatal error in both blis run and blis observe, instead of silently ignoring the concurrency flag. Minor: Added explanatory comment for the +10007 RNG offset explaining its relationship to +7919 and the need for distinct offsets. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
Re Bug 1 (compose): Fixed in 2196a6f. |
|
Re Bug 2 (seed cap + budget sentinel): Fixed in 2196a6f. Two coupled changes:
|
|
Re Bug 3 (arrival validation): Fixed in 2196a6f. |
|
Re Bug 4 (default --num-requests caps concurrency): Fixed in 2196a6f. In the concurrency branch of |
|
Re Bug 5 (--concurrency silently ignored with --workload-spec): Fixed in 2196a6f. Both |
|
Re seed offset comment: Fixed in 2196a6f. Added explanatory comment documenting why |
|
@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness. |
|
Claude finished @sriumcp's task in 4m 8s —— View job PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generationTasks
Fourth-Pass Review SummaryAll five bugs and the seed-offset comment from Round 3 are correctly addressed in All Prior Issues Resolved
Minor Issue:
|
| Scenario | FollowUpBudget |
SetFollowUpBudget called? |
Behavior |
|---|---|---|---|
--concurrency 50 --num-requests 500 |
450 | Yes (450) | 500 total ✓ |
--concurrency 50 --num-requests 50 |
0 | Yes (0) → budgetEnabled=true, immediate exhaustion |
50 total ✓ |
--concurrency 200 --num-requests 100 |
0 | Yes (0) | 100 seeds (capped), 0 follow-ups ✓ |
--concurrency 50 --horizon 60s |
-1 | No | horizon-bounded ✓ |
| No sessions (rate-only) | 0 (zero value) | No — len(Sessions)==0 guards |
N/A ✓ |
Seed cap (generator.go:557-561): The inner-loop break fires before concurrencyRNG.Int63() draws, which is correct — uncapped users don't consume RNG entropy. The totalConcurrencyUsers += client.Concurrency at line 624 uses the full concurrency count, but this is a truthy check only (the budget formula uses len(allReqs) for the actual count). ✓
budgetEnabled guard (session.go:141, 203-205): SetFollowUpBudget(0) sets budgetEnabled=true, so the pre-generation check sm.budgetEnabled && sm.followUpCount >= sm.followUpBudget fires immediately (0 >= 0). The post-generation increment is likewise gated on sm.budgetEnabled. Off-by-one confirmed correct: check-before, increment-after. ✓
All BC contracts: Verified in Rounds 1-3, no regression in 2196a6f changes.
Test Quality (Final Pass)
New tests added in 2196a6f:
TestGenerateWorkload_ConcurrencyClient_SeedsCappedByMaxRequests— 200 users, cap=100, verifies 100 seeds andFollowUpBudget=0✓TestGenerateWorkload_ConcurrencyClient_ZeroBudgetMeansNoFollowUps—SetFollowUpBudget(0)produces nil follow-up immediately ✓TestValidate_ConcurrencyClient_NoArrivalField_Accepted— concurrency client with empty arrival passes validation ✓TestComposeSpecs_AllConcurrency_MergesClients— two concurrency-only specs compose and passValidate()✓TestValidate_ConcurrencyAndCohortMultiTurn_Rejects— cohort with multi-turn + concurrency client rejected ✓ (Round 3 fix)
All behavioral, non-structural, refactor-survivable. ✓
One test semantic note (informational, not blocking): TestGenerateWorkload_NonConcurrencySpec_Unchanged asserts wl.FollowUpBudget != 0 with the comment "want 0 (no sessions)". The early-return path (generator.go:404) returns FollowUpBudget=0 (Go zero value), which matches 0. Per the documented sentinel convention, 0 means "no follow-ups allowed" but for no-session workloads the field is semantically N/A. The caller guards with len(wl.Sessions) > 0 so this is harmless in practice — the zero value is never consumed for non-session workloads. The test assertion is correct but tests the zero value rather than the intended semantic. Not worth changing.
Separation of Concerns & Modularity
Clean. All five bug fixes in 2196a6f are correctly scoped:
spec.go: arrival guard added tovalidateClient— no behavioral change to rate-based clients ✓generator.go: seed cap check added inside inner loop — existing logic untouched ✓session.go:budgetEnabledfield isolates "unset" from "set to 0" — no interface change ✓root.go/observe_cmd.go:--concurrency×--workload-specfatal,--num-requestsdefaulting fixed — correctly scoped ✓convert.go:totalRate==0path concatenates without scaling — existing scaling path untouched ✓
Standards Adherence
- R4:
SessionBlueprintis still constructed in two sites (closed-loop, concurrency). The structural difference (MaxRoundsvsUnlimitedRounds) justifies two sites; this was flagged and accepted in Round 2. - R1: No silent
continue. All error paths return errors. ✓ - R3: CLI validation for
--concurrency < 0,--think-time-ms < 0, mutual exclusion with--rateand--workload-spec. ✓ - R8-R10: No new exported mutable maps, value types used throughout, strict YAML parsing preserved. ✓
- R2: Map iteration sorted before RNG draws (
sort.Strings(sortedSessionIDs),sort.SliceStable). ✓
Performance Implications
No DES hot-path changes. All new code executes at workload-generation time (startup). ✓
Parent Issue Alignment
The PR delivers exactly what #859 requires: N virtual users, one in-flight request each, optional think time, capped by --num-requests, mutually exclusive with --rate, available on both blis run and blis observe. All behavioral contracts are verified with tests. ✓
Verdict
Approve after the one-line help text fix (the "Supports both" → "Supports three modes" update in observe_cmd.go). All prior substantive issues are fully resolved. The implementation is correct, well-tested, properly isolated, and all invariants are maintained and documented.
|
In addition to the one line help text fix above, please also fix the following. Comment 1 — Non-blocking: FollowUpBudget struct comment is wrong Non-blocking: FollowUpBudget struct comment is wrong for the no-session path In sim/workload/generator.go, the GeneratedWorkload struct comment says: // -1 = no cap (either no concurrency clients, or maxRequests was 0/unlimited). But when there are no sessions and no concurrency clients, GenerateWorkload returns early: return &GeneratedWorkload{Requests: reqs}, nil This sets FollowUpBudget to 0 (Go zero value), not -1. The test TestGenerateWorkload_NonConcurrencySpec_Unchanged confirms this by Fix — comment only, no logic change needed: // FollowUpBudget is the cap on follow-up requests for concurrency sessions. Comment 2 — Blocker: ComposeSpecs no longer guards totalRate < 0 Blocker: ComposeSpecs dropped the guard for totalRate < 0 The original convert.go had: if totalRate <= 0 { The new code only handles totalRate == 0 (all-concurrency) and then falls through to renormalization for everything else. If totalRate Fix — add one guard for the negative case: if totalRate == 0 { if totalRate < 0 { // Rate-based or mixed: renormalize (existing code) Add a corresponding test: func TestComposeSpecs_NegativeTotalRate_ReturnsError(t *testing.T) { Note: a spec with AggregateRate < 0 would be rejected by Validate() in normal usage, but ComposeSpecs accepts raw specs and should not |
- Update observe_cmd.go Long description to mention --concurrency as a third input path and add a concurrency example - Fix FollowUpBudget struct comment: 0 also covers the no-sessions early-return path, not just budget-consumed - Restore negative totalRate guard in ComposeSpecs that was dropped when adding the totalRate==0 all-concurrency path Test: TestComposeSpecs_NegativeTotalRate_ReturnsError Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
All three items from the latest review addressed in 74765be:
Full suite passes. |
|
@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness. |
|
Claude finished @namasl's task in 4m 38s —— View job PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generationTasks
Fifth-Pass Review SummaryAll issues from rounds 1–4 (sriumcp's 5 bugs + minor, plus the observe_cmd help text, FollowUpBudget comment, and negative rate guard) are fully resolved in commits All Prior Issues Resolved
Correctness Verification (Fifth Pass)Stagger formula ( Budget sentinel end-to-end (
Cohort multi-turn guard ( BC Contract Verification (Complete)
INV Compliance (Final)
Test Quality (Final)All tests assert observable behavior — request counts, arrival times, session IDs, token lengths, budget exhaustion state. No assertions on concrete type names or internal fields. Refactor-survivable. Coverage includes:
Standards Adherence
Performance ImplicationsNo DES hot-path changes. All new code executes at workload-generation time (startup). Parent Issue AlignmentPR fully delivers #859: N virtual users, exactly one in-flight request each, optional think time, capped by VerdictApprove. All issues from all prior review rounds are fully addressed. The implementation is correct, test coverage is thorough and behavioral, all invariants are maintained and documented, and the feature precisely delivers what #859 describes. |
…rkload generation (inference-sim#861) * feat(workload): add Concurrency and ThinkTimeUs fields to ClientSpec Add spec-level support for closed-loop concurrency workloads. When Concurrency > 0, the client operates in concurrency mode (mutually exclusive with rate_fraction). ThinkTimeUs controls inter-request delay in concurrency mode. Aggregate_rate is only required when at least one client is rate-based (Concurrency == 0). Part of inference-sim#859. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): add UnlimitedRounds and follow-up budget to SessionManager Add UnlimitedRounds field to SessionBlueprint so concurrency sessions can run past MaxRounds until budget/horizon/timeout/drop. Add global follow-up budget tracking to SessionManager with SetFollowUpBudget() and sessionBudgetExhausted terminal state. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): add concurrency mode to distribution synthesis Extend DistributionParams with Concurrency and ThinkTimeMs fields and update SynthesizeFromDistribution to produce concurrency-mode specs (Concurrency > 0 sets client.Concurrency and ThinkTimeUs with zero AggregateRate/RateFraction) while preserving backward-compatible rate mode for existing callers. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): generate seed requests and session blueprints for concurrency clients Add concurrency client handling to GenerateWorkload: for each client with Concurrency > 0, generate N seed requests (one per virtual user) with staggered arrival times and create N unlimited-round SessionBlueprints. Compute FollowUpBudget = maxRequests - numSeeds for the SessionManager. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(cli): add --concurrency and --think-time-ms flags to blis run Wire up closed-loop concurrency mode to the CLI: register new flags, validate mutual exclusion with --rate (BC-1), add concurrency synthesis path in the workload if/else chain, and pass FollowUpBudget from GeneratedWorkload to SessionManager. Refs: inference-sim#859 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(cli): add --concurrency and --think-time-ms flags to blis observe Wire up closed-loop concurrency mode for the observe command, matching the flags already added to blis run. Includes mutual exclusion with --rate, auto-setting of --max-concurrency, and follow-up budget wiring. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * test(workload): add end-to-end concurrency mode integration tests Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): address PR review — INV-11 docs, mixed workload guard, RNG comment - Update INV-11 in invariants.md and CLAUDE.md to document budget_exhausted as a fourth terminal state for concurrency-mode sessions - Add spec validation rejecting mixed concurrency + multi-turn clients (follow-up budget accounting doesn't support this combination) - Fix misleading comment in generator.go about prefix token RNG reuse (it re-derives from the same seed, not continuing the original stream) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): extend mixed-workload guard to check cohorts for multi-turn The concurrency + multi-turn validation only iterated s.Clients, missing CohortSpec entries that also carry Reasoning.MultiTurn. A cohort with multi-turn combined with concurrency clients would bypass the guard. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): address 5 bugs + 1 minor from PR review round 2 Bug 1 (Critical): ComposeSpecs now handles all-concurrency specs where totalRate==0 by skipping rate fraction renormalization and concatenating clients directly. Test: TestComposeSpecs_AllConcurrency_MergesClients. Bug 2 (Critical): Two coupled fixes: - Part A: Seed generation loop now caps at maxRequests, preventing --concurrency 200 --num-requests 100 from producing 200 seeds. - Part B: FollowUpBudget uses -1 as "no cap" sentinel (was 0). SessionManager gains budgetEnabled flag so SetFollowUpBudget(0) correctly means "no follow-ups" rather than "unlimited". Callers updated to use >= 0 check. Tests: SeedsCappedByMaxRequests, ZeroBudgetMeansNoFollowUps. Bug 3 (Significant): Arrival process validation skipped for concurrency clients (they use staggered fixed times, not arrival samplers). Users can now omit the arrival field in YAML specs for concurrency clients. Test: TestValidate_ConcurrencyClient_NoArrivalField_Accepted. Bug 4 (Significant): In concurrency mode on blis run, --num-requests default of 100 is now ignored unless the flag was explicitly set. Without it, the run requires --horizon or explicit --num-requests. Bug 5 (Minor): --concurrency + --workload-spec now produces a fatal error in both blis run and blis observe, instead of silently ignoring the concurrency flag. Minor: Added explanatory comment for the +10007 RNG offset explaining its relationship to +7919 and the need for distinct offsets. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): help text, FollowUpBudget comment, negative rate guard - Update observe_cmd.go Long description to mention --concurrency as a third input path and add a concurrency example - Fix FollowUpBudget struct comment: 0 also covers the no-sessions early-return path, not just budget-consumed - Restore negative totalRate guard in ComposeSpecs that was dropped when adding the totalRate==0 all-concurrency path Test: TestComposeSpecs_NegativeTotalRate_ReturnsError Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> --------- Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
…rkload generation (#861) * feat(workload): add Concurrency and ThinkTimeUs fields to ClientSpec Add spec-level support for closed-loop concurrency workloads. When Concurrency > 0, the client operates in concurrency mode (mutually exclusive with rate_fraction). ThinkTimeUs controls inter-request delay in concurrency mode. Aggregate_rate is only required when at least one client is rate-based (Concurrency == 0). Part of #859. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): add UnlimitedRounds and follow-up budget to SessionManager Add UnlimitedRounds field to SessionBlueprint so concurrency sessions can run past MaxRounds until budget/horizon/timeout/drop. Add global follow-up budget tracking to SessionManager with SetFollowUpBudget() and sessionBudgetExhausted terminal state. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): add concurrency mode to distribution synthesis Extend DistributionParams with Concurrency and ThinkTimeMs fields and update SynthesizeFromDistribution to produce concurrency-mode specs (Concurrency > 0 sets client.Concurrency and ThinkTimeUs with zero AggregateRate/RateFraction) while preserving backward-compatible rate mode for existing callers. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): generate seed requests and session blueprints for concurrency clients Add concurrency client handling to GenerateWorkload: for each client with Concurrency > 0, generate N seed requests (one per virtual user) with staggered arrival times and create N unlimited-round SessionBlueprints. Compute FollowUpBudget = maxRequests - numSeeds for the SessionManager. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(cli): add --concurrency and --think-time-ms flags to blis run Wire up closed-loop concurrency mode to the CLI: register new flags, validate mutual exclusion with --rate (BC-1), add concurrency synthesis path in the workload if/else chain, and pass FollowUpBudget from GeneratedWorkload to SessionManager. Refs: #859 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(cli): add --concurrency and --think-time-ms flags to blis observe Wire up closed-loop concurrency mode for the observe command, matching the flags already added to blis run. Includes mutual exclusion with --rate, auto-setting of --max-concurrency, and follow-up budget wiring. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * test(workload): add end-to-end concurrency mode integration tests Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): address PR review — INV-11 docs, mixed workload guard, RNG comment - Update INV-11 in invariants.md and CLAUDE.md to document budget_exhausted as a fourth terminal state for concurrency-mode sessions - Add spec validation rejecting mixed concurrency + multi-turn clients (follow-up budget accounting doesn't support this combination) - Fix misleading comment in generator.go about prefix token RNG reuse (it re-derives from the same seed, not continuing the original stream) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): extend mixed-workload guard to check cohorts for multi-turn The concurrency + multi-turn validation only iterated s.Clients, missing CohortSpec entries that also carry Reasoning.MultiTurn. A cohort with multi-turn combined with concurrency clients would bypass the guard. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): address 5 bugs + 1 minor from PR review round 2 Bug 1 (Critical): ComposeSpecs now handles all-concurrency specs where totalRate==0 by skipping rate fraction renormalization and concatenating clients directly. Test: TestComposeSpecs_AllConcurrency_MergesClients. Bug 2 (Critical): Two coupled fixes: - Part A: Seed generation loop now caps at maxRequests, preventing --concurrency 200 --num-requests 100 from producing 200 seeds. - Part B: FollowUpBudget uses -1 as "no cap" sentinel (was 0). SessionManager gains budgetEnabled flag so SetFollowUpBudget(0) correctly means "no follow-ups" rather than "unlimited". Callers updated to use >= 0 check. Tests: SeedsCappedByMaxRequests, ZeroBudgetMeansNoFollowUps. Bug 3 (Significant): Arrival process validation skipped for concurrency clients (they use staggered fixed times, not arrival samplers). Users can now omit the arrival field in YAML specs for concurrency clients. Test: TestValidate_ConcurrencyClient_NoArrivalField_Accepted. Bug 4 (Significant): In concurrency mode on blis run, --num-requests default of 100 is now ignored unless the flag was explicitly set. Without it, the run requires --horizon or explicit --num-requests. Bug 5 (Minor): --concurrency + --workload-spec now produces a fatal error in both blis run and blis observe, instead of silently ignoring the concurrency flag. Minor: Added explanatory comment for the +10007 RNG offset explaining its relationship to +7919 and the need for distinct offsets. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): help text, FollowUpBudget comment, negative rate guard - Update observe_cmd.go Long description to mention --concurrency as a third input path and add a concurrency example - Fix FollowUpBudget struct comment: 0 also covers the no-sessions early-return path, not just budget-consumed - Restore negative totalRate guard in ComposeSpecs that was dropped when adding the totalRate==0 all-concurrency path Test: TestComposeSpecs_NegativeTotalRate_ReturnsError Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> --------- Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
…rkload generation (inference-sim#861) * feat(workload): add Concurrency and ThinkTimeUs fields to ClientSpec Add spec-level support for closed-loop concurrency workloads. When Concurrency > 0, the client operates in concurrency mode (mutually exclusive with rate_fraction). ThinkTimeUs controls inter-request delay in concurrency mode. Aggregate_rate is only required when at least one client is rate-based (Concurrency == 0). Part of inference-sim#859. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): add UnlimitedRounds and follow-up budget to SessionManager Add UnlimitedRounds field to SessionBlueprint so concurrency sessions can run past MaxRounds until budget/horizon/timeout/drop. Add global follow-up budget tracking to SessionManager with SetFollowUpBudget() and sessionBudgetExhausted terminal state. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): add concurrency mode to distribution synthesis Extend DistributionParams with Concurrency and ThinkTimeMs fields and update SynthesizeFromDistribution to produce concurrency-mode specs (Concurrency > 0 sets client.Concurrency and ThinkTimeUs with zero AggregateRate/RateFraction) while preserving backward-compatible rate mode for existing callers. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): generate seed requests and session blueprints for concurrency clients Add concurrency client handling to GenerateWorkload: for each client with Concurrency > 0, generate N seed requests (one per virtual user) with staggered arrival times and create N unlimited-round SessionBlueprints. Compute FollowUpBudget = maxRequests - numSeeds for the SessionManager. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(cli): add --concurrency and --think-time-ms flags to blis run Wire up closed-loop concurrency mode to the CLI: register new flags, validate mutual exclusion with --rate (BC-1), add concurrency synthesis path in the workload if/else chain, and pass FollowUpBudget from GeneratedWorkload to SessionManager. Refs: inference-sim#859 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(cli): add --concurrency and --think-time-ms flags to blis observe Wire up closed-loop concurrency mode for the observe command, matching the flags already added to blis run. Includes mutual exclusion with --rate, auto-setting of --max-concurrency, and follow-up budget wiring. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * test(workload): add end-to-end concurrency mode integration tests Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): address PR review — INV-11 docs, mixed workload guard, RNG comment - Update INV-11 in invariants.md and CLAUDE.md to document budget_exhausted as a fourth terminal state for concurrency-mode sessions - Add spec validation rejecting mixed concurrency + multi-turn clients (follow-up budget accounting doesn't support this combination) - Fix misleading comment in generator.go about prefix token RNG reuse (it re-derives from the same seed, not continuing the original stream) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): extend mixed-workload guard to check cohorts for multi-turn The concurrency + multi-turn validation only iterated s.Clients, missing CohortSpec entries that also carry Reasoning.MultiTurn. A cohort with multi-turn combined with concurrency clients would bypass the guard. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): address 5 bugs + 1 minor from PR review round 2 Bug 1 (Critical): ComposeSpecs now handles all-concurrency specs where totalRate==0 by skipping rate fraction renormalization and concatenating clients directly. Test: TestComposeSpecs_AllConcurrency_MergesClients. Bug 2 (Critical): Two coupled fixes: - Part A: Seed generation loop now caps at maxRequests, preventing --concurrency 200 --num-requests 100 from producing 200 seeds. - Part B: FollowUpBudget uses -1 as "no cap" sentinel (was 0). SessionManager gains budgetEnabled flag so SetFollowUpBudget(0) correctly means "no follow-ups" rather than "unlimited". Callers updated to use >= 0 check. Tests: SeedsCappedByMaxRequests, ZeroBudgetMeansNoFollowUps. Bug 3 (Significant): Arrival process validation skipped for concurrency clients (they use staggered fixed times, not arrival samplers). Users can now omit the arrival field in YAML specs for concurrency clients. Test: TestValidate_ConcurrencyClient_NoArrivalField_Accepted. Bug 4 (Significant): In concurrency mode on blis run, --num-requests default of 100 is now ignored unless the flag was explicitly set. Without it, the run requires --horizon or explicit --num-requests. Bug 5 (Minor): --concurrency + --workload-spec now produces a fatal error in both blis run and blis observe, instead of silently ignoring the concurrency flag. Minor: Added explanatory comment for the +10007 RNG offset explaining its relationship to +7919 and the need for distinct offsets. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): help text, FollowUpBudget comment, negative rate guard - Update observe_cmd.go Long description to mention --concurrency as a third input path and add a concurrency example - Fix FollowUpBudget struct comment: 0 also covers the no-sessions early-return path, not just budget-consumed - Restore negative totalRate guard in ComposeSpecs that was dropped when adding the totalRate==0 all-concurrency path Test: TestComposeSpecs_NegativeTotalRate_ReturnsError Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> --------- Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
Summary
--concurrency Nflag toblis runandblis observefor closed-loop single-turn workload generation--think-time-ms), then send the next--rate; reuses existingSessionManagerinfrastructure with newUnlimitedRoundsand shared follow-up budgetCloses #859
Example Usage
Changes
sim/workload/spec.goConcurrencyandThinkTimeUsfields toClientSpec; relax validation for concurrency clientssim/workload/session.goUnlimitedRoundstoSessionBlueprint; add shared follow-up budget toSessionManagersim/workload/synthesis.goConcurrencyandThinkTimeMstoDistributionParams; handle concurrency in synthesissim/workload/generator.goSessionBlueprints for concurrency clients inGenerateWorkloadcmd/root.go--concurrencyand--think-time-msflags toblis runcmd/observe_cmd.go--concurrencyand--think-time-msflags toblis observe; auto-set--max-concurrencyBehavioral Contracts Verified
--concurrencyand--ratemutually exclusive[0, think_time); all at t=0 when think_time=0--num-requestscaps total requests (seeds + follow-ups)Test plan
go test ./...passesgo vet ./...cleanblis run --concurrency 10 --num-requests 50produces expected outputblis run --rate 1 --num-requests 20 --seed 42produces byte-identical output🤖 Generated with Claude Code