feat(workload): add --concurrency flag for closed-loop single-turn workload generation by namasl · Pull Request #861 · inference-sim/inference-sim

namasl · 2026-03-27T20:25:08Z

Summary

Adds --concurrency N flag to blis run and blis observe for closed-loop single-turn workload generation
N virtual users each maintain exactly one in-flight request, wait for completion (plus optional --think-time-ms), then send the next
Mutually exclusive with --rate; reuses existing SessionManager infrastructure with new UnlimitedRounds and shared follow-up budget

Closes #859

Example Usage

# 50 concurrent users, 500 total requests, back-to-back
blis run --model qwen/qwen3-14b --concurrency 50 --num-requests 500

# With think time between response and next request
blis run --model qwen/qwen3-14b --concurrency 50 --num-requests 500 --think-time-ms 200

# Observe mode
blis observe --server-url http://localhost:8000 --model qwen/qwen3-14b \
  --concurrency 50 --num-requests 500 --think-time-ms 200 \
  --trace-header trace.yaml --trace-data trace.csv

Changes

File	What
`sim/workload/spec.go`	Add `Concurrency` and `ThinkTimeUs` fields to `ClientSpec`; relax validation for concurrency clients
`sim/workload/session.go`	Add `UnlimitedRounds` to `SessionBlueprint`; add shared follow-up budget to `SessionManager`
`sim/workload/synthesis.go`	Add `Concurrency` and `ThinkTimeMs` to `DistributionParams`; handle concurrency in synthesis
`sim/workload/generator.go`	Generate seed requests + `SessionBlueprint`s for concurrency clients in `GenerateWorkload`
`cmd/root.go`	Add `--concurrency` and `--think-time-ms` flags to `blis run`
`cmd/observe_cmd.go`	Add `--concurrency` and `--think-time-ms` flags to `blis observe`; auto-set `--max-concurrency`

Behavioral Contracts Verified

BC-1: --concurrency and --rate mutually exclusive
BC-2: Exactly N seed requests per concurrency client
BC-3: Seeds staggered within [0, think_time); all at t=0 when think_time=0
BC-4: Follow-up arrival = completion_time + think_time
BC-5: At most 1 in-flight request per user (session continuity)
BC-6: Fresh input/output per follow-up (no context accumulation)
BC-7: --num-requests caps total requests (seeds + follow-ups)
BC-8: Deterministic with same seed (INV-6)
BC-9: Every user reaches exactly one terminal state (INV-11)
BC-10: Timeout/drop cancels user's session
BC-11: Existing non-concurrency workloads produce byte-identical output

Test plan

5 spec validation tests (concurrency acceptance, mutual exclusion, negatives, aggregate_rate)
4 session manager tests (unlimited rounds, budget exhaustion, zero MaxRounds, fresh tokens)
2 synthesis tests (concurrency mode, prefix propagation)
7 generator tests (seed count, staggering, determinism, metadata, blueprint properties, backward compat)
2 end-to-end integration tests (full follow-up flow with budget, timeout cancellation)
Full go test ./... passes
go vet ./... clean
Smoke test: blis run --concurrency 10 --num-requests 50 produces expected output
Backward compatibility: blis run --rate 1 --num-requests 20 --seed 42 produces byte-identical output

🤖 Generated with Claude Code

Add spec-level support for closed-loop concurrency workloads. When Concurrency > 0, the client operates in concurrency mode (mutually exclusive with rate_fraction). ThinkTimeUs controls inter-request delay in concurrency mode. Aggregate_rate is only required when at least one client is rate-based (Concurrency == 0). Part of inference-sim#859. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…nager Add UnlimitedRounds field to SessionBlueprint so concurrency sessions can run past MaxRounds until budget/horizon/timeout/drop. Add global follow-up budget tracking to SessionManager with SetFollowUpBudget() and sessionBudgetExhausted terminal state. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Extend DistributionParams with Concurrency and ThinkTimeMs fields and update SynthesizeFromDistribution to produce concurrency-mode specs (Concurrency > 0 sets client.Concurrency and ThinkTimeUs with zero AggregateRate/RateFraction) while preserving backward-compatible rate mode for existing callers. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…currency clients Add concurrency client handling to GenerateWorkload: for each client with Concurrency > 0, generate N seed requests (one per virtual user) with staggered arrival times and create N unlimited-round SessionBlueprints. Compute FollowUpBudget = maxRequests - numSeeds for the SessionManager. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Wire up closed-loop concurrency mode to the CLI: register new flags, validate mutual exclusion with --rate (BC-1), add concurrency synthesis path in the workload if/else chain, and pass FollowUpBudget from GeneratedWorkload to SessionManager. Refs: inference-sim#859 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Wire up closed-loop concurrency mode for the observe command, matching the flags already added to blis run. Includes mutual exclusion with --rate, auto-setting of --max-concurrency, and follow-up budget wiring. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

namasl · 2026-03-27T20:47:11Z

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

claude · 2026-03-27T20:47:37Z

Claude finished @namasl's task in 5m 7s —— View job

PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generation

Summary

This PR correctly delivers the feature described in #859 — a --concurrency N closed-loop mode where N virtual users each maintain exactly one in-flight request. The design is clean, the behavioral contracts are well-specified, and test coverage is strong. Three issues need addressing before merge.

Important Issues (must address)

1. INV-11 documents three terminal states; `budget_exhausted` is a new fourth

invariants.md INV-11 explicitly states: "Every session reaches exactly one terminal state: completed (all rounds done), cancelled (a round timed out or was dropped), or horizon-interrupted". This PR adds sessionBudgetExhausted as a fourth terminal state in session.go:24:

sessionBudgetExhausted sessionState = "budget_exhausted"

INV-11 needs to be updated to document this state, its semantics (global request cap reached), and how it preserves the invariant. Without this, the invariant doc contradicts the implementation.

Fix this →

2. Mixed closed-loop + concurrency workload: budget over-allocated

In generator.go:602-608, the follow-up budget is computed as:

followUpBudget = maxRequests - int64(len(allReqs))

allReqs contains round-0 seeds from ALL clients (closed-loop + concurrency + rate-based), but closed-loop multi-turn follow-ups (up to MaxRounds-1 per session) are not counted. In a mixed workload, the budget allocated to concurrency sessions could allow total requests to exceed maxRequests when closed-loop sessions are also generating follow-ups.

Example: maxRequests=100, 5 closed-loop sessions with MaxRounds=10 (will generate 45 follow-ups), 2 concurrency seeds → budget = 93. Total could reach 2+5 seeds + 45 closed-loop follow-ups + 93 concurrency follow-ups = 145 > 100.

This is fine if mixed workloads are out of scope for this PR — but it should either be documented as a known limitation or protected by a validation error (concurrency + reasoning.multi_turn clients are not supported in the same spec).

3. Misleading comment in `generator.go:495-498`

// Reuse prefix tokens from GenerateRequests' RNG stream for consistency
rng := sim.NewPartitionedRNG(sim.NewSimulationKey(spec.Seed))
workloadRNG := rng.ForSubsystem(sim.SubsystemWorkloadGen)
prefixes := generatePrefixTokens(allClients, workloadRNG)

This does not reuse the same RNG state from GenerateRequests — it creates a fresh RNG from the same seed, which produces the same prefix tokens by re-deriving them. The comment implies the original RNG stream is being continued, which is false. Correct comment: "Re-derive prefix tokens by initializing a fresh RNG from spec.Seed — same seed ⟹ same prefix tokens as GenerateRequests produced."

This distinction matters for correctness reviewers and future maintainers who might assume the streams are shared/stateful.

Fix this →

Minor Issues (suggestions)

4. `generatePrefixTokens` called twice for mixed workloads

GenerateRequests calls generatePrefixTokens internally, and the concurrency section calls it again. For pure concurrency workloads this is a no-op (rate-based clients skip concurrency logic), but for mixed workloads both calls execute. It's correct but wasteful. Not a blocking issue.

5. `SessionBlueprint` constructed in two places (R4 concern)

generator.go now constructs SessionBlueprint literals in two distinct code paths: the closed-loop path (lines ~456-472) and the concurrency path (lines ~568-584). R4 says "struct literals in exactly one place." These aren't identical (different fields set — MaxRounds vs UnlimitedRounds), but the duplication of fields like TenantID, SLOClass, Model, Horizon, InputSampler, OutputSampler, RNG is worth a comment explaining why a factory was not used.

Correctness Verification

BC contracts (all verified):

Contract	Verified	Test
BC-1: --concurrency and --rate mutually exclusive	✓	`TestValidate_ConcurrencyAndRateFraction_Rejects`, CLI validation
BC-2: Exactly N seeds	✓	`TestGenerateWorkload_ConcurrencyClient_ProducesSeedsAndBlueprints`
BC-3: Seeds staggered in [0, think_time)	✓	`TestGenerateWorkload_ConcurrencyClient_StaggeredArrivals`, `ZeroThinkTime_AllAtZero`
BC-4: Follow-up arrival = completion + think_time	✓	`TestConcurrencyMode_EndToEnd_SessionFollowUps`
BC-5: Session ID preserved	✓	End-to-end test
BC-6: Fresh tokens, no accumulation	✓	`TestSession_NoContextAccumulation_FreshTokens`
BC-7: --num-requests caps total	✓	Budget exhaustion test
BC-8: Deterministic	✓	`TestGenerateWorkload_ConcurrencyClient_Deterministic`
BC-10: Timeout/drop cancels session	✓	`TestConcurrencyMode_TimeoutCancelsSession`
BC-11: Backward compat	✓	`TestGenerateWorkload_NonConcurrencySpec_Unchanged`

INV compliance:

INV-6 (determinism): RNG isolation via separate seed offsets (+7919 closed-loop, +10007 concurrency) is correct. Determinism test passes.
INV-10 (session causality): arrivalTime = tick + bp.ThinkTimeUs is exact. ✓
INV-11 (session completeness): All paths reach a terminal state — but budget_exhausted needs to be documented (issue Init commit #1 above).

Budget counting logic (session.go:136-140, 199-201): The counter increments AFTER generating the follow-up and checks BEFORE generating. This is correct — followUpCount >= followUpBudget gates generation cleanly without off-by-one errors. The end-to-end test traces through 4 follow-ups against a budget of 4 correctly.

Stagger formula (generator.go:531-532):

arrivalTime = int64(u) * client.ThinkTimeUs / int64(client.Concurrency)

For u=0..N-1, arrivals land in [0, ThinkTimeUs) with integer division. Correct per BC-3.

simulationHorizon defaults to math.MaxInt64 (confirmed at cmd/root.go:234), so the session horizon guard arrivalTime > bp.Horizon will not fire prematurely when --horizon is not set. The concurrency sessions will run until budget exhaustion or explicit horizon. ✓

Separation of Concerns & Modularity

Clean. The workload-layer changes are self-contained:

spec.go: validation rules relaxed correctly for concurrency clients
synthesis.go: refactored to a client builder pattern — clean and readable
generator.go: concurrency logic appended after closed-loop logic with clear section header
session.go: two orthogonal additions (UnlimitedRounds flag, follow-up budget) that don't touch existing logic

The DES itself is untouched. The SessionManager contract (single-threaded, DES-invoked) is preserved.

CLI layer (cmd/root.go, cmd/observe_cmd.go): The --concurrency > 0 guard cleanly forks into a new spec-synthesis branch before rejoining the existing pipeline. Auto-setting --max-concurrency in observe mode is a correct UX affordance with an informational log message. ✓

Test Quality Assessment

Non-structural: Tests assert observable behavior (request counts, arrival times, session IDs, token lengths), not internal field types. ✓

Refactor-survivable: Would these tests still pass if SessionBlueprint were renamed or restructured? Yes — they test observable outcomes from GenerateWorkload and sm.OnComplete, not internal field access patterns. ✓

Coverage gaps: No test for mixed closed-loop + concurrency in the same spec. This is the scenario where the budget over-allocation concern (issue #2) would manifest.

Performance Implications

No DES hot path changes. The new code runs at workload-generation time (startup, not per-step). The double call to generatePrefixTokens for mixed workloads is O(prefix_tokens × clients) — negligible. ✓

Documentation Correctness

CLAUDE.md mentions "Observe/replay/calibrate pipeline" and recent work — no update needed for this feature
cmd/ flag descriptions are accurate
INV-11 gap (issue Init commit #1 above) is the only documentation correctness issue

Verdict

Approve after addressing issues #1 and #3 (documentation correctness). Issue #2 (mixed workload budget) should be addressed or explicitly documented as out-of-scope in a follow-up. The core implementation is correct, well-tested, and properly isolated.

… RNG comment - Update INV-11 in invariants.md and CLAUDE.md to document budget_exhausted as a fourth terminal state for concurrency-mode sessions - Add spec validation rejecting mixed concurrency + multi-turn clients (follow-up budget accounting doesn't support this combination) - Fix misleading comment in generator.go about prefix token RNG reuse (it re-derives from the same seed, not continuing the original stream) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

namasl · 2026-03-27T20:58:56Z

All three review issues addressed in 909b879:

Issue #1 — INV-11 budget_exhausted terminal state: Updated docs/contributing/standards/invariants.md and CLAUDE.md to document budget_exhausted as a fourth terminal state for concurrency-mode sessions, including its semantics (global follow-up budget depleted) and verification pointer.

Issue #2 — Mixed concurrency + multi-turn budget over-allocation: Added a spec validation guard that rejects mixing concurrency clients with multi-turn clients in the same spec. The follow-up budget accounting doesn't support this combination, and a clear error message explains why. Test added: TestValidate_ConcurrencyAndMultiTurn_Rejects.

Issue #3 — Misleading RNG comment: Replaced "Reuse prefix tokens from GenerateRequests' RNG stream for consistency" with "Re-derive prefix tokens by initializing a fresh RNG from spec.Seed — same seed produces same prefix tokens as GenerateRequests produced."

No merge conflict — branch was already up to date with main. Full test suite passes (go test ./...).

…ohort parity) Merge upstream/main (inference-sim#845 normalized exponential arrival sampler, inference-sim#848 cohort field parity) into feat/concurrency-flag. Conflicts in cohort_test.go (excluded fields map — both sides added entries) and generator_test.go (both sides appended new tests at EOF). Resolution: keep both sets of additions. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

namasl · 2026-03-27T21:12:15Z

Merge conflict resolved in 9de1664. Upstream main brought in #845 (normalized exponential arrival sampler with CustomSamplerFactory) and #848 (cohort field parity). Two conflicts:

cohort_test.go — excluded fields map: both sides added entries (Concurrency/ThinkTimeUs from this PR, CustomSamplerFactory from upstream). Merged both.
generator_test.go — both sides appended new tests at EOF (concurrency tests from this PR, custom sampler tests from upstream). Kept both sets.

All tests pass (go test ./...). PR is now mergeable.

namasl · 2026-03-27T21:13:43Z

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

claude · 2026-03-27T21:14:06Z

Claude finished @namasl's task in 4m 25s —— View job

PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generation

Tasks

Summary

This is a second-pass review after the first review's three issues (INV-11 docs, misleading RNG comment, mixed workload budget guard) were all resolved in commit 909b879. The core implementation is correct and complete. One new gap in the mixed-workload guard was found; everything else is well done.

Important Issue (should address)

Mixed-workload guard misses cohorts with `reasoning.multi_turn`

spec.go:272-281 guards against mixing concurrency clients with multi-turn clients:

for _, c := range s.Clients {
    if c.Concurrency > 0 {
        hasConcurrency = true
    }
    if c.Reasoning != nil && c.Reasoning.MultiTurn != nil {
        hasMultiTurn = true
    }
}

This iterates only s.Clients. CohortSpec (line 60) also carries Reasoning *ReasoningSpec, and cohorts are expanded via ExpandCohorts() into ClientSpec entries before session blueprint generation (generator.go:389). A cohort with reasoning.multi_turn combined with concurrency clients in the same spec bypasses the guard entirely, leaving the original over-allocation issue unprotected for that combination.

The fix is to also iterate cohort entries in the mixed-workload check:

for i := range s.Cohorts {
    if s.Cohorts[i].Reasoning != nil && s.Cohorts[i].Reasoning.MultiTurn != nil {
        hasMultiTurn = true
    }
}

Fix this →

Correctness Assessment

BC contracts (all verified):

Contract	Status	Evidence
BC-1: `--concurrency` and `--rate` mutually exclusive	✓	CLI validation + `TestValidate_ConcurrencyAndRateFraction_Rejects`
BC-2: Exactly N seeds	✓	`TestGenerateWorkload_ConcurrencyClient_ProducesSeedsAndBlueprints`
BC-3: Stagger in `[0, think_time)`	✓	`TestGenerateWorkload_ConcurrencyClient_StaggeredArrivals`, `ZeroThinkTime` test
BC-4: Follow-up arrival = completion + think_time	✓	End-to-end test
BC-5: Session ID preserved	✓	End-to-end test
BC-6: Fresh tokens per round	✓	`TestSession_NoContextAccumulation_FreshTokens`
BC-7: `--num-requests` caps total	✓	Budget exhaustion test — `followUpBudget = maxRequests - len(allReqs)`
BC-8: Deterministic (INV-6)	✓	`TestGenerateWorkload_ConcurrencyClient_Deterministic`
BC-10: Timeout cancels session	✓	`TestConcurrencyMode_TimeoutCancelsSession`
BC-11: Non-concurrency backward compat	✓	`TestGenerateWorkload_NonConcurrencySpec_Unchanged`

INV compliance:

INV-6 (determinism): Concurrency RNG uses spec.Seed + 10007 to avoid colliding with GenerateRequests (spec.Seed) and closed-loop blueprints (spec.Seed + 7919). ✓
INV-10 (session causality): arrivalTime = tick + bp.ThinkTimeUs exactly. ✓
INV-11 (session completeness): All four terminal states are now documented. Budget counter guard (followUpCount >= followUpBudget) fires before generation, so sessionBudgetExhausted is reached cleanly. ✓
INV-1 (request conservation): Budget formula maxRequests - len(allReqs) ensures seeds + follow-ups ≤ maxRequests when maxRequests > 0. When maxRequests=0, budget=0 means unlimited — correct for unbounded case. ✓

Budget counting (session.go:136-140, 199-201): Pre-check with >=, post-increment — exactly followUpBudget follow-ups generated. Off-by-one is correct and the end-to-end test traces this completely.

Stagger formula (generator.go:531-532):

arrivalTime = int64(u) * client.ThinkTimeUs / int64(client.Concurrency)

For u=0..N-1: arrivals ∈ [0, ThinkTimeUs) with integer division. Correct per BC-3. Single-user (N=1) gets t=0, also correct.

Merge sort determinism: sort.SliceStable on allReqs := append(round0Only, concurrencySeeds...) — equal-time elements preserve insertion order (closed-loop before concurrency), making the merged result deterministic for any tie in arrival time. ✓

Test Quality

Non-structural: All tests assert observable behavior — request counts, arrival times, session IDs, token lengths, budget exhaustion. No internal field access patterns tested.

Refactor-survivable: Tests call GenerateWorkload, NewSessionManager, sm.OnComplete via their public API. A full rename/restructure of internals would not break them.

Coverage is thorough: negative validation, mutual exclusion, mixed-workload rejection, stagger, determinism, metadata fields, blueprint properties, budget exhaustion step-by-step, timeout cancellation, backward compat, synthesis modes, prefix propagation.

One minor coverage gap: No explicit test for SetFollowUpBudget(0) (or omitting it) meaning unlimited. TestSession_UnlimitedRounds_ContinuesPastMaxRounds implicitly covers it (no budget set → no limit), but the invariant isn't stated explicitly. Not blocking.

Separation of Concerns & Modularity

Clean layering maintained throughout:

spec.go: validation rules only — closed under the new fields
synthesis.go: refactored to build a client first, then fork on mode — readable
generator.go: concurrency logic is a self-contained section appended after closed-loop logic, with a clear section comment
session.go: two orthogonal additions (UnlimitedRounds, follow-up budget) that don't disturb existing session lifecycle logic
cmd/: concurrency mode forks before the existing pipeline and rejoins at GenerateWorkload — clean

The DES hot path is untouched.

API Boundaries & Contracts

SessionBlueprint.UnlimitedRounds bool — additive, non-breaking, well-named
GeneratedWorkload.FollowUpBudget int64 — zero means unlimited, consistent with the zero-value convention
SessionManager.SetFollowUpBudget(budget int64) — post-construction setter; acceptable given the DES single-threaded guarantee. Would be slightly cleaner as a constructor option, but not worth changing
NewSessionManager panic for MaxRounds < 1 && !UnlimitedRounds — maintains existing invariant while accommodating the new mode correctly

Standards Adherence (R1-R23, Principles)

R4 (struct literals in one place): SessionBlueprint is constructed in two code paths (closed-loop ~lines 456-472, concurrency ~lines 568-584). Fields like TenantID, SLOClass, Model, Horizon, InputSampler, OutputSampler, RNG are duplicated. No factory mediates them. The blueprints are structurally different (MaxRounds vs UnlimitedRounds) which justifies two sites, but it's the type of duplication R4 flags. Acceptable as-is with a comment; not blocking.
R1 (no silent continue): All error paths return errors. ✓
R3 (input validation): Negative concurrency and negative think_time_us rejected at both CLI and spec layers. ✓
R8-R10 (exported maps, YAML types, strict parsing): No new exported mutable maps. New fields use value types, not pointers. ✓
Output channel separation: All diagnostics go to logrus (stderr). ✓
Interface design (R13/R14): SetFollowUpBudget is a single-concern method. ✓

Documentation

CLAUDE.md INV-11: Updated to include budget-exhausted terminal state. ✓
invariants.md INV-11: Updated with state description, semantics, and verification pointer. ✓
Flag descriptions (--concurrency, --think-time-ms): accurate in both blis run and blis observe. ✓
SynthesizeFromDistribution doc comment explains both modes. ✓
GenerateWorkload doc comment explains concurrency client handling. ✓

Performance Implications

No DES hot-path changes. All new code runs at workload-generation time (startup). The re-derivation of prefix tokens (generatePrefixTokens called twice for mixed specs — once in GenerateRequests, once in the concurrency section) is O(prefix_tokens × clients) and negligible. ✓

Parent Issue Alignment

The PR delivers exactly what #859 describes: N virtual users, each maintaining exactly one in-flight request, with optional think time, capped by --num-requests, mutually exclusive with --rate, available on both blis run and blis observe. ✓

Verdict

Approve after addressing the cohort guard gap. The implementation is correct, well-tested, properly isolated, and the prior review's issues are all resolved. The one remaining gap (concurrency + cohort multi-turn not guarded) is the only substantive issue.

…-turn The concurrency + multi-turn validation only iterated s.Clients, missing CohortSpec entries that also carry Reasoning.MultiTurn. A cohort with multi-turn combined with concurrency clients would bypass the guard. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

namasl · 2026-03-27T21:21:36Z

Addressed in beb3132: the mixed-workload guard now also iterates s.Cohorts for Reasoning.MultiTurn, so a cohort with multi-turn combined with concurrency clients is correctly rejected. Test added: TestValidate_ConcurrencyAndCohortMultiTurn_Rejects. Full suite passes.

namasl · 2026-03-27T21:47:53Z

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

claude · 2026-03-27T21:50:24Z

Claude finished @namasl's task in 3m 34s —— View job

PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generation

Tasks

Third-Pass Review Summary

This is a third-pass review. Both previous rounds of issues have been fully resolved. No new issues found. The PR is ready to merge.

All Prior Issues Resolved

Issue	Commit	Status
INV-11 missing `budget_exhausted` terminal state	`909b879`	✓ Resolved — `invariants.md` updated with state description, semantics, and verification pointer
Misleading RNG comment in `generator.go:495`	`909b879`	✓ Resolved — comment now correctly describes re-derivation, not stream continuation
Mixed concurrency + multi-turn spec not rejected	`909b879`	✓ Resolved — `spec.go` guard added with clear error message
Cohort multi-turn bypass of the guard	`beb3132`	✓ Resolved — `s.Cohorts` loop added; `TestValidate_ConcurrencyAndCohortMultiTurn_Rejects` confirms it

Correctness Verification (Final Pass)

round0Only semantics are correct (generator.go:501-511): The filter keeps all requests for non-session/open-loop clients and only round-0 for closed-loop sessions. The variable name is slightly confusing (it also holds rate-based requests), but the logic is right.

Budget calculation is sound for all allowed combinations:

Pure concurrency: maxRequests - concurrencySeeds = budget for follow-ups. ✓
Rate-based + concurrency (non-multi-turn): rate requests are in allReqs too, so they subtract from the budget, giving concurrency follow-ups the remaining capacity. ✓
Multi-turn + concurrency: blocked by the guard in spec.Validate() before GenerateWorkload is called. ✓
maxRequests=0: budget stays 0 → unlimited sessions run to horizon. ✓

Budget guard in session.go:137 (followUpCount >= followUpBudget) checks before generation, increments after — off-by-one is correct. The end-to-end test TestConcurrencyMode_EndToEnd_SessionFollowUps traces through exactly 4 follow-ups against a budget of 4, then verifies exhaustion. ✓

SetFollowUpBudget only called when FollowUpBudget > 0 (cmd/root.go:1036): When budget is 0 (unlimited), the setter is correctly skipped, leaving the default sm.followUpBudget = 0 in place. ✓

Stagger formula (generator.go:531-532): int64(u) * ThinkTimeUs / int64(N) for u=0..N-1 produces arrivals in [0, ThinkTimeUs). Single-user case (Concurrency=1) gets t=0 unconditionally via the Concurrency > 1 guard. ✓

Merge sort determinism (sort.SliceStable): Equal-time elements preserve insertion order (rate-based before concurrency seeds). Correct and deterministic for INV-6. ✓

Auto-set observeMaxConcur in observe mode: fires only when --max-concurrency not explicitly set, logs informatively, scoped to the variable controlling real-server concurrency. ✓

BC Contracts (all verified)

Contract	Status
BC-1: `--concurrency` and `--rate` mutually exclusive	✓ CLI + spec validation, tested
BC-2: Exactly N seeds per concurrency client	✓ `TestGenerateWorkload_ConcurrencyClient_ProducesSeedsAndBlueprints`
BC-3: Seeds staggered in `[0, think_time)`	✓ `TestGenerateWorkload_ConcurrencyClient_StaggeredArrivals` + ZeroThinkTime case
BC-4: Follow-up arrival = completion + think_time	✓ End-to-end test asserts `ArrivalTime == 15000`
BC-5: Session ID preserved through rounds	✓ End-to-end test
BC-6: Fresh tokens, no context accumulation	✓ `TestSession_NoContextAccumulation_FreshTokens`
BC-7: `--num-requests` caps total	✓ Budget exhaustion test traces to exact cap
BC-8: Deterministic (INV-6)	✓ `TestGenerateWorkload_ConcurrencyClient_Deterministic`
BC-9: Every user reaches terminal state (INV-11)	✓ All paths end in `sessionCompleted`, `sessionCancelled`, `sessionBudgetExhausted`, or `sessionHorizonInterrupted`
BC-10: Timeout/drop cancels session	✓ `TestConcurrencyMode_TimeoutCancelsSession`
BC-11: Non-concurrency backward compat	✓ `TestGenerateWorkload_NonConcurrencySpec_Unchanged`

Test Quality

Non-structural: Tests assert observable behavior throughout — request counts, arrival times, session IDs, token lengths, budget exhaustion state. No assertions on internal field names or concrete type structure. ✓

Refactor-survivable: Tests exercise public API (GenerateWorkload, NewSessionManager, sm.OnComplete, spec.Validate). An internal rename would not break them. ✓

Coverage: Spec validation (6 negative cases + mutual exclusion), session manager (unlimited rounds, budget exhaustion step-by-step, zero MaxRounds, fresh tokens), synthesis (both modes, prefix propagation), generator (seed count, stagger formula, determinism, metadata, blueprint properties, backward compat), end-to-end (full follow-up flow with budget exhaustion, timeout cancellation). Cohort multi-turn guard now has its own test. ✓

Separation of Concerns & Modularity

The DES hot path is untouched. All changes are confined to workload generation (startup time only). The SessionManager contract (single-threaded, DES-invoked) is preserved — SetFollowUpBudget is a pre-simulation configuration call, not a DES-time mutation. ✓

Documentation

CLAUDE.md INV-11, invariants.md INV-11, and flag help strings in both blis run and blis observe are accurate and consistent with the implementation. ✓

Verdict

Approve. All issues from the prior two review rounds are fully addressed. Implementation is correct, test coverage is thorough and behavioral, invariants are maintained and documented, and the feature delivers exactly what #859 describes.

sriumcp · 2026-03-27T21:59:19Z

Bug 1 (Critical): `blis compose` is broken for all-concurrency specs

File: sim/workload/convert.go, ComposeSpecs

A concurrency-only spec always has AggregateRate = 0. Composing any two concurrency-only specs gives totalRate = 0 and hits the existing guard:

if totalRate <= 0 {
    return nil, fmt.Errorf("compose: total aggregate rate must be positive, got %f", totalRate)
}

blis compose --from conc1.yaml --from conc2.yaml always errors with "total aggregate rate must be positive, got 0". No test covers this case.

Correct fix — replace the totalRate <= 0 guard and the conditional scaling loop:

merged.AggregateRate = totalRate

if totalRate == 0 {
    // All specs are concurrency-only (Validate() ensures any spec with a rate-based
    // client requires AggregateRate > 0, so totalRate==0 means all-concurrency).
    // No rate fraction scaling is needed — just concatenate clients.
    for _, s := range specs {
        merged.Clients = append(merged.Clients, s.Clients...)
    }
    return merged, nil
}

// Rate-based or mixed: renormalize each client's RateFraction by its
// spec's proportional share of the combined rate.
for _, s := range specs {
    weight := s.AggregateRate / totalRate
    for _, c := range s.Clients {
        c.RateFraction *= weight
        merged.Clients = append(merged.Clients, c)
    }
}
return merged, nil

This is safe: the mixed case (one concurrency spec + one rate spec) already works because totalRate > 0 from the rate spec, and concurrency clients have RateFraction = 0 so RateFraction *= weight is a no-op for them.

Add a test: TestComposeSpecs_AllConcurrency_MergesClients — compose two specs with concurrency: N clients; verify the merged spec has all clients and passes Validate().

sriumcp · 2026-03-27T21:59:54Z

Bug 2 (Critical): Seeds bypass `maxRequests`; `FollowUpBudget=0` silently means "unlimited" instead of "no follow-ups"

Two coupled defects in sim/workload/generator.go and sim/workload/session.go.

Part A — Seeds are generated without a `maxRequests` cap

In GenerateWorkload, the concurrency seed loop generates one seed per user regardless of maxRequests:

for u := 0; u < client.Concurrency; u++ {
    // No cap check — all N seeds are always generated
    concurrencySeeds = append(concurrencySeeds, seed)
}

With --concurrency 200 --num-requests 100, this produces 200 seeds even though the user asked for 100 total.

Fix: add a cap check at the top of the inner loop body:

for u := 0; u < client.Concurrency; u++ {
    // Never generate more seeds than the global request budget allows.
    if maxRequests > 0 && int64(len(round0Only)+len(concurrencySeeds)) >= maxRequests {
        break
    }
    userSeed := concurrencyRNG.Int63()
    // ... rest of seed generation unchanged

Part B — `FollowUpBudget = 0` conflates "no cap" with "budget fully consumed"

The FollowUpBudget field in GeneratedWorkload uses 0 as both:

The default "no cap set" (unlimited follow-ups), and
The computed value when seeds exhaust the budget (should mean no follow-ups).

With --concurrency 50 --num-requests 50: 50 seeds are generated, followUpBudget = 50 - 50 = 0. In root.go/observe_cmd.go, the check if wl.FollowUpBudget > 0 skips SetFollowUpBudget. The SessionManager then defaults to unlimited follow-ups — the exact opposite of the intended behavior. With the default --horizon = math.MaxInt64, this is an effectively infinite loop.

Correct fix — four coordinated changes:

Step 1 — GeneratedWorkload in generator.go: Use -1 as the "no cap" sentinel.

type GeneratedWorkload struct {
    Requests []*sim.Request
    Sessions []SessionBlueprint
    // FollowUpBudget is the cap on follow-up requests for concurrency sessions.
    // -1 = no cap (either no concurrency clients, or maxRequests was 0/unlimited).
    //  0 = no follow-ups are allowed (seeds consumed the entire request budget).
    // >0 = exactly that many follow-ups are allowed.
    FollowUpBudget int64
}

Step 2 — budget computation in generator.go: Initialize to -1, only set a non-negative value when a cap was actually computed.

// Replace the existing:
var followUpBudget int64
if maxRequests > 0 && totalConcurrencyUsers > 0 {
    followUpBudget = maxRequests - int64(len(allReqs))
    if followUpBudget < 0 {
        followUpBudget = 0
    }
}

// With:
followUpBudget := int64(-1) // -1 = no cap by default
if maxRequests > 0 && totalConcurrencyUsers > 0 {
    budget := maxRequests - int64(len(allReqs))
    if budget < 0 {
        budget = 0
    }
    followUpBudget = budget
}

Step 3 — root.go and observe_cmd.go: Change > 0 to >= 0 so SetFollowUpBudget(0) is called when seeds consumed the entire budget.

// Replace (in both files):
if wl.FollowUpBudget > 0 {
    sessionMgr.SetFollowUpBudget(wl.FollowUpBudget)
}

// With:
if wl.FollowUpBudget >= 0 {
    sessionMgr.SetFollowUpBudget(wl.FollowUpBudget)
}

Step 4 — SessionManager in session.go: Add a budgetEnabled field so that SetFollowUpBudget(0) correctly means "no follow-ups" and the zero-value default (followUpBudget = 0) still means "unlimited".

type SessionManager struct {
    sessions       map[string]*activeSession
    idCounter      int64
    followUpBudget int64
    followUpCount  int64
    budgetEnabled  bool // true once SetFollowUpBudget has been called
}

func (sm *SessionManager) SetFollowUpBudget(budget int64) {
    sm.followUpBudget = budget
    sm.budgetEnabled = true
}

In OnComplete, replace the budget check:

// Replace:
if sm.followUpBudget > 0 && sm.followUpCount >= sm.followUpBudget {

// With:
if sm.budgetEnabled && sm.followUpCount >= sm.followUpBudget {

Replace the follow-up counter increment:

// Replace:
if sm.followUpBudget > 0 {
    sm.followUpCount++
}

// With:
if sm.budgetEnabled {
    sm.followUpCount++
}

Verification table — all four parts together:

Command	Seeds	FollowUpBudget	SetFollowUpBudget called?	Total
`--concurrency 50 --num-requests 500`	50	450	Yes, 450	500 ✓
`--concurrency 50 --num-requests 50`	50 (cap hit after 50)	0	Yes, 0 → immediate exhaustion	50 ✓
`--concurrency 200 --num-requests 100`	100 (cap hit after 100)	0	Yes, 0 → immediate exhaustion	100 ✓
`--concurrency 50` (no `--num-requests`, with `--horizon`)	50	-1	No	horizon-bounded ✓

sriumcp · 2026-03-27T22:00:14Z

Bug 3 (Significant): `Arrival.Process` is validated but completely ignored for concurrency clients

File: sim/workload/spec.go, validateClient

After this PR, validateClient still validates c.Arrival.Process for concurrency clients:

// CustomSamplerFactory bypasses arrival process validation (programmatic injection)
if c.CustomSamplerFactory == nil {
    if !validArrivalProcesses[c.Arrival.Process] {
        return fmt.Errorf("%s: unknown arrival process %q; valid: poisson, gamma, weibull, constant", ...)
    }
    ...
}

However, GenerateWorkload never constructs an arrival sampler for concurrency clients. Seed arrivals are placed at deterministic staggered times (u * ThinkTimeUs / N) — the Arrival field is entirely unused.

A user who writes a YAML workload spec:

clients:
  - id: workers
    concurrency: 50
    input_distribution: ...
    output_distribution: ...

gets a confusing, misleading error:

client[0]: unknown arrival process ""; valid: poisson, gamma, weibull, constant

The user must add arrival: {process: constant} as a meaningless cargo-cult field. The CLI synthesizer hides this by always setting constant, but YAML spec authors will be confused.

Correct fix — add c.Concurrency == 0 to the guard for arrival validation:

// Arrival process and CV only apply to rate-based clients. Concurrency clients
// use staggered fixed seed times and never construct an arrival sampler.
if c.Concurrency == 0 && c.CustomSamplerFactory == nil {
    if !validArrivalProcesses[c.Arrival.Process] {
        return fmt.Errorf("%s: unknown arrival process %q; valid: poisson, gamma, weibull, constant", prefix, c.Arrival.Process)
    }
    if c.Arrival.Process == "weibull" && c.Arrival.CV != nil {
        cv := *c.Arrival.CV
        if cv < 0.01 || cv > 10.4 {
            return fmt.Errorf("%s: weibull CV must be in [0.01, 10.4], got %f", prefix, cv)
        }
    }
}
if c.Concurrency == 0 && c.Arrival.CV != nil {
    if err := validateFinitePositive(prefix+".cv", *c.Arrival.CV); err != nil {
        return err
    }
}

This does not break any existing test — the existing concurrency test specs all set Arrival: ArrivalSpec{Process: "constant"}, which will continue to pass validation (the field is merely no longer required). SynthesizeFromDistribution can also drop the Arrival: ArrivalSpec{Process: "constant"} from its concurrency branch since it would no longer be needed for validation, though keeping it is harmless.

Add a test: TestValidate_ConcurrencyClient_NoArrivalField_Accepted — spec with concurrency: 10 and no arrival field; verify Validate() returns nil.

sriumcp · 2026-03-27T22:00:37Z

Bug 4 (Significant): `--num-requests 100` default silently caps concurrency runs that should be horizon-bounded

File: cmd/root.go, the concurrency > 0 branch

--num-requests has a default of 100. In the concurrency branch, numRequests (= 100 by default) is passed directly to SynthesizeFromDistribution:

} else if concurrency > 0 {
    spec = workload.SynthesizeFromDistribution(workload.DistributionParams{
        Concurrency: concurrency, ThinkTimeMs: thinkTimeMs,
        NumRequests: numRequests,  // ← always 100 unless user explicitly overrides
        ...
    })

A user running:

blis run --model qwen/qwen3-14b --concurrency 50 --horizon 60_000_000

expects to run 50 virtual users for 60 seconds. Instead they get exactly 100 total requests (50 seeds + 50 follow-ups) and the simulation stops, silently, far before the 60-second horizon. There is no warning.

Correct fix — in the concurrency branch, only use numRequests when the flag was explicitly provided; otherwise treat the request count as unbounded (0):

} else if concurrency > 0 {
    // In concurrency mode, --num-requests has no meaningful default.
    // If the user did not explicitly set it, leave it at 0 (unbounded) and
    // require --horizon to bound the run. The existing unbounded-generation
    // guard ("Workload requires either num_requests or --horizon") will fire
    // with a clear message if neither is provided.
    concurrencyNumRequests := 0
    if cmd.Flags().Changed("num-requests") {
        concurrencyNumRequests = numRequests
    }
    spec = workload.SynthesizeFromDistribution(workload.DistributionParams{
        Concurrency:        concurrency,
        ThinkTimeMs:        thinkTimeMs,
        NumRequests:        concurrencyNumRequests,
        PrefixTokens:       prefixTokens,
        PromptTokensMean:   promptTokensMean,
        PromptTokensStdDev: promptTokensStdev,
        PromptTokensMin:    promptTokensMin,
        PromptTokensMax:    promptTokensMax,
        OutputTokensMean:   outputTokensMean,
        OutputTokensStdDev: outputTokensStdev,
        OutputTokensMin:    outputTokensMin,
        OutputTokensMax:    outputTokensMax,
    })
    spec.Seed = seed

Behavior after fix:

Command	Behavior
`--concurrency 50 --num-requests 500`	500 total requests (explicit cap) ✓
`--concurrency 50 --horizon 60_000_000`	Runs until 60 s horizon, unlimited requests ✓
`--concurrency 50` (no `--num-requests`, no `--horizon`)	Existing guard fires: "Workload requires either num_requests or --horizon to bound generation" — clear, correct error ✓
`--concurrency 50 --num-requests 500 --horizon 60_000_000`	Stops at whichever limit is hit first ✓

Note: observe_cmd.go already behaves correctly here because observeNumRequests defaults to 0 (not 100), so --concurrency + --horizon on blis observe already works as expected.

sriumcp · 2026-03-27T22:00:53Z

Bug 5 (Minor): `--concurrency` is silently ignored when `--workload-spec` is also provided

File: cmd/root.go (and cmd/observe_cmd.go)

The spec-file path takes priority over the concurrency > 0 branch:

if workloadSpec != "" {
    // Loads spec from file — --concurrency is never read
} else if concurrency > 0 {
    // This branch never runs when --workload-spec is set
}

A user running:

blis run --workload-spec myspec.yaml --concurrency 50

sees no error, no warning, and no output indicating that --concurrency 50 was dropped. The spec file is used as-is. If myspec.yaml has no concurrency clients, the user gets rate-based workload generation when they expected closed-loop behavior.

Correct fix — add a validation guard at the top of the spec-file branch in root.go:

if workloadSpec != "" {
    if concurrency > 0 {
        logrus.Fatalf("--concurrency cannot be used with --workload-spec; " +
            "define concurrency in the spec file using clients[].concurrency instead")
    }
    // ... existing spec-file loading unchanged

Apply the same guard in observe_cmd.go, immediately after the existing observeWorkloadSpec == "" path branches:

if observeWorkloadSpec != "" {
    if observeConcurrency > 0 {
        logrus.Fatalf("--concurrency cannot be used with --workload-spec; " +
            "define concurrency in the spec file using clients[].concurrency instead")
    }
    // ... existing spec-file loading unchanged

Making this a fatal error (rather than a warning) is intentional: silently ignoring a user-provided flag produces harder-to-debug behavior than a clear startup error.

sriumcp · 2026-03-27T22:01:07Z

Minor: Magic seed offset `+10007` needs a comment explaining its relationship to `+7919`

File: sim/workload/generator.go

The PR introduces a second magic-number seed offset:

concurrencyRNG := rand.New(rand.NewSource(spec.Seed + 10007))

The existing closed-loop blueprint RNG (a few lines above) uses:

blueprintRNG := rand.New(rand.NewSource(spec.Seed + 7919))

Both offsets work correctly today because they differ. But there is no comment explaining why they must differ or what would break if a future contributor accidentally reused an offset. Given the pattern of adding new per-client RNG streams as the codebase grows, this is exactly the kind of implicit assumption that causes subtle, hard-to-reproduce non-determinism bugs.

Fix — add an explanatory comment:

// concurrencyRNG drives per-user seed selection and blueprint RNG seeding.
// Uses spec.Seed + 10007, distinct from blueprintRNG's spec.Seed + 7919 above,
// so the two streams do not produce identical sequences for the same spec seed.
// If new per-client RNG streams are added here, choose an offset not already
// in use in this function and document it with the same pattern.
concurrencyRNG := rand.New(rand.NewSource(spec.Seed + 10007))

Bug 1 (Critical): ComposeSpecs now handles all-concurrency specs where totalRate==0 by skipping rate fraction renormalization and concatenating clients directly. Test: TestComposeSpecs_AllConcurrency_MergesClients. Bug 2 (Critical): Two coupled fixes: - Part A: Seed generation loop now caps at maxRequests, preventing --concurrency 200 --num-requests 100 from producing 200 seeds. - Part B: FollowUpBudget uses -1 as "no cap" sentinel (was 0). SessionManager gains budgetEnabled flag so SetFollowUpBudget(0) correctly means "no follow-ups" rather than "unlimited". Callers updated to use >= 0 check. Tests: SeedsCappedByMaxRequests, ZeroBudgetMeansNoFollowUps. Bug 3 (Significant): Arrival process validation skipped for concurrency clients (they use staggered fixed times, not arrival samplers). Users can now omit the arrival field in YAML specs for concurrency clients. Test: TestValidate_ConcurrencyClient_NoArrivalField_Accepted. Bug 4 (Significant): In concurrency mode on blis run, --num-requests default of 100 is now ignored unless the flag was explicitly set. Without it, the run requires --horizon or explicit --num-requests. Bug 5 (Minor): --concurrency + --workload-spec now produces a fatal error in both blis run and blis observe, instead of silently ignoring the concurrency flag. Minor: Added explanatory comment for the +10007 RNG offset explaining its relationship to +7919 and the need for distinct offsets. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

namasl · 2026-03-27T22:12:05Z

Re Bug 1 (compose): Fixed in 2196a6f. ComposeSpecs now handles totalRate==0 by skipping rate fraction renormalization and concatenating clients directly. The mixed case (one concurrency spec + one rate spec) also works correctly since totalRate > 0 from the rate spec, and RateFraction *= weight is a no-op for concurrency clients. Test added: TestComposeSpecs_AllConcurrency_MergesClients.

namasl · 2026-03-27T22:12:14Z

Re Bug 2 (seed cap + budget sentinel): Fixed in 2196a6f. Two coupled changes:

Part A: Seed generation loop now checks maxRequests before each seed. --concurrency 200 --num-requests 100 correctly produces 100 seeds. Test: TestGenerateWorkload_ConcurrencyClient_SeedsCappedByMaxRequests.
Part B: FollowUpBudget now uses -1 as "no cap" sentinel. SessionManager gains a budgetEnabled flag so SetFollowUpBudget(0) correctly means "no follow-ups" (not "unlimited"). Callers use >= 0 check. --concurrency 50 --num-requests 50 now correctly produces 50 seeds with zero follow-ups. Test: TestGenerateWorkload_ConcurrencyClient_ZeroBudgetMeansNoFollowUps.

namasl · 2026-03-27T22:12:22Z

Re Bug 3 (arrival validation): Fixed in 2196a6f. validateClient now gates arrival process and CV validation behind c.Concurrency == 0, so concurrency clients no longer need a meaningless arrival: {process: constant} field. Test: TestValidate_ConcurrencyClient_NoArrivalField_Accepted.

namasl · 2026-03-27T22:12:31Z

Re Bug 4 (default --num-requests caps concurrency): Fixed in 2196a6f. In the concurrency branch of root.go, numRequests is now only used when cmd.Flags().Changed("num-requests") is true. Otherwise it defaults to 0 (unbounded), requiring --horizon to bound the run. The existing unbounded-generation guard fires with a clear message if neither is provided. observe_cmd.go was already correct (observeNumRequests defaults to 0).

namasl · 2026-03-27T22:12:38Z

Re Bug 5 (--concurrency silently ignored with --workload-spec): Fixed in 2196a6f. Both root.go and observe_cmd.go now logrus.Fatalf when --concurrency > 0 and a workload spec file is also provided, with a message directing users to clients[].concurrency in the spec file instead.

namasl · 2026-03-27T22:12:45Z

Re seed offset comment: Fixed in 2196a6f. Added explanatory comment documenting why +10007 must differ from +7919, what would break if reused, and the pattern to follow when adding new per-client RNG streams.

sriumcp · 2026-03-27T22:27:01Z

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

claude · 2026-03-27T22:27:25Z

Claude finished @sriumcp's task in 4m 8s —— View job

PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generation

Tasks

Fourth-Pass Review Summary

All five bugs and the seed-offset comment from Round 3 are correctly addressed in 2196a6f. One minor documentation gap remains; everything else is correct and clean.

All Prior Issues Resolved

Issue	Commit	Status
INV-11 missing `budget_exhausted` terminal state	`909b879`	✓
Misleading RNG comment in `generator.go`	`909b879`	✓
Mixed concurrency + multi-turn spec not rejected	`909b879`	✓
Cohort multi-turn bypass of mixed-workload guard	`beb3132`	✓
Bug 1: `ComposeSpecs` failing for all-concurrency specs	`2196a6f`	✓
Bug 2a: Seeds generated without `maxRequests` cap	`2196a6f`	✓
Bug 2b: `FollowUpBudget=0` ambiguity / `budgetEnabled` sentinel	`2196a6f`	✓
Bug 3: Arrival validation required for concurrency clients	`2196a6f`	✓
Bug 4: `--num-requests` default silently capping horizon-bounded runs	`2196a6f`	✓
Bug 5: `--concurrency` silently ignored with `--workload-spec`	`2196a6f`	✓
Magic seed offset comment	`2196a6f`	✓

Minor Issue: `observe_cmd.go` help text still says "Supports both"

observe_cmd.go:65 (the Long description for blis observe) reads:

Supports both --workload-spec (YAML) and --rate (distribution synthesis) input paths.

After this PR, there are three input paths, not two. A user running blis observe --help gets stale help text with no mention of --concurrency in the description or examples. The examples section also only shows --workload-spec and --rate usage.

Suggested fix:

Supports --workload-spec (YAML), --rate (distribution synthesis), or --concurrency
(closed-loop virtual users) input paths.

And add one --concurrency example matching the PR description.

Fix this →

Correctness Verification (Final Pass)

Budget sentinel mechanics (generator.go:641-649, session.go:86-89, root.go:1051-1053):

The three-value sentinel (-1/0/+N) is implemented correctly end-to-end:

Scenario	`FollowUpBudget`	`SetFollowUpBudget` called?	Behavior
`--concurrency 50 --num-requests 500`	450	Yes (450)	500 total ✓
`--concurrency 50 --num-requests 50`	0	Yes (0) → `budgetEnabled=true`, immediate exhaustion	50 total ✓
`--concurrency 200 --num-requests 100`	0	Yes (0)	100 seeds (capped), 0 follow-ups ✓
`--concurrency 50 --horizon 60s`	-1	No	horizon-bounded ✓
No sessions (rate-only)	0 (zero value)	No — `len(Sessions)==0` guards	N/A ✓

Seed cap (generator.go:557-561): The inner-loop break fires before concurrencyRNG.Int63() draws, which is correct — uncapped users don't consume RNG entropy. The totalConcurrencyUsers += client.Concurrency at line 624 uses the full concurrency count, but this is a truthy check only (the budget formula uses len(allReqs) for the actual count). ✓

budgetEnabled guard (session.go:141, 203-205): SetFollowUpBudget(0) sets budgetEnabled=true, so the pre-generation check sm.budgetEnabled && sm.followUpCount >= sm.followUpBudget fires immediately (0 >= 0). The post-generation increment is likewise gated on sm.budgetEnabled. Off-by-one confirmed correct: check-before, increment-after. ✓

All BC contracts: Verified in Rounds 1-3, no regression in 2196a6f changes.

Test Quality (Final Pass)

New tests added in 2196a6f:

TestGenerateWorkload_ConcurrencyClient_SeedsCappedByMaxRequests — 200 users, cap=100, verifies 100 seeds and FollowUpBudget=0 ✓
TestGenerateWorkload_ConcurrencyClient_ZeroBudgetMeansNoFollowUps — SetFollowUpBudget(0) produces nil follow-up immediately ✓
TestValidate_ConcurrencyClient_NoArrivalField_Accepted — concurrency client with empty arrival passes validation ✓
TestComposeSpecs_AllConcurrency_MergesClients — two concurrency-only specs compose and pass Validate() ✓
TestValidate_ConcurrencyAndCohortMultiTurn_Rejects — cohort with multi-turn + concurrency client rejected ✓ (Round 3 fix)

All behavioral, non-structural, refactor-survivable. ✓

One test semantic note (informational, not blocking): TestGenerateWorkload_NonConcurrencySpec_Unchanged asserts wl.FollowUpBudget != 0 with the comment "want 0 (no sessions)". The early-return path (generator.go:404) returns FollowUpBudget=0 (Go zero value), which matches 0. Per the documented sentinel convention, 0 means "no follow-ups allowed" but for no-session workloads the field is semantically N/A. The caller guards with len(wl.Sessions) > 0 so this is harmless in practice — the zero value is never consumed for non-session workloads. The test assertion is correct but tests the zero value rather than the intended semantic. Not worth changing.

Separation of Concerns & Modularity

Clean. All five bug fixes in 2196a6f are correctly scoped:

spec.go: arrival guard added to validateClient — no behavioral change to rate-based clients ✓
generator.go: seed cap check added inside inner loop — existing logic untouched ✓
session.go: budgetEnabled field isolates "unset" from "set to 0" — no interface change ✓
root.go/observe_cmd.go: --concurrency × --workload-spec fatal, --num-requests defaulting fixed — correctly scoped ✓
convert.go: totalRate==0 path concatenates without scaling — existing scaling path untouched ✓

Standards Adherence

R4: SessionBlueprint is still constructed in two sites (closed-loop, concurrency). The structural difference (MaxRounds vs UnlimitedRounds) justifies two sites; this was flagged and accepted in Round 2.
R1: No silent continue. All error paths return errors. ✓
R3: CLI validation for --concurrency < 0, --think-time-ms < 0, mutual exclusion with --rate and --workload-spec. ✓
R8-R10: No new exported mutable maps, value types used throughout, strict YAML parsing preserved. ✓
R2: Map iteration sorted before RNG draws (sort.Strings(sortedSessionIDs), sort.SliceStable). ✓

Performance Implications

No DES hot-path changes. All new code executes at workload-generation time (startup). ✓

Parent Issue Alignment

The PR delivers exactly what #859 requires: N virtual users, one in-flight request each, optional think time, capped by --num-requests, mutually exclusive with --rate, available on both blis run and blis observe. All behavioral contracts are verified with tests. ✓

Verdict

Approve after the one-line help text fix (the "Supports both" → "Supports three modes" update in observe_cmd.go). All prior substantive issues are fully resolved. The implementation is correct, well-tested, properly isolated, and all invariants are maintained and documented.

sriumcp · 2026-03-27T22:37:53Z

In addition to the one line help text fix above, please also fix the following.

Comment 1 — Non-blocking: FollowUpBudget struct comment is wrong

Non-blocking: FollowUpBudget struct comment is wrong for the no-session path

In sim/workload/generator.go, the GeneratedWorkload struct comment says:

// -1 = no cap (either no concurrency clients, or maxRequests was 0/unlimited).

But when there are no sessions and no concurrency clients, GenerateWorkload returns early:

return &GeneratedWorkload{Requests: reqs}, nil

This sets FollowUpBudget to 0 (Go zero value), not -1. The test TestGenerateWorkload_NonConcurrencySpec_Unchanged confirms this by
asserting FollowUpBudget == 0.

Fix — comment only, no logic change needed:

// FollowUpBudget is the cap on follow-up requests for concurrency sessions.
// -1 = no cap (maxRequests was 0/unlimited, or only closed-loop multi-turn clients present).
// 0 = no follow-ups allowed (seeds consumed the entire budget, or no sessions at all).
// >0 = exactly that many follow-ups allowed.

Comment 2 — Blocker: ComposeSpecs no longer guards totalRate < 0

Blocker: ComposeSpecs dropped the guard for totalRate < 0

The original convert.go had:

if totalRate <= 0 {
return nil, fmt.Errorf("compose: total aggregate rate must be positive, got %f", totalRate)
}

The new code only handles totalRate == 0 (all-concurrency) and then falls through to renormalization for everything else. If totalRate
is negative — which happens if any spec in the list has AggregateRate < 0 — the weight s.AggregateRate / totalRate becomes negative,
silently producing clients with negative RateFraction. Those clients then pass validateClient (which only checks RateFraction > 0 for
rate-based clients, not < 0) and corrupt the merged spec.

Fix — add one guard for the negative case:

if totalRate == 0 {
// all-concurrency path (existing code)
...
return merged, nil
}

if totalRate < 0 {
return nil, fmt.Errorf("compose: total aggregate rate must be non-negative, got %f", totalRate)
}

// Rate-based or mixed: renormalize (existing code)
for _, s := range specs {
...
}

Add a corresponding test:

func TestComposeSpecs_NegativeTotalRate_ReturnsError(t *testing.T) {
spec := &WorkloadSpec{
Version: "2", Category: "language",
AggregateRate: -5.0,
Clients: []ClientSpec{{ID: "c1", RateFraction: 1.0, ...}},
}
_, err := ComposeSpecs([]*WorkloadSpec{spec})
if err == nil {
t.Error("expected error for negative aggregate rate")
}
}

Note: a spec with AggregateRate < 0 would be rejected by Validate() in normal usage, but ComposeSpecs accepts raw specs and should not
silently corrupt them.

- Update observe_cmd.go Long description to mention --concurrency as a third input path and add a concurrency example - Fix FollowUpBudget struct comment: 0 also covers the no-sessions early-return path, not just budget-consumed - Restore negative totalRate guard in ComposeSpecs that was dropped when adding the totalRate==0 all-concurrency path Test: TestComposeSpecs_NegativeTotalRate_ReturnsError Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

namasl · 2026-03-27T22:45:44Z

All three items from the latest review addressed in 74765be:

Help text fix: observe_cmd.go Long description now reads "Supports --workload-spec (YAML), --rate (distribution synthesis), or --concurrency (closed-loop virtual users) input paths." Added a --concurrency example to the examples section.
FollowUpBudget comment fix: Updated the struct doc to clarify that 0 also covers the no-sessions early-return path (Go zero value), not just the budget-consumed case.
Negative rate guard restored: Added totalRate < 0 check back into ComposeSpecs between the == 0 (all-concurrency) and the renormalization paths. A spec with negative AggregateRate would be caught by Validate() in normal usage, but ComposeSpecs accepts raw specs and should not silently corrupt them. Test: TestComposeSpecs_NegativeTotalRate_ReturnsError.

Full suite passes.

namasl · 2026-03-27T22:46:30Z

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

claude · 2026-03-27T22:47:05Z

Claude finished @namasl's task in 4m 38s —— View job

PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generation

Tasks

Fifth-Pass Review Summary

All issues from rounds 1–4 (sriumcp's 5 bugs + minor, plus the observe_cmd help text, FollowUpBudget comment, and negative rate guard) are fully resolved in commits 909b879, beb3132, 2196a6f, and 74765be. No new issues found. This PR is ready to merge.

All Prior Issues Resolved

Issue	Commit	Status
INV-11 missing `budget_exhausted` terminal state	`909b879`	✓
Misleading RNG re-derivation comment	`909b879`	✓
Mixed concurrency + multi-turn spec not rejected	`909b879`	✓
Cohort multi-turn bypass of mixed-workload guard	`beb3132`	✓
Bug 1: `ComposeSpecs` failing for all-concurrency specs	`2196a6f`	✓
Bug 2a: Seeds generated without `maxRequests` cap	`2196a6f`	✓
Bug 2b: `FollowUpBudget=0` ambiguity / `budgetEnabled` sentinel	`2196a6f`	✓
Bug 3: Arrival validation required for concurrency clients	`2196a6f`	✓
Bug 4: `--num-requests` default silently capping horizon-bounded runs	`2196a6f`	✓
Bug 5: `--concurrency` silently ignored with `--workload-spec`	`2196a6f`	✓
Magic seed offset comment	`2196a6f`	✓
`observe_cmd.go` help text "Supports both" → three modes	`74765be`	✓
`FollowUpBudget` struct comment covering no-sessions zero-value	`74765be`	✓
`ComposeSpecs` dropped `totalRate < 0` guard	`74765be`	✓

Correctness Verification (Fifth Pass)

Stagger formula (generator.go:569-571): client.Concurrency > 1 guard is correct — for N=1, there's nothing to stagger, and the formula 0 * ThinkTimeUs / 1 = 0 would give t=0 anyway. Integer division satisfies BC-3's requirement of arrivals ∈ [0, ThinkTimeUs). ✓

Budget sentinel end-to-end (generator.go:642-649, session.go:86-88, root.go/observe_cmd.go):

Command	`FollowUpBudget`	`SetFollowUpBudget` called?	Outcome
`--concurrency 50 --num-requests 500`	450	Yes (450)	500 total ✓
`--concurrency 50 --num-requests 50`	0	Yes (0), `budgetEnabled=true` → immediate exhaustion	50 total ✓
`--concurrency 200 --num-requests 100`	0	Yes (0)	100 seeds (capped), 0 follow-ups ✓
`--concurrency 50 --horizon 60s`	-1	No	horizon-bounded ✓
Rate-only (no sessions)	0 (zero value)	No — `len(Sessions)==0` guards	N/A ✓

budgetEnabled guard (session.go:141, 203-205): SetFollowUpBudget(0) sets budgetEnabled=true, so the check sm.budgetEnabled && sm.followUpCount >= sm.followUpBudget fires immediately (0 ≥ 0). Off-by-one is correct: check-before, increment-after. ✓

ComposeSpecs concurrency handling (convert.go:136-160):

totalRate == 0: all-concurrency path concatenates clients, returns early — correct
totalRate < 0: guard added in 74765be — correct
Mixed (one concurrency + one rate spec): concurrency clients have RateFraction=0, so 0 *= weight is a no-op — correct ✓

Cohort multi-turn guard (spec.go:281-285): loops s.Cohorts and checks Reasoning.MultiTurn != nil. Cohorts have no Concurrency field (rate-based only), so the hasConcurrency check correctly stays in s.Clients. The asymmetry is correct, not a bug. ✓

BC Contract Verification (Complete)

Contract	Status	Test(s)
BC-1: `--concurrency` and `--rate` mutually exclusive	✓	`TestValidate_ConcurrencyAndRateFraction_Rejects`, CLI guard
BC-2: Exactly N seeds per client	✓	`TestGenerateWorkload_ConcurrencyClient_ProducesSeedsAndBlueprints`
BC-3: Seeds staggered in `[0, think_time)`	✓	`TestGenerateWorkload_ConcurrencyClient_StaggeredArrivals`, `ZeroThinkTime` case
BC-4: Follow-up arrival = completion + think_time	✓	End-to-end test asserts `ArrivalTime == 15000`
BC-5: Session ID preserved through rounds	✓	End-to-end test
BC-6: Fresh tokens, no context accumulation	✓	`TestSession_NoContextAccumulation_FreshTokens`
BC-7: `--num-requests` caps total	✓	Budget exhaustion test; `TestGenerateWorkload_ConcurrencyClient_SeedsCappedByMaxRequests`
BC-8: Deterministic (INV-6)	✓	`TestGenerateWorkload_ConcurrencyClient_Deterministic`
BC-9: Every user reaches terminal state (INV-11)	✓	All four terminal states documented and exercised
BC-10: Timeout/drop cancels session	✓	`TestConcurrencyMode_TimeoutCancelsSession`
BC-11: Non-concurrency backward compat	✓	`TestGenerateWorkload_NonConcurrencySpec_Unchanged`

INV Compliance (Final)

INV-6 (determinism): Partitioned RNG with documented offsets (+7919 closed-loop, +10007 concurrency). TestGenerateWorkload_ConcurrencyClient_Deterministic passes. ✓
INV-10 (session causality): arrivalTime = tick + bp.ThinkTimeUs is exact. ✓
INV-11 (session completeness): All four terminal states (completed, cancelled, horizon-interrupted, budget-exhausted) documented in invariants.md, all exercised by tests. ✓
INV-1 (request conservation): Budget formula maxRequests - len(allReqs) with seed cap ensures seeds + follow-ups ≤ maxRequests. ✓

Test Quality (Final)

All tests assert observable behavior — request counts, arrival times, session IDs, token lengths, budget exhaustion state. No assertions on concrete type names or internal fields. Refactor-survivable. Coverage includes:

6 spec validation negatives (concurrency acceptance, mutual exclusion, negatives, aggregate_rate, mixed workload, cohort multi-turn)
5 session manager tests (unlimited rounds, budget exhaustion step-by-step, SetFollowUpBudget(0), zero MaxRounds, fresh tokens)
2 synthesis tests (concurrency mode, prefix propagation)
9 generator tests (seed count, stagger, determinism, metadata, blueprint properties, backward compat, seeds capped, zero budget)
2 end-to-end integration tests (full follow-up flow, timeout cancellation)
2 compose tests (all-concurrency merge, negative rate error)

Standards Adherence

R1–R3: All error paths return errors; all inputs validated at CLI and spec layers. ✓
R4: SessionBlueprint constructed in two sites (closed-loop/concurrency) — structurally distinct blueprints justify two sites. ✓
R8–R10: No new exported mutable maps; value types used; strict YAML parsing unchanged. ✓
Output channel separation: All diagnostics via logrus (stderr). ✓

Performance Implications

No DES hot-path changes. All new code executes at workload-generation time (startup). generatePrefixTokens called once for concurrency clients (re-derivation from same seed — correct). ✓

Parent Issue Alignment

PR fully delivers #859: N virtual users, exactly one in-flight request each, optional think time, capped by --num-requests, mutually exclusive with --rate, available on both blis run and blis observe, backward-compatible. ✓

Verdict

Approve. All issues from all prior review rounds are fully addressed. The implementation is correct, test coverage is thorough and behavioral, all invariants are maintained and documented, and the feature precisely delivers what #859 describes.

…rkload generation (inference-sim#861) * feat(workload): add Concurrency and ThinkTimeUs fields to ClientSpec Add spec-level support for closed-loop concurrency workloads. When Concurrency > 0, the client operates in concurrency mode (mutually exclusive with rate_fraction). ThinkTimeUs controls inter-request delay in concurrency mode. Aggregate_rate is only required when at least one client is rate-based (Concurrency == 0). Part of inference-sim#859. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): add UnlimitedRounds and follow-up budget to SessionManager Add UnlimitedRounds field to SessionBlueprint so concurrency sessions can run past MaxRounds until budget/horizon/timeout/drop. Add global follow-up budget tracking to SessionManager with SetFollowUpBudget() and sessionBudgetExhausted terminal state. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): add concurrency mode to distribution synthesis Extend DistributionParams with Concurrency and ThinkTimeMs fields and update SynthesizeFromDistribution to produce concurrency-mode specs (Concurrency > 0 sets client.Concurrency and ThinkTimeUs with zero AggregateRate/RateFraction) while preserving backward-compatible rate mode for existing callers. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): generate seed requests and session blueprints for concurrency clients Add concurrency client handling to GenerateWorkload: for each client with Concurrency > 0, generate N seed requests (one per virtual user) with staggered arrival times and create N unlimited-round SessionBlueprints. Compute FollowUpBudget = maxRequests - numSeeds for the SessionManager. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(cli): add --concurrency and --think-time-ms flags to blis run Wire up closed-loop concurrency mode to the CLI: register new flags, validate mutual exclusion with --rate (BC-1), add concurrency synthesis path in the workload if/else chain, and pass FollowUpBudget from GeneratedWorkload to SessionManager. Refs: inference-sim#859 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(cli): add --concurrency and --think-time-ms flags to blis observe Wire up closed-loop concurrency mode for the observe command, matching the flags already added to blis run. Includes mutual exclusion with --rate, auto-setting of --max-concurrency, and follow-up budget wiring. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * test(workload): add end-to-end concurrency mode integration tests Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): address PR review — INV-11 docs, mixed workload guard, RNG comment - Update INV-11 in invariants.md and CLAUDE.md to document budget_exhausted as a fourth terminal state for concurrency-mode sessions - Add spec validation rejecting mixed concurrency + multi-turn clients (follow-up budget accounting doesn't support this combination) - Fix misleading comment in generator.go about prefix token RNG reuse (it re-derives from the same seed, not continuing the original stream) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): extend mixed-workload guard to check cohorts for multi-turn The concurrency + multi-turn validation only iterated s.Clients, missing CohortSpec entries that also carry Reasoning.MultiTurn. A cohort with multi-turn combined with concurrency clients would bypass the guard. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): address 5 bugs + 1 minor from PR review round 2 Bug 1 (Critical): ComposeSpecs now handles all-concurrency specs where totalRate==0 by skipping rate fraction renormalization and concatenating clients directly. Test: TestComposeSpecs_AllConcurrency_MergesClients. Bug 2 (Critical): Two coupled fixes: - Part A: Seed generation loop now caps at maxRequests, preventing --concurrency 200 --num-requests 100 from producing 200 seeds. - Part B: FollowUpBudget uses -1 as "no cap" sentinel (was 0). SessionManager gains budgetEnabled flag so SetFollowUpBudget(0) correctly means "no follow-ups" rather than "unlimited". Callers updated to use >= 0 check. Tests: SeedsCappedByMaxRequests, ZeroBudgetMeansNoFollowUps. Bug 3 (Significant): Arrival process validation skipped for concurrency clients (they use staggered fixed times, not arrival samplers). Users can now omit the arrival field in YAML specs for concurrency clients. Test: TestValidate_ConcurrencyClient_NoArrivalField_Accepted. Bug 4 (Significant): In concurrency mode on blis run, --num-requests default of 100 is now ignored unless the flag was explicitly set. Without it, the run requires --horizon or explicit --num-requests. Bug 5 (Minor): --concurrency + --workload-spec now produces a fatal error in both blis run and blis observe, instead of silently ignoring the concurrency flag. Minor: Added explanatory comment for the +10007 RNG offset explaining its relationship to +7919 and the need for distinct offsets. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): help text, FollowUpBudget comment, negative rate guard - Update observe_cmd.go Long description to mention --concurrency as a third input path and add a concurrency example - Fix FollowUpBudget struct comment: 0 also covers the no-sessions early-return path, not just budget-consumed - Restore negative totalRate guard in ComposeSpecs that was dropped when adding the totalRate==0 all-concurrency path Test: TestComposeSpecs_NegativeTotalRate_ReturnsError Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> --------- Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>

…rkload generation (#861) * feat(workload): add Concurrency and ThinkTimeUs fields to ClientSpec Add spec-level support for closed-loop concurrency workloads. When Concurrency > 0, the client operates in concurrency mode (mutually exclusive with rate_fraction). ThinkTimeUs controls inter-request delay in concurrency mode. Aggregate_rate is only required when at least one client is rate-based (Concurrency == 0). Part of #859. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): add UnlimitedRounds and follow-up budget to SessionManager Add UnlimitedRounds field to SessionBlueprint so concurrency sessions can run past MaxRounds until budget/horizon/timeout/drop. Add global follow-up budget tracking to SessionManager with SetFollowUpBudget() and sessionBudgetExhausted terminal state. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): add concurrency mode to distribution synthesis Extend DistributionParams with Concurrency and ThinkTimeMs fields and update SynthesizeFromDistribution to produce concurrency-mode specs (Concurrency > 0 sets client.Concurrency and ThinkTimeUs with zero AggregateRate/RateFraction) while preserving backward-compatible rate mode for existing callers. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): generate seed requests and session blueprints for concurrency clients Add concurrency client handling to GenerateWorkload: for each client with Concurrency > 0, generate N seed requests (one per virtual user) with staggered arrival times and create N unlimited-round SessionBlueprints. Compute FollowUpBudget = maxRequests - numSeeds for the SessionManager. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(cli): add --concurrency and --think-time-ms flags to blis run Wire up closed-loop concurrency mode to the CLI: register new flags, validate mutual exclusion with --rate (BC-1), add concurrency synthesis path in the workload if/else chain, and pass FollowUpBudget from GeneratedWorkload to SessionManager. Refs: #859 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(cli): add --concurrency and --think-time-ms flags to blis observe Wire up closed-loop concurrency mode for the observe command, matching the flags already added to blis run. Includes mutual exclusion with --rate, auto-setting of --max-concurrency, and follow-up budget wiring. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * test(workload): add end-to-end concurrency mode integration tests Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): address PR review — INV-11 docs, mixed workload guard, RNG comment - Update INV-11 in invariants.md and CLAUDE.md to document budget_exhausted as a fourth terminal state for concurrency-mode sessions - Add spec validation rejecting mixed concurrency + multi-turn clients (follow-up budget accounting doesn't support this combination) - Fix misleading comment in generator.go about prefix token RNG reuse (it re-derives from the same seed, not continuing the original stream) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): extend mixed-workload guard to check cohorts for multi-turn The concurrency + multi-turn validation only iterated s.Clients, missing CohortSpec entries that also carry Reasoning.MultiTurn. A cohort with multi-turn combined with concurrency clients would bypass the guard. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): address 5 bugs + 1 minor from PR review round 2 Bug 1 (Critical): ComposeSpecs now handles all-concurrency specs where totalRate==0 by skipping rate fraction renormalization and concatenating clients directly. Test: TestComposeSpecs_AllConcurrency_MergesClients. Bug 2 (Critical): Two coupled fixes: - Part A: Seed generation loop now caps at maxRequests, preventing --concurrency 200 --num-requests 100 from producing 200 seeds. - Part B: FollowUpBudget uses -1 as "no cap" sentinel (was 0). SessionManager gains budgetEnabled flag so SetFollowUpBudget(0) correctly means "no follow-ups" rather than "unlimited". Callers updated to use >= 0 check. Tests: SeedsCappedByMaxRequests, ZeroBudgetMeansNoFollowUps. Bug 3 (Significant): Arrival process validation skipped for concurrency clients (they use staggered fixed times, not arrival samplers). Users can now omit the arrival field in YAML specs for concurrency clients. Test: TestValidate_ConcurrencyClient_NoArrivalField_Accepted. Bug 4 (Significant): In concurrency mode on blis run, --num-requests default of 100 is now ignored unless the flag was explicitly set. Without it, the run requires --horizon or explicit --num-requests. Bug 5 (Minor): --concurrency + --workload-spec now produces a fatal error in both blis run and blis observe, instead of silently ignoring the concurrency flag. Minor: Added explanatory comment for the +10007 RNG offset explaining its relationship to +7919 and the need for distinct offsets. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): help text, FollowUpBudget comment, negative rate guard - Update observe_cmd.go Long description to mention --concurrency as a third input path and add a concurrency example - Fix FollowUpBudget struct comment: 0 also covers the no-sessions early-return path, not just budget-consumed - Restore negative totalRate guard in ComposeSpecs that was dropped when adding the totalRate==0 all-concurrency path Test: TestComposeSpecs_NegativeTotalRate_ReturnsError Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> --------- Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>

…rkload generation (inference-sim#861) * feat(workload): add Concurrency and ThinkTimeUs fields to ClientSpec Add spec-level support for closed-loop concurrency workloads. When Concurrency > 0, the client operates in concurrency mode (mutually exclusive with rate_fraction). ThinkTimeUs controls inter-request delay in concurrency mode. Aggregate_rate is only required when at least one client is rate-based (Concurrency == 0). Part of inference-sim#859. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): add UnlimitedRounds and follow-up budget to SessionManager Add UnlimitedRounds field to SessionBlueprint so concurrency sessions can run past MaxRounds until budget/horizon/timeout/drop. Add global follow-up budget tracking to SessionManager with SetFollowUpBudget() and sessionBudgetExhausted terminal state. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): add concurrency mode to distribution synthesis Extend DistributionParams with Concurrency and ThinkTimeMs fields and update SynthesizeFromDistribution to produce concurrency-mode specs (Concurrency > 0 sets client.Concurrency and ThinkTimeUs with zero AggregateRate/RateFraction) while preserving backward-compatible rate mode for existing callers. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(workload): generate seed requests and session blueprints for concurrency clients Add concurrency client handling to GenerateWorkload: for each client with Concurrency > 0, generate N seed requests (one per virtual user) with staggered arrival times and create N unlimited-round SessionBlueprints. Compute FollowUpBudget = maxRequests - numSeeds for the SessionManager. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(cli): add --concurrency and --think-time-ms flags to blis run Wire up closed-loop concurrency mode to the CLI: register new flags, validate mutual exclusion with --rate (BC-1), add concurrency synthesis path in the workload if/else chain, and pass FollowUpBudget from GeneratedWorkload to SessionManager. Refs: inference-sim#859 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat(cli): add --concurrency and --think-time-ms flags to blis observe Wire up closed-loop concurrency mode for the observe command, matching the flags already added to blis run. Includes mutual exclusion with --rate, auto-setting of --max-concurrency, and follow-up budget wiring. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * test(workload): add end-to-end concurrency mode integration tests Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): address PR review — INV-11 docs, mixed workload guard, RNG comment - Update INV-11 in invariants.md and CLAUDE.md to document budget_exhausted as a fourth terminal state for concurrency-mode sessions - Add spec validation rejecting mixed concurrency + multi-turn clients (follow-up budget accounting doesn't support this combination) - Fix misleading comment in generator.go about prefix token RNG reuse (it re-derives from the same seed, not continuing the original stream) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): extend mixed-workload guard to check cohorts for multi-turn The concurrency + multi-turn validation only iterated s.Clients, missing CohortSpec entries that also carry Reasoning.MultiTurn. A cohort with multi-turn combined with concurrency clients would bypass the guard. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): address 5 bugs + 1 minor from PR review round 2 Bug 1 (Critical): ComposeSpecs now handles all-concurrency specs where totalRate==0 by skipping rate fraction renormalization and concatenating clients directly. Test: TestComposeSpecs_AllConcurrency_MergesClients. Bug 2 (Critical): Two coupled fixes: - Part A: Seed generation loop now caps at maxRequests, preventing --concurrency 200 --num-requests 100 from producing 200 seeds. - Part B: FollowUpBudget uses -1 as "no cap" sentinel (was 0). SessionManager gains budgetEnabled flag so SetFollowUpBudget(0) correctly means "no follow-ups" rather than "unlimited". Callers updated to use >= 0 check. Tests: SeedsCappedByMaxRequests, ZeroBudgetMeansNoFollowUps. Bug 3 (Significant): Arrival process validation skipped for concurrency clients (they use staggered fixed times, not arrival samplers). Users can now omit the arrival field in YAML specs for concurrency clients. Test: TestValidate_ConcurrencyClient_NoArrivalField_Accepted. Bug 4 (Significant): In concurrency mode on blis run, --num-requests default of 100 is now ignored unless the flag was explicitly set. Without it, the run requires --horizon or explicit --num-requests. Bug 5 (Minor): --concurrency + --workload-spec now produces a fatal error in both blis run and blis observe, instead of silently ignoring the concurrency flag. Minor: Added explanatory comment for the +10007 RNG offset explaining its relationship to +7919 and the need for distinct offsets. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix(workload): help text, FollowUpBudget comment, negative rate guard - Update observe_cmd.go Long description to mention --concurrency as a third input path and add a concurrency example - Fix FollowUpBudget struct comment: 0 also covers the no-sessions early-return path, not just budget-consumed - Restore negative totalRate guard in ComposeSpecs that was dropped when adding the totalRate==0 all-concurrency path Test: TestComposeSpecs_NegativeTotalRate_ReturnsError Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> --------- Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>

namasl and others added 7 commits March 27, 2026 20:00

test(workload): add end-to-end concurrency mode integration tests

4c5623c

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

sriumcp mentioned this pull request Mar 27, 2026

fix(cmd): unify distribution synthesis defaults between and #863

Closed

3 tasks

sriumcp merged commit af1a787 into inference-sim:main Mar 27, 2026
5 checks passed

namasl deleted the feat/concurrency-flag branch March 27, 2026 22:58

sriumcp mentioned this pull request Mar 28, 2026

feat(workload): concurrency-driven (closed-loop) workload generation mode #843

Closed

Conversation

namasl commented Mar 27, 2026

Summary

Example Usage

Changes

Behavioral Contracts Verified

Test plan

Uh oh!

namasl commented Mar 27, 2026

Uh oh!

claude bot commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generation

Summary

Important Issues (must address)

1. INV-11 documents three terminal states; budget_exhausted is a new fourth

2. Mixed closed-loop + concurrency workload: budget over-allocated

3. Misleading comment in generator.go:495-498

Minor Issues (suggestions)

4. generatePrefixTokens called twice for mixed workloads

5. SessionBlueprint constructed in two places (R4 concern)

Correctness Verification

Separation of Concerns & Modularity

Test Quality Assessment

Performance Implications

Documentation Correctness

Verdict

Uh oh!

namasl commented Mar 27, 2026

Uh oh!

namasl commented Mar 27, 2026

Uh oh!

namasl commented Mar 27, 2026

Uh oh!

claude bot commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generation

Tasks

Summary

Important Issue (should address)

Mixed-workload guard misses cohorts with reasoning.multi_turn

Correctness Assessment

Test Quality

Separation of Concerns & Modularity

API Boundaries & Contracts

Standards Adherence (R1-R23, Principles)

Documentation

Performance Implications

Parent Issue Alignment

Verdict

Uh oh!

namasl commented Mar 27, 2026

Uh oh!

namasl commented Mar 27, 2026

Uh oh!

claude bot commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generation

Tasks

Third-Pass Review Summary

All Prior Issues Resolved

Correctness Verification (Final Pass)

BC Contracts (all verified)

Test Quality

Separation of Concerns & Modularity

Documentation

Verdict

Uh oh!

sriumcp commented Mar 27, 2026

Bug 1 (Critical): blis compose is broken for all-concurrency specs

Uh oh!

sriumcp commented Mar 27, 2026

Bug 2 (Critical): Seeds bypass maxRequests; FollowUpBudget=0 silently means "unlimited" instead of "no follow-ups"

Part A — Seeds are generated without a maxRequests cap

Part B — FollowUpBudget = 0 conflates "no cap" with "budget fully consumed"

Uh oh!

sriumcp commented Mar 27, 2026

Bug 3 (Significant): Arrival.Process is validated but completely ignored for concurrency clients

Uh oh!

sriumcp commented Mar 27, 2026

claude bot commented Mar 27, 2026 •

edited

Loading

1. INV-11 documents three terminal states; `budget_exhausted` is a new fourth

3. Misleading comment in `generator.go:495-498`

4. `generatePrefixTokens` called twice for mixed workloads

5. `SessionBlueprint` constructed in two places (R4 concern)

claude bot commented Mar 27, 2026 •

edited

Loading

Mixed-workload guard misses cohorts with `reasoning.multi_turn`

claude bot commented Mar 27, 2026 •

edited

Loading

Bug 1 (Critical): `blis compose` is broken for all-concurrency specs

Bug 2 (Critical): Seeds bypass `maxRequests`; `FollowUpBudget=0` silently means "unlimited" instead of "no follow-ups"

Part A — Seeds are generated without a `maxRequests` cap

Part B — `FollowUpBudget = 0` conflates "no cap" with "budget fully consumed"

Bug 3 (Significant): `Arrival.Process` is validated but completely ignored for concurrency clients

Bug 4 (Significant): `--num-requests 100` default silently caps concurrency runs that should be horizon-bounded

Bug 5 (Minor): `--concurrency` is silently ignored when `--workload-spec` is also provided

Minor: Magic seed offset `+10007` needs a comment explaining its relationship to `+7919`

claude bot commented Mar 27, 2026 •

edited

Loading

Minor Issue: `observe_cmd.go` help text still says "Supports both"

claude bot commented Mar 27, 2026 •

edited

Loading