Skip to content

feat(workload): add --concurrency flag for closed-loop single-turn workload generation#861

Merged
sriumcp merged 12 commits intoinference-sim:mainfrom
namasl:feat/concurrency-flag
Mar 27, 2026
Merged

feat(workload): add --concurrency flag for closed-loop single-turn workload generation#861
sriumcp merged 12 commits intoinference-sim:mainfrom
namasl:feat/concurrency-flag

Conversation

@namasl
Copy link
Copy Markdown
Contributor

@namasl namasl commented Mar 27, 2026

Summary

  • Adds --concurrency N flag to blis run and blis observe for closed-loop single-turn workload generation
  • N virtual users each maintain exactly one in-flight request, wait for completion (plus optional --think-time-ms), then send the next
  • Mutually exclusive with --rate; reuses existing SessionManager infrastructure with new UnlimitedRounds and shared follow-up budget

Closes #859

Example Usage

# 50 concurrent users, 500 total requests, back-to-back
blis run --model qwen/qwen3-14b --concurrency 50 --num-requests 500

# With think time between response and next request
blis run --model qwen/qwen3-14b --concurrency 50 --num-requests 500 --think-time-ms 200

# Observe mode
blis observe --server-url http://localhost:8000 --model qwen/qwen3-14b \
  --concurrency 50 --num-requests 500 --think-time-ms 200 \
  --trace-header trace.yaml --trace-data trace.csv

Changes

File What
sim/workload/spec.go Add Concurrency and ThinkTimeUs fields to ClientSpec; relax validation for concurrency clients
sim/workload/session.go Add UnlimitedRounds to SessionBlueprint; add shared follow-up budget to SessionManager
sim/workload/synthesis.go Add Concurrency and ThinkTimeMs to DistributionParams; handle concurrency in synthesis
sim/workload/generator.go Generate seed requests + SessionBlueprints for concurrency clients in GenerateWorkload
cmd/root.go Add --concurrency and --think-time-ms flags to blis run
cmd/observe_cmd.go Add --concurrency and --think-time-ms flags to blis observe; auto-set --max-concurrency

Behavioral Contracts Verified

  • BC-1: --concurrency and --rate mutually exclusive
  • BC-2: Exactly N seed requests per concurrency client
  • BC-3: Seeds staggered within [0, think_time); all at t=0 when think_time=0
  • BC-4: Follow-up arrival = completion_time + think_time
  • BC-5: At most 1 in-flight request per user (session continuity)
  • BC-6: Fresh input/output per follow-up (no context accumulation)
  • BC-7: --num-requests caps total requests (seeds + follow-ups)
  • BC-8: Deterministic with same seed (INV-6)
  • BC-9: Every user reaches exactly one terminal state (INV-11)
  • BC-10: Timeout/drop cancels user's session
  • BC-11: Existing non-concurrency workloads produce byte-identical output

Test plan

  • 5 spec validation tests (concurrency acceptance, mutual exclusion, negatives, aggregate_rate)
  • 4 session manager tests (unlimited rounds, budget exhaustion, zero MaxRounds, fresh tokens)
  • 2 synthesis tests (concurrency mode, prefix propagation)
  • 7 generator tests (seed count, staggering, determinism, metadata, blueprint properties, backward compat)
  • 2 end-to-end integration tests (full follow-up flow with budget, timeout cancellation)
  • Full go test ./... passes
  • go vet ./... clean
  • Smoke test: blis run --concurrency 10 --num-requests 50 produces expected output
  • Backward compatibility: blis run --rate 1 --num-requests 20 --seed 42 produces byte-identical output

🤖 Generated with Claude Code

namasl and others added 7 commits March 27, 2026 20:00
Add spec-level support for closed-loop concurrency workloads. When
Concurrency > 0, the client operates in concurrency mode (mutually
exclusive with rate_fraction). ThinkTimeUs controls inter-request
delay in concurrency mode. Aggregate_rate is only required when at
least one client is rate-based (Concurrency == 0).

Part of inference-sim#859.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…nager

Add UnlimitedRounds field to SessionBlueprint so concurrency sessions
can run past MaxRounds until budget/horizon/timeout/drop. Add global
follow-up budget tracking to SessionManager with SetFollowUpBudget()
and sessionBudgetExhausted terminal state.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Extend DistributionParams with Concurrency and ThinkTimeMs fields and
update SynthesizeFromDistribution to produce concurrency-mode specs
(Concurrency > 0 sets client.Concurrency and ThinkTimeUs with zero
AggregateRate/RateFraction) while preserving backward-compatible rate
mode for existing callers.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…currency clients

Add concurrency client handling to GenerateWorkload: for each client with
Concurrency > 0, generate N seed requests (one per virtual user) with
staggered arrival times and create N unlimited-round SessionBlueprints.
Compute FollowUpBudget = maxRequests - numSeeds for the SessionManager.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Wire up closed-loop concurrency mode to the CLI: register new flags,
validate mutual exclusion with --rate (BC-1), add concurrency synthesis
path in the workload if/else chain, and pass FollowUpBudget from
GeneratedWorkload to SessionManager.

Refs: inference-sim#859

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Wire up closed-loop concurrency mode for the observe command, matching
the flags already added to blis run. Includes mutual exclusion with
--rate, auto-setting of --max-concurrency, and follow-up budget wiring.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Mar 27, 2026

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

@claude
Copy link
Copy Markdown

claude bot commented Mar 27, 2026

Claude finished @namasl's task in 5m 7s —— View job


PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generation

Summary

This PR correctly delivers the feature described in #859 — a --concurrency N closed-loop mode where N virtual users each maintain exactly one in-flight request. The design is clean, the behavioral contracts are well-specified, and test coverage is strong. Three issues need addressing before merge.


Important Issues (must address)

1. INV-11 documents three terminal states; budget_exhausted is a new fourth

invariants.md INV-11 explicitly states: "Every session reaches exactly one terminal state: completed (all rounds done), cancelled (a round timed out or was dropped), or horizon-interrupted". This PR adds sessionBudgetExhausted as a fourth terminal state in session.go:24:

sessionBudgetExhausted sessionState = "budget_exhausted"

INV-11 needs to be updated to document this state, its semantics (global request cap reached), and how it preserves the invariant. Without this, the invariant doc contradicts the implementation.

Fix this →


2. Mixed closed-loop + concurrency workload: budget over-allocated

In generator.go:602-608, the follow-up budget is computed as:

followUpBudget = maxRequests - int64(len(allReqs))

allReqs contains round-0 seeds from ALL clients (closed-loop + concurrency + rate-based), but closed-loop multi-turn follow-ups (up to MaxRounds-1 per session) are not counted. In a mixed workload, the budget allocated to concurrency sessions could allow total requests to exceed maxRequests when closed-loop sessions are also generating follow-ups.

Example: maxRequests=100, 5 closed-loop sessions with MaxRounds=10 (will generate 45 follow-ups), 2 concurrency seeds → budget = 93. Total could reach 2+5 seeds + 45 closed-loop follow-ups + 93 concurrency follow-ups = 145 > 100.

This is fine if mixed workloads are out of scope for this PR — but it should either be documented as a known limitation or protected by a validation error (concurrency + reasoning.multi_turn clients are not supported in the same spec).


3. Misleading comment in generator.go:495-498

// Reuse prefix tokens from GenerateRequests' RNG stream for consistency
rng := sim.NewPartitionedRNG(sim.NewSimulationKey(spec.Seed))
workloadRNG := rng.ForSubsystem(sim.SubsystemWorkloadGen)
prefixes := generatePrefixTokens(allClients, workloadRNG)

This does not reuse the same RNG state from GenerateRequests — it creates a fresh RNG from the same seed, which produces the same prefix tokens by re-deriving them. The comment implies the original RNG stream is being continued, which is false. Correct comment: "Re-derive prefix tokens by initializing a fresh RNG from spec.Seed — same seed ⟹ same prefix tokens as GenerateRequests produced."

This distinction matters for correctness reviewers and future maintainers who might assume the streams are shared/stateful.

Fix this →


Minor Issues (suggestions)

4. generatePrefixTokens called twice for mixed workloads

GenerateRequests calls generatePrefixTokens internally, and the concurrency section calls it again. For pure concurrency workloads this is a no-op (rate-based clients skip concurrency logic), but for mixed workloads both calls execute. It's correct but wasteful. Not a blocking issue.

5. SessionBlueprint constructed in two places (R4 concern)

generator.go now constructs SessionBlueprint literals in two distinct code paths: the closed-loop path (lines ~456-472) and the concurrency path (lines ~568-584). R4 says "struct literals in exactly one place." These aren't identical (different fields set — MaxRounds vs UnlimitedRounds), but the duplication of fields like TenantID, SLOClass, Model, Horizon, InputSampler, OutputSampler, RNG is worth a comment explaining why a factory was not used.


Correctness Verification

BC contracts (all verified):

Contract Verified Test
BC-1: --concurrency and --rate mutually exclusive TestValidate_ConcurrencyAndRateFraction_Rejects, CLI validation
BC-2: Exactly N seeds TestGenerateWorkload_ConcurrencyClient_ProducesSeedsAndBlueprints
BC-3: Seeds staggered in [0, think_time) TestGenerateWorkload_ConcurrencyClient_StaggeredArrivals, ZeroThinkTime_AllAtZero
BC-4: Follow-up arrival = completion + think_time TestConcurrencyMode_EndToEnd_SessionFollowUps
BC-5: Session ID preserved End-to-end test
BC-6: Fresh tokens, no accumulation TestSession_NoContextAccumulation_FreshTokens
BC-7: --num-requests caps total Budget exhaustion test
BC-8: Deterministic TestGenerateWorkload_ConcurrencyClient_Deterministic
BC-10: Timeout/drop cancels session TestConcurrencyMode_TimeoutCancelsSession
BC-11: Backward compat TestGenerateWorkload_NonConcurrencySpec_Unchanged

INV compliance:

  • INV-6 (determinism): RNG isolation via separate seed offsets (+7919 closed-loop, +10007 concurrency) is correct. Determinism test passes.
  • INV-10 (session causality): arrivalTime = tick + bp.ThinkTimeUs is exact. ✓
  • INV-11 (session completeness): All paths reach a terminal state — but budget_exhausted needs to be documented (issue Init commit #1 above).

Budget counting logic (session.go:136-140, 199-201): The counter increments AFTER generating the follow-up and checks BEFORE generating. This is correct — followUpCount >= followUpBudget gates generation cleanly without off-by-one errors. The end-to-end test traces through 4 follow-ups against a budget of 4 correctly.

Stagger formula (generator.go:531-532):

arrivalTime = int64(u) * client.ThinkTimeUs / int64(client.Concurrency)

For u=0..N-1, arrivals land in [0, ThinkTimeUs) with integer division. Correct per BC-3.

simulationHorizon defaults to math.MaxInt64 (confirmed at cmd/root.go:234), so the session horizon guard arrivalTime > bp.Horizon will not fire prematurely when --horizon is not set. The concurrency sessions will run until budget exhaustion or explicit horizon. ✓


Separation of Concerns & Modularity

Clean. The workload-layer changes are self-contained:

  • spec.go: validation rules relaxed correctly for concurrency clients
  • synthesis.go: refactored to a client builder pattern — clean and readable
  • generator.go: concurrency logic appended after closed-loop logic with clear section header
  • session.go: two orthogonal additions (UnlimitedRounds flag, follow-up budget) that don't touch existing logic

The DES itself is untouched. The SessionManager contract (single-threaded, DES-invoked) is preserved.

CLI layer (cmd/root.go, cmd/observe_cmd.go): The --concurrency > 0 guard cleanly forks into a new spec-synthesis branch before rejoining the existing pipeline. Auto-setting --max-concurrency in observe mode is a correct UX affordance with an informational log message. ✓


Test Quality Assessment

Non-structural: Tests assert observable behavior (request counts, arrival times, session IDs, token lengths), not internal field types. ✓

Refactor-survivable: Would these tests still pass if SessionBlueprint were renamed or restructured? Yes — they test observable outcomes from GenerateWorkload and sm.OnComplete, not internal field access patterns. ✓

Coverage gaps: No test for mixed closed-loop + concurrency in the same spec. This is the scenario where the budget over-allocation concern (issue #2) would manifest.


Performance Implications

No DES hot path changes. The new code runs at workload-generation time (startup, not per-step). The double call to generatePrefixTokens for mixed workloads is O(prefix_tokens × clients) — negligible. ✓


Documentation Correctness

  • CLAUDE.md mentions "Observe/replay/calibrate pipeline" and recent work — no update needed for this feature
  • cmd/ flag descriptions are accurate
  • INV-11 gap (issue Init commit #1 above) is the only documentation correctness issue

Verdict

Approve after addressing issues #1 and #3 (documentation correctness). Issue #2 (mixed workload budget) should be addressed or explicitly documented as out-of-scope in a follow-up. The core implementation is correct, well-tested, and properly isolated.

… RNG comment

- Update INV-11 in invariants.md and CLAUDE.md to document budget_exhausted
  as a fourth terminal state for concurrency-mode sessions
- Add spec validation rejecting mixed concurrency + multi-turn clients
  (follow-up budget accounting doesn't support this combination)
- Fix misleading comment in generator.go about prefix token RNG reuse
  (it re-derives from the same seed, not continuing the original stream)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Mar 27, 2026

All three review issues addressed in 909b879:

Issue #1 — INV-11 budget_exhausted terminal state: Updated docs/contributing/standards/invariants.md and CLAUDE.md to document budget_exhausted as a fourth terminal state for concurrency-mode sessions, including its semantics (global follow-up budget depleted) and verification pointer.

Issue #2 — Mixed concurrency + multi-turn budget over-allocation: Added a spec validation guard that rejects mixing concurrency clients with multi-turn clients in the same spec. The follow-up budget accounting doesn't support this combination, and a clear error message explains why. Test added: TestValidate_ConcurrencyAndMultiTurn_Rejects.

Issue #3 — Misleading RNG comment: Replaced "Reuse prefix tokens from GenerateRequests' RNG stream for consistency" with "Re-derive prefix tokens by initializing a fresh RNG from spec.Seed — same seed produces same prefix tokens as GenerateRequests produced."

No merge conflict — branch was already up to date with main. Full test suite passes (go test ./...).

…ohort parity)

Merge upstream/main (inference-sim#845 normalized exponential arrival sampler, inference-sim#848 cohort
field parity) into feat/concurrency-flag. Conflicts in cohort_test.go (excluded
fields map — both sides added entries) and generator_test.go (both sides
appended new tests at EOF). Resolution: keep both sets of additions.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Mar 27, 2026

Merge conflict resolved in 9de1664. Upstream main brought in #845 (normalized exponential arrival sampler with CustomSamplerFactory) and #848 (cohort field parity). Two conflicts:

  1. cohort_test.go — excluded fields map: both sides added entries (Concurrency/ThinkTimeUs from this PR, CustomSamplerFactory from upstream). Merged both.
  2. generator_test.go — both sides appended new tests at EOF (concurrency tests from this PR, custom sampler tests from upstream). Kept both sets.

All tests pass (go test ./...). PR is now mergeable.

@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Mar 27, 2026

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

@claude
Copy link
Copy Markdown

claude bot commented Mar 27, 2026

Claude finished @namasl's task in 4m 25s —— View job


PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generation

Tasks

  • Gather context and read changed files
  • Review correctness & parent issue alignment
  • Review behavioral contracts & invariants
  • Review test quality (behavioral/non-structural)
  • Review separation of concerns & modularity
  • Review API boundaries & contracts
  • Review standards adherence (R1-R23, principles)
  • Review documentation correctness & completeness
  • Review performance implications
  • Synthesize findings

Summary

This is a second-pass review after the first review's three issues (INV-11 docs, misleading RNG comment, mixed workload budget guard) were all resolved in commit 909b879. The core implementation is correct and complete. One new gap in the mixed-workload guard was found; everything else is well done.


Important Issue (should address)

Mixed-workload guard misses cohorts with reasoning.multi_turn

spec.go:272-281 guards against mixing concurrency clients with multi-turn clients:

for _, c := range s.Clients {
    if c.Concurrency > 0 {
        hasConcurrency = true
    }
    if c.Reasoning != nil && c.Reasoning.MultiTurn != nil {
        hasMultiTurn = true
    }
}

This iterates only s.Clients. CohortSpec (line 60) also carries Reasoning *ReasoningSpec, and cohorts are expanded via ExpandCohorts() into ClientSpec entries before session blueprint generation (generator.go:389). A cohort with reasoning.multi_turn combined with concurrency clients in the same spec bypasses the guard entirely, leaving the original over-allocation issue unprotected for that combination.

The fix is to also iterate cohort entries in the mixed-workload check:

for i := range s.Cohorts {
    if s.Cohorts[i].Reasoning != nil && s.Cohorts[i].Reasoning.MultiTurn != nil {
        hasMultiTurn = true
    }
}

Fix this →


Correctness Assessment

BC contracts (all verified):

Contract Status Evidence
BC-1: --concurrency and --rate mutually exclusive CLI validation + TestValidate_ConcurrencyAndRateFraction_Rejects
BC-2: Exactly N seeds TestGenerateWorkload_ConcurrencyClient_ProducesSeedsAndBlueprints
BC-3: Stagger in [0, think_time) TestGenerateWorkload_ConcurrencyClient_StaggeredArrivals, ZeroThinkTime test
BC-4: Follow-up arrival = completion + think_time End-to-end test
BC-5: Session ID preserved End-to-end test
BC-6: Fresh tokens per round TestSession_NoContextAccumulation_FreshTokens
BC-7: --num-requests caps total Budget exhaustion test — followUpBudget = maxRequests - len(allReqs)
BC-8: Deterministic (INV-6) TestGenerateWorkload_ConcurrencyClient_Deterministic
BC-10: Timeout cancels session TestConcurrencyMode_TimeoutCancelsSession
BC-11: Non-concurrency backward compat TestGenerateWorkload_NonConcurrencySpec_Unchanged

INV compliance:

  • INV-6 (determinism): Concurrency RNG uses spec.Seed + 10007 to avoid colliding with GenerateRequests (spec.Seed) and closed-loop blueprints (spec.Seed + 7919). ✓
  • INV-10 (session causality): arrivalTime = tick + bp.ThinkTimeUs exactly. ✓
  • INV-11 (session completeness): All four terminal states are now documented. Budget counter guard (followUpCount >= followUpBudget) fires before generation, so sessionBudgetExhausted is reached cleanly. ✓
  • INV-1 (request conservation): Budget formula maxRequests - len(allReqs) ensures seeds + follow-ups ≤ maxRequests when maxRequests > 0. When maxRequests=0, budget=0 means unlimited — correct for unbounded case. ✓

Budget counting (session.go:136-140, 199-201): Pre-check with >=, post-increment — exactly followUpBudget follow-ups generated. Off-by-one is correct and the end-to-end test traces this completely.

Stagger formula (generator.go:531-532):

arrivalTime = int64(u) * client.ThinkTimeUs / int64(client.Concurrency)

For u=0..N-1: arrivals ∈ [0, ThinkTimeUs) with integer division. Correct per BC-3. Single-user (N=1) gets t=0, also correct.

Merge sort determinism: sort.SliceStable on allReqs := append(round0Only, concurrencySeeds...) — equal-time elements preserve insertion order (closed-loop before concurrency), making the merged result deterministic for any tie in arrival time. ✓


Test Quality

Non-structural: All tests assert observable behavior — request counts, arrival times, session IDs, token lengths, budget exhaustion. No internal field access patterns tested.

Refactor-survivable: Tests call GenerateWorkload, NewSessionManager, sm.OnComplete via their public API. A full rename/restructure of internals would not break them.

Coverage is thorough: negative validation, mutual exclusion, mixed-workload rejection, stagger, determinism, metadata fields, blueprint properties, budget exhaustion step-by-step, timeout cancellation, backward compat, synthesis modes, prefix propagation.

One minor coverage gap: No explicit test for SetFollowUpBudget(0) (or omitting it) meaning unlimited. TestSession_UnlimitedRounds_ContinuesPastMaxRounds implicitly covers it (no budget set → no limit), but the invariant isn't stated explicitly. Not blocking.


Separation of Concerns & Modularity

Clean layering maintained throughout:

  • spec.go: validation rules only — closed under the new fields
  • synthesis.go: refactored to build a client first, then fork on mode — readable
  • generator.go: concurrency logic is a self-contained section appended after closed-loop logic, with a clear section comment
  • session.go: two orthogonal additions (UnlimitedRounds, follow-up budget) that don't disturb existing session lifecycle logic
  • cmd/: concurrency mode forks before the existing pipeline and rejoins at GenerateWorkload — clean

The DES hot path is untouched.


API Boundaries & Contracts

  • SessionBlueprint.UnlimitedRounds bool — additive, non-breaking, well-named
  • GeneratedWorkload.FollowUpBudget int64 — zero means unlimited, consistent with the zero-value convention
  • SessionManager.SetFollowUpBudget(budget int64) — post-construction setter; acceptable given the DES single-threaded guarantee. Would be slightly cleaner as a constructor option, but not worth changing
  • NewSessionManager panic for MaxRounds < 1 && !UnlimitedRounds — maintains existing invariant while accommodating the new mode correctly

Standards Adherence (R1-R23, Principles)

  • R4 (struct literals in one place): SessionBlueprint is constructed in two code paths (closed-loop ~lines 456-472, concurrency ~lines 568-584). Fields like TenantID, SLOClass, Model, Horizon, InputSampler, OutputSampler, RNG are duplicated. No factory mediates them. The blueprints are structurally different (MaxRounds vs UnlimitedRounds) which justifies two sites, but it's the type of duplication R4 flags. Acceptable as-is with a comment; not blocking.
  • R1 (no silent continue): All error paths return errors. ✓
  • R3 (input validation): Negative concurrency and negative think_time_us rejected at both CLI and spec layers. ✓
  • R8-R10 (exported maps, YAML types, strict parsing): No new exported mutable maps. New fields use value types, not pointers. ✓
  • Output channel separation: All diagnostics go to logrus (stderr). ✓
  • Interface design (R13/R14): SetFollowUpBudget is a single-concern method. ✓

Documentation

  • CLAUDE.md INV-11: Updated to include budget-exhausted terminal state. ✓
  • invariants.md INV-11: Updated with state description, semantics, and verification pointer. ✓
  • Flag descriptions (--concurrency, --think-time-ms): accurate in both blis run and blis observe. ✓
  • SynthesizeFromDistribution doc comment explains both modes. ✓
  • GenerateWorkload doc comment explains concurrency client handling. ✓

Performance Implications

No DES hot-path changes. All new code runs at workload-generation time (startup). The re-derivation of prefix tokens (generatePrefixTokens called twice for mixed specs — once in GenerateRequests, once in the concurrency section) is O(prefix_tokens × clients) and negligible. ✓


Parent Issue Alignment

The PR delivers exactly what #859 describes: N virtual users, each maintaining exactly one in-flight request, with optional think time, capped by --num-requests, mutually exclusive with --rate, available on both blis run and blis observe. ✓


Verdict

Approve after addressing the cohort guard gap. The implementation is correct, well-tested, properly isolated, and the prior review's issues are all resolved. The one remaining gap (concurrency + cohort multi-turn not guarded) is the only substantive issue.

…-turn

The concurrency + multi-turn validation only iterated s.Clients, missing
CohortSpec entries that also carry Reasoning.MultiTurn. A cohort with
multi-turn combined with concurrency clients would bypass the guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Mar 27, 2026

Addressed in beb3132: the mixed-workload guard now also iterates s.Cohorts for Reasoning.MultiTurn, so a cohort with multi-turn combined with concurrency clients is correctly rejected. Test added: TestValidate_ConcurrencyAndCohortMultiTurn_Rejects. Full suite passes.

@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Mar 27, 2026

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

@claude
Copy link
Copy Markdown

claude bot commented Mar 27, 2026

Claude finished @namasl's task in 3m 34s —— View job


PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generation

Tasks

  • Post initial comment
  • Read changed files and current branch state
  • Review correctness & parent issue alignment
  • Review behavioral contracts & invariants (including cohort guard fix)
  • Review test quality (behavioral/non-structural)
  • Review separation of concerns & modularity
  • Review API boundaries & contracts
  • Review standards adherence (R1-R23, principles)
  • Review documentation correctness & completeness
  • Review performance implications
  • Synthesize findings

Third-Pass Review Summary

This is a third-pass review. Both previous rounds of issues have been fully resolved. No new issues found. The PR is ready to merge.


All Prior Issues Resolved

Issue Commit Status
INV-11 missing budget_exhausted terminal state 909b879 ✓ Resolved — invariants.md updated with state description, semantics, and verification pointer
Misleading RNG comment in generator.go:495 909b879 ✓ Resolved — comment now correctly describes re-derivation, not stream continuation
Mixed concurrency + multi-turn spec not rejected 909b879 ✓ Resolved — spec.go guard added with clear error message
Cohort multi-turn bypass of the guard beb3132 ✓ Resolved — s.Cohorts loop added; TestValidate_ConcurrencyAndCohortMultiTurn_Rejects confirms it

Correctness Verification (Final Pass)

round0Only semantics are correct (generator.go:501-511): The filter keeps all requests for non-session/open-loop clients and only round-0 for closed-loop sessions. The variable name is slightly confusing (it also holds rate-based requests), but the logic is right.

Budget calculation is sound for all allowed combinations:

  • Pure concurrency: maxRequests - concurrencySeeds = budget for follow-ups. ✓
  • Rate-based + concurrency (non-multi-turn): rate requests are in allReqs too, so they subtract from the budget, giving concurrency follow-ups the remaining capacity. ✓
  • Multi-turn + concurrency: blocked by the guard in spec.Validate() before GenerateWorkload is called. ✓
  • maxRequests=0: budget stays 0 → unlimited sessions run to horizon. ✓

Budget guard in session.go:137 (followUpCount >= followUpBudget) checks before generation, increments after — off-by-one is correct. The end-to-end test TestConcurrencyMode_EndToEnd_SessionFollowUps traces through exactly 4 follow-ups against a budget of 4, then verifies exhaustion. ✓

SetFollowUpBudget only called when FollowUpBudget > 0 (cmd/root.go:1036): When budget is 0 (unlimited), the setter is correctly skipped, leaving the default sm.followUpBudget = 0 in place. ✓

Stagger formula (generator.go:531-532): int64(u) * ThinkTimeUs / int64(N) for u=0..N-1 produces arrivals in [0, ThinkTimeUs). Single-user case (Concurrency=1) gets t=0 unconditionally via the Concurrency > 1 guard. ✓

Merge sort determinism (sort.SliceStable): Equal-time elements preserve insertion order (rate-based before concurrency seeds). Correct and deterministic for INV-6. ✓

Auto-set observeMaxConcur in observe mode: fires only when --max-concurrency not explicitly set, logs informatively, scoped to the variable controlling real-server concurrency. ✓


BC Contracts (all verified)

Contract Status
BC-1: --concurrency and --rate mutually exclusive ✓ CLI + spec validation, tested
BC-2: Exactly N seeds per concurrency client TestGenerateWorkload_ConcurrencyClient_ProducesSeedsAndBlueprints
BC-3: Seeds staggered in [0, think_time) TestGenerateWorkload_ConcurrencyClient_StaggeredArrivals + ZeroThinkTime case
BC-4: Follow-up arrival = completion + think_time ✓ End-to-end test asserts ArrivalTime == 15000
BC-5: Session ID preserved through rounds ✓ End-to-end test
BC-6: Fresh tokens, no context accumulation TestSession_NoContextAccumulation_FreshTokens
BC-7: --num-requests caps total ✓ Budget exhaustion test traces to exact cap
BC-8: Deterministic (INV-6) TestGenerateWorkload_ConcurrencyClient_Deterministic
BC-9: Every user reaches terminal state (INV-11) ✓ All paths end in sessionCompleted, sessionCancelled, sessionBudgetExhausted, or sessionHorizonInterrupted
BC-10: Timeout/drop cancels session TestConcurrencyMode_TimeoutCancelsSession
BC-11: Non-concurrency backward compat TestGenerateWorkload_NonConcurrencySpec_Unchanged

Test Quality

Non-structural: Tests assert observable behavior throughout — request counts, arrival times, session IDs, token lengths, budget exhaustion state. No assertions on internal field names or concrete type structure. ✓

Refactor-survivable: Tests exercise public API (GenerateWorkload, NewSessionManager, sm.OnComplete, spec.Validate). An internal rename would not break them. ✓

Coverage: Spec validation (6 negative cases + mutual exclusion), session manager (unlimited rounds, budget exhaustion step-by-step, zero MaxRounds, fresh tokens), synthesis (both modes, prefix propagation), generator (seed count, stagger formula, determinism, metadata, blueprint properties, backward compat), end-to-end (full follow-up flow with budget exhaustion, timeout cancellation). Cohort multi-turn guard now has its own test. ✓


Separation of Concerns & Modularity

The DES hot path is untouched. All changes are confined to workload generation (startup time only). The SessionManager contract (single-threaded, DES-invoked) is preserved — SetFollowUpBudget is a pre-simulation configuration call, not a DES-time mutation. ✓


Documentation

CLAUDE.md INV-11, invariants.md INV-11, and flag help strings in both blis run and blis observe are accurate and consistent with the implementation. ✓


Verdict

Approve. All issues from the prior two review rounds are fully addressed. Implementation is correct, test coverage is thorough and behavioral, invariants are maintained and documented, and the feature delivers exactly what #859 describes.

@sriumcp
Copy link
Copy Markdown
Collaborator

sriumcp commented Mar 27, 2026

Bug 1 (Critical): blis compose is broken for all-concurrency specs

File: sim/workload/convert.go, ComposeSpecs

A concurrency-only spec always has AggregateRate = 0. Composing any two concurrency-only specs gives totalRate = 0 and hits the existing guard:

if totalRate <= 0 {
    return nil, fmt.Errorf("compose: total aggregate rate must be positive, got %f", totalRate)
}

blis compose --from conc1.yaml --from conc2.yaml always errors with "total aggregate rate must be positive, got 0". No test covers this case.

Correct fix — replace the totalRate <= 0 guard and the conditional scaling loop:

merged.AggregateRate = totalRate

if totalRate == 0 {
    // All specs are concurrency-only (Validate() ensures any spec with a rate-based
    // client requires AggregateRate > 0, so totalRate==0 means all-concurrency).
    // No rate fraction scaling is needed — just concatenate clients.
    for _, s := range specs {
        merged.Clients = append(merged.Clients, s.Clients...)
    }
    return merged, nil
}

// Rate-based or mixed: renormalize each client's RateFraction by its
// spec's proportional share of the combined rate.
for _, s := range specs {
    weight := s.AggregateRate / totalRate
    for _, c := range s.Clients {
        c.RateFraction *= weight
        merged.Clients = append(merged.Clients, c)
    }
}
return merged, nil

This is safe: the mixed case (one concurrency spec + one rate spec) already works because totalRate > 0 from the rate spec, and concurrency clients have RateFraction = 0 so RateFraction *= weight is a no-op for them.

Add a test: TestComposeSpecs_AllConcurrency_MergesClients — compose two specs with concurrency: N clients; verify the merged spec has all clients and passes Validate().

@sriumcp
Copy link
Copy Markdown
Collaborator

sriumcp commented Mar 27, 2026

Bug 2 (Critical): Seeds bypass maxRequests; FollowUpBudget=0 silently means "unlimited" instead of "no follow-ups"

Two coupled defects in sim/workload/generator.go and sim/workload/session.go.


Part A — Seeds are generated without a maxRequests cap

In GenerateWorkload, the concurrency seed loop generates one seed per user regardless of maxRequests:

for u := 0; u < client.Concurrency; u++ {
    // No cap check — all N seeds are always generated
    concurrencySeeds = append(concurrencySeeds, seed)
}

With --concurrency 200 --num-requests 100, this produces 200 seeds even though the user asked for 100 total.

Fix: add a cap check at the top of the inner loop body:

for u := 0; u < client.Concurrency; u++ {
    // Never generate more seeds than the global request budget allows.
    if maxRequests > 0 && int64(len(round0Only)+len(concurrencySeeds)) >= maxRequests {
        break
    }
    userSeed := concurrencyRNG.Int63()
    // ... rest of seed generation unchanged

Part B — FollowUpBudget = 0 conflates "no cap" with "budget fully consumed"

The FollowUpBudget field in GeneratedWorkload uses 0 as both:

  • The default "no cap set" (unlimited follow-ups), and
  • The computed value when seeds exhaust the budget (should mean no follow-ups).

With --concurrency 50 --num-requests 50: 50 seeds are generated, followUpBudget = 50 - 50 = 0. In root.go/observe_cmd.go, the check if wl.FollowUpBudget > 0 skips SetFollowUpBudget. The SessionManager then defaults to unlimited follow-ups — the exact opposite of the intended behavior. With the default --horizon = math.MaxInt64, this is an effectively infinite loop.

Correct fix — four coordinated changes:

Step 1 — GeneratedWorkload in generator.go: Use -1 as the "no cap" sentinel.

type GeneratedWorkload struct {
    Requests []*sim.Request
    Sessions []SessionBlueprint
    // FollowUpBudget is the cap on follow-up requests for concurrency sessions.
    // -1 = no cap (either no concurrency clients, or maxRequests was 0/unlimited).
    //  0 = no follow-ups are allowed (seeds consumed the entire request budget).
    // >0 = exactly that many follow-ups are allowed.
    FollowUpBudget int64
}

Step 2 — budget computation in generator.go: Initialize to -1, only set a non-negative value when a cap was actually computed.

// Replace the existing:
var followUpBudget int64
if maxRequests > 0 && totalConcurrencyUsers > 0 {
    followUpBudget = maxRequests - int64(len(allReqs))
    if followUpBudget < 0 {
        followUpBudget = 0
    }
}

// With:
followUpBudget := int64(-1) // -1 = no cap by default
if maxRequests > 0 && totalConcurrencyUsers > 0 {
    budget := maxRequests - int64(len(allReqs))
    if budget < 0 {
        budget = 0
    }
    followUpBudget = budget
}

Step 3 — root.go and observe_cmd.go: Change > 0 to >= 0 so SetFollowUpBudget(0) is called when seeds consumed the entire budget.

// Replace (in both files):
if wl.FollowUpBudget > 0 {
    sessionMgr.SetFollowUpBudget(wl.FollowUpBudget)
}

// With:
if wl.FollowUpBudget >= 0 {
    sessionMgr.SetFollowUpBudget(wl.FollowUpBudget)
}

Step 4 — SessionManager in session.go: Add a budgetEnabled field so that SetFollowUpBudget(0) correctly means "no follow-ups" and the zero-value default (followUpBudget = 0) still means "unlimited".

type SessionManager struct {
    sessions       map[string]*activeSession
    idCounter      int64
    followUpBudget int64
    followUpCount  int64
    budgetEnabled  bool // true once SetFollowUpBudget has been called
}

func (sm *SessionManager) SetFollowUpBudget(budget int64) {
    sm.followUpBudget = budget
    sm.budgetEnabled = true
}

In OnComplete, replace the budget check:

// Replace:
if sm.followUpBudget > 0 && sm.followUpCount >= sm.followUpBudget {

// With:
if sm.budgetEnabled && sm.followUpCount >= sm.followUpBudget {

Replace the follow-up counter increment:

// Replace:
if sm.followUpBudget > 0 {
    sm.followUpCount++
}

// With:
if sm.budgetEnabled {
    sm.followUpCount++
}

Verification table — all four parts together:

Command Seeds FollowUpBudget SetFollowUpBudget called? Total
--concurrency 50 --num-requests 500 50 450 Yes, 450 500 ✓
--concurrency 50 --num-requests 50 50 (cap hit after 50) 0 Yes, 0 → immediate exhaustion 50 ✓
--concurrency 200 --num-requests 100 100 (cap hit after 100) 0 Yes, 0 → immediate exhaustion 100 ✓
--concurrency 50 (no --num-requests, with --horizon) 50 -1 No horizon-bounded ✓

@sriumcp
Copy link
Copy Markdown
Collaborator

sriumcp commented Mar 27, 2026

Bug 3 (Significant): Arrival.Process is validated but completely ignored for concurrency clients

File: sim/workload/spec.go, validateClient

After this PR, validateClient still validates c.Arrival.Process for concurrency clients:

// CustomSamplerFactory bypasses arrival process validation (programmatic injection)
if c.CustomSamplerFactory == nil {
    if !validArrivalProcesses[c.Arrival.Process] {
        return fmt.Errorf("%s: unknown arrival process %q; valid: poisson, gamma, weibull, constant", ...)
    }
    ...
}

However, GenerateWorkload never constructs an arrival sampler for concurrency clients. Seed arrivals are placed at deterministic staggered times (u * ThinkTimeUs / N) — the Arrival field is entirely unused.

A user who writes a YAML workload spec:

clients:
  - id: workers
    concurrency: 50
    input_distribution: ...
    output_distribution: ...

gets a confusing, misleading error:

client[0]: unknown arrival process ""; valid: poisson, gamma, weibull, constant

The user must add arrival: {process: constant} as a meaningless cargo-cult field. The CLI synthesizer hides this by always setting constant, but YAML spec authors will be confused.

Correct fix — add c.Concurrency == 0 to the guard for arrival validation:

// Arrival process and CV only apply to rate-based clients. Concurrency clients
// use staggered fixed seed times and never construct an arrival sampler.
if c.Concurrency == 0 && c.CustomSamplerFactory == nil {
    if !validArrivalProcesses[c.Arrival.Process] {
        return fmt.Errorf("%s: unknown arrival process %q; valid: poisson, gamma, weibull, constant", prefix, c.Arrival.Process)
    }
    if c.Arrival.Process == "weibull" && c.Arrival.CV != nil {
        cv := *c.Arrival.CV
        if cv < 0.01 || cv > 10.4 {
            return fmt.Errorf("%s: weibull CV must be in [0.01, 10.4], got %f", prefix, cv)
        }
    }
}
if c.Concurrency == 0 && c.Arrival.CV != nil {
    if err := validateFinitePositive(prefix+".cv", *c.Arrival.CV); err != nil {
        return err
    }
}

This does not break any existing test — the existing concurrency test specs all set Arrival: ArrivalSpec{Process: "constant"}, which will continue to pass validation (the field is merely no longer required). SynthesizeFromDistribution can also drop the Arrival: ArrivalSpec{Process: "constant"} from its concurrency branch since it would no longer be needed for validation, though keeping it is harmless.

Add a test: TestValidate_ConcurrencyClient_NoArrivalField_Accepted — spec with concurrency: 10 and no arrival field; verify Validate() returns nil.

@sriumcp
Copy link
Copy Markdown
Collaborator

sriumcp commented Mar 27, 2026

Bug 4 (Significant): --num-requests 100 default silently caps concurrency runs that should be horizon-bounded

File: cmd/root.go, the concurrency > 0 branch

--num-requests has a default of 100. In the concurrency branch, numRequests (= 100 by default) is passed directly to SynthesizeFromDistribution:

} else if concurrency > 0 {
    spec = workload.SynthesizeFromDistribution(workload.DistributionParams{
        Concurrency: concurrency, ThinkTimeMs: thinkTimeMs,
        NumRequests: numRequests,  // ← always 100 unless user explicitly overrides
        ...
    })

A user running:

blis run --model qwen/qwen3-14b --concurrency 50 --horizon 60_000_000

expects to run 50 virtual users for 60 seconds. Instead they get exactly 100 total requests (50 seeds + 50 follow-ups) and the simulation stops, silently, far before the 60-second horizon. There is no warning.

Correct fix — in the concurrency branch, only use numRequests when the flag was explicitly provided; otherwise treat the request count as unbounded (0):

} else if concurrency > 0 {
    // In concurrency mode, --num-requests has no meaningful default.
    // If the user did not explicitly set it, leave it at 0 (unbounded) and
    // require --horizon to bound the run. The existing unbounded-generation
    // guard ("Workload requires either num_requests or --horizon") will fire
    // with a clear message if neither is provided.
    concurrencyNumRequests := 0
    if cmd.Flags().Changed("num-requests") {
        concurrencyNumRequests = numRequests
    }
    spec = workload.SynthesizeFromDistribution(workload.DistributionParams{
        Concurrency:        concurrency,
        ThinkTimeMs:        thinkTimeMs,
        NumRequests:        concurrencyNumRequests,
        PrefixTokens:       prefixTokens,
        PromptTokensMean:   promptTokensMean,
        PromptTokensStdDev: promptTokensStdev,
        PromptTokensMin:    promptTokensMin,
        PromptTokensMax:    promptTokensMax,
        OutputTokensMean:   outputTokensMean,
        OutputTokensStdDev: outputTokensStdev,
        OutputTokensMin:    outputTokensMin,
        OutputTokensMax:    outputTokensMax,
    })
    spec.Seed = seed

Behavior after fix:

Command Behavior
--concurrency 50 --num-requests 500 500 total requests (explicit cap) ✓
--concurrency 50 --horizon 60_000_000 Runs until 60 s horizon, unlimited requests ✓
--concurrency 50 (no --num-requests, no --horizon) Existing guard fires: "Workload requires either num_requests or --horizon to bound generation" — clear, correct error ✓
--concurrency 50 --num-requests 500 --horizon 60_000_000 Stops at whichever limit is hit first ✓

Note: observe_cmd.go already behaves correctly here because observeNumRequests defaults to 0 (not 100), so --concurrency + --horizon on blis observe already works as expected.

@sriumcp
Copy link
Copy Markdown
Collaborator

sriumcp commented Mar 27, 2026

Bug 5 (Minor): --concurrency is silently ignored when --workload-spec is also provided

File: cmd/root.go (and cmd/observe_cmd.go)

The spec-file path takes priority over the concurrency > 0 branch:

if workloadSpec != "" {
    // Loads spec from file — --concurrency is never read
} else if concurrency > 0 {
    // This branch never runs when --workload-spec is set
}

A user running:

blis run --workload-spec myspec.yaml --concurrency 50

sees no error, no warning, and no output indicating that --concurrency 50 was dropped. The spec file is used as-is. If myspec.yaml has no concurrency clients, the user gets rate-based workload generation when they expected closed-loop behavior.

Correct fix — add a validation guard at the top of the spec-file branch in root.go:

if workloadSpec != "" {
    if concurrency > 0 {
        logrus.Fatalf("--concurrency cannot be used with --workload-spec; " +
            "define concurrency in the spec file using clients[].concurrency instead")
    }
    // ... existing spec-file loading unchanged

Apply the same guard in observe_cmd.go, immediately after the existing observeWorkloadSpec == "" path branches:

if observeWorkloadSpec != "" {
    if observeConcurrency > 0 {
        logrus.Fatalf("--concurrency cannot be used with --workload-spec; " +
            "define concurrency in the spec file using clients[].concurrency instead")
    }
    // ... existing spec-file loading unchanged

Making this a fatal error (rather than a warning) is intentional: silently ignoring a user-provided flag produces harder-to-debug behavior than a clear startup error.

@sriumcp
Copy link
Copy Markdown
Collaborator

sriumcp commented Mar 27, 2026

Minor: Magic seed offset +10007 needs a comment explaining its relationship to +7919

File: sim/workload/generator.go

The PR introduces a second magic-number seed offset:

concurrencyRNG := rand.New(rand.NewSource(spec.Seed + 10007))

The existing closed-loop blueprint RNG (a few lines above) uses:

blueprintRNG := rand.New(rand.NewSource(spec.Seed + 7919))

Both offsets work correctly today because they differ. But there is no comment explaining why they must differ or what would break if a future contributor accidentally reused an offset. Given the pattern of adding new per-client RNG streams as the codebase grows, this is exactly the kind of implicit assumption that causes subtle, hard-to-reproduce non-determinism bugs.

Fix — add an explanatory comment:

// concurrencyRNG drives per-user seed selection and blueprint RNG seeding.
// Uses spec.Seed + 10007, distinct from blueprintRNG's spec.Seed + 7919 above,
// so the two streams do not produce identical sequences for the same spec seed.
// If new per-client RNG streams are added here, choose an offset not already
// in use in this function and document it with the same pattern.
concurrencyRNG := rand.New(rand.NewSource(spec.Seed + 10007))

Bug 1 (Critical): ComposeSpecs now handles all-concurrency specs where
totalRate==0 by skipping rate fraction renormalization and concatenating
clients directly. Test: TestComposeSpecs_AllConcurrency_MergesClients.

Bug 2 (Critical): Two coupled fixes:
- Part A: Seed generation loop now caps at maxRequests, preventing
  --concurrency 200 --num-requests 100 from producing 200 seeds.
- Part B: FollowUpBudget uses -1 as "no cap" sentinel (was 0).
  SessionManager gains budgetEnabled flag so SetFollowUpBudget(0)
  correctly means "no follow-ups" rather than "unlimited". Callers
  updated to use >= 0 check. Tests: SeedsCappedByMaxRequests,
  ZeroBudgetMeansNoFollowUps.

Bug 3 (Significant): Arrival process validation skipped for concurrency
clients (they use staggered fixed times, not arrival samplers). Users
can now omit the arrival field in YAML specs for concurrency clients.
Test: TestValidate_ConcurrencyClient_NoArrivalField_Accepted.

Bug 4 (Significant): In concurrency mode on blis run, --num-requests
default of 100 is now ignored unless the flag was explicitly set.
Without it, the run requires --horizon or explicit --num-requests.

Bug 5 (Minor): --concurrency + --workload-spec now produces a fatal
error in both blis run and blis observe, instead of silently ignoring
the concurrency flag.

Minor: Added explanatory comment for the +10007 RNG offset explaining
its relationship to +7919 and the need for distinct offsets.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Mar 27, 2026

Re Bug 1 (compose): Fixed in 2196a6f. ComposeSpecs now handles totalRate==0 by skipping rate fraction renormalization and concatenating clients directly. The mixed case (one concurrency spec + one rate spec) also works correctly since totalRate > 0 from the rate spec, and RateFraction *= weight is a no-op for concurrency clients. Test added: TestComposeSpecs_AllConcurrency_MergesClients.

@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Mar 27, 2026

Re Bug 2 (seed cap + budget sentinel): Fixed in 2196a6f. Two coupled changes:

  • Part A: Seed generation loop now checks maxRequests before each seed. --concurrency 200 --num-requests 100 correctly produces 100 seeds. Test: TestGenerateWorkload_ConcurrencyClient_SeedsCappedByMaxRequests.

  • Part B: FollowUpBudget now uses -1 as "no cap" sentinel. SessionManager gains a budgetEnabled flag so SetFollowUpBudget(0) correctly means "no follow-ups" (not "unlimited"). Callers use >= 0 check. --concurrency 50 --num-requests 50 now correctly produces 50 seeds with zero follow-ups. Test: TestGenerateWorkload_ConcurrencyClient_ZeroBudgetMeansNoFollowUps.

@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Mar 27, 2026

Re Bug 3 (arrival validation): Fixed in 2196a6f. validateClient now gates arrival process and CV validation behind c.Concurrency == 0, so concurrency clients no longer need a meaningless arrival: {process: constant} field. Test: TestValidate_ConcurrencyClient_NoArrivalField_Accepted.

@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Mar 27, 2026

Re Bug 4 (default --num-requests caps concurrency): Fixed in 2196a6f. In the concurrency branch of root.go, numRequests is now only used when cmd.Flags().Changed("num-requests") is true. Otherwise it defaults to 0 (unbounded), requiring --horizon to bound the run. The existing unbounded-generation guard fires with a clear message if neither is provided. observe_cmd.go was already correct (observeNumRequests defaults to 0).

@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Mar 27, 2026

Re Bug 5 (--concurrency silently ignored with --workload-spec): Fixed in 2196a6f. Both root.go and observe_cmd.go now logrus.Fatalf when --concurrency > 0 and a workload spec file is also provided, with a message directing users to clients[].concurrency in the spec file instead.

@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Mar 27, 2026

Re seed offset comment: Fixed in 2196a6f. Added explanatory comment documenting why +10007 must differ from +7919, what would break if reused, and the pattern to follow when adding new per-client RNG streams.

@sriumcp
Copy link
Copy Markdown
Collaborator

sriumcp commented Mar 27, 2026

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

@claude
Copy link
Copy Markdown

claude bot commented Mar 27, 2026

Claude finished @sriumcp's task in 4m 8s —— View job


PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generation

Tasks

  • Post initial comment
  • Read changed files and current branch state (commits through 2196a6f)
  • Review correctness & parent issue alignment
  • Review behavioral contracts & invariants (all 5 bugs + minor fixed in 2196a6f)
  • Review test quality (behavioral/non-structural)
  • Review separation of concerns & modularity
  • Review API boundaries & contracts
  • Review standards adherence (R1-R23, principles)
  • Review documentation correctness & completeness
  • Review performance implications
  • Synthesize findings

Fourth-Pass Review Summary

All five bugs and the seed-offset comment from Round 3 are correctly addressed in 2196a6f. One minor documentation gap remains; everything else is correct and clean.


All Prior Issues Resolved

Issue Commit Status
INV-11 missing budget_exhausted terminal state 909b879
Misleading RNG comment in generator.go 909b879
Mixed concurrency + multi-turn spec not rejected 909b879
Cohort multi-turn bypass of mixed-workload guard beb3132
Bug 1: ComposeSpecs failing for all-concurrency specs 2196a6f
Bug 2a: Seeds generated without maxRequests cap 2196a6f
Bug 2b: FollowUpBudget=0 ambiguity / budgetEnabled sentinel 2196a6f
Bug 3: Arrival validation required for concurrency clients 2196a6f
Bug 4: --num-requests default silently capping horizon-bounded runs 2196a6f
Bug 5: --concurrency silently ignored with --workload-spec 2196a6f
Magic seed offset comment 2196a6f

Minor Issue: observe_cmd.go help text still says "Supports both"

observe_cmd.go:65 (the Long description for blis observe) reads:

Supports both --workload-spec (YAML) and --rate (distribution synthesis) input paths.

After this PR, there are three input paths, not two. A user running blis observe --help gets stale help text with no mention of --concurrency in the description or examples. The examples section also only shows --workload-spec and --rate usage.

Suggested fix:

Supports --workload-spec (YAML), --rate (distribution synthesis), or --concurrency
(closed-loop virtual users) input paths.

And add one --concurrency example matching the PR description.

Fix this →


Correctness Verification (Final Pass)

Budget sentinel mechanics (generator.go:641-649, session.go:86-89, root.go:1051-1053):

The three-value sentinel (-1/0/+N) is implemented correctly end-to-end:

Scenario FollowUpBudget SetFollowUpBudget called? Behavior
--concurrency 50 --num-requests 500 450 Yes (450) 500 total ✓
--concurrency 50 --num-requests 50 0 Yes (0) → budgetEnabled=true, immediate exhaustion 50 total ✓
--concurrency 200 --num-requests 100 0 Yes (0) 100 seeds (capped), 0 follow-ups ✓
--concurrency 50 --horizon 60s -1 No horizon-bounded ✓
No sessions (rate-only) 0 (zero value) No — len(Sessions)==0 guards N/A ✓

Seed cap (generator.go:557-561): The inner-loop break fires before concurrencyRNG.Int63() draws, which is correct — uncapped users don't consume RNG entropy. The totalConcurrencyUsers += client.Concurrency at line 624 uses the full concurrency count, but this is a truthy check only (the budget formula uses len(allReqs) for the actual count). ✓

budgetEnabled guard (session.go:141, 203-205): SetFollowUpBudget(0) sets budgetEnabled=true, so the pre-generation check sm.budgetEnabled && sm.followUpCount >= sm.followUpBudget fires immediately (0 >= 0). The post-generation increment is likewise gated on sm.budgetEnabled. Off-by-one confirmed correct: check-before, increment-after. ✓

All BC contracts: Verified in Rounds 1-3, no regression in 2196a6f changes.


Test Quality (Final Pass)

New tests added in 2196a6f:

  • TestGenerateWorkload_ConcurrencyClient_SeedsCappedByMaxRequests — 200 users, cap=100, verifies 100 seeds and FollowUpBudget=0
  • TestGenerateWorkload_ConcurrencyClient_ZeroBudgetMeansNoFollowUpsSetFollowUpBudget(0) produces nil follow-up immediately ✓
  • TestValidate_ConcurrencyClient_NoArrivalField_Accepted — concurrency client with empty arrival passes validation ✓
  • TestComposeSpecs_AllConcurrency_MergesClients — two concurrency-only specs compose and pass Validate()
  • TestValidate_ConcurrencyAndCohortMultiTurn_Rejects — cohort with multi-turn + concurrency client rejected ✓ (Round 3 fix)

All behavioral, non-structural, refactor-survivable. ✓

One test semantic note (informational, not blocking): TestGenerateWorkload_NonConcurrencySpec_Unchanged asserts wl.FollowUpBudget != 0 with the comment "want 0 (no sessions)". The early-return path (generator.go:404) returns FollowUpBudget=0 (Go zero value), which matches 0. Per the documented sentinel convention, 0 means "no follow-ups allowed" but for no-session workloads the field is semantically N/A. The caller guards with len(wl.Sessions) > 0 so this is harmless in practice — the zero value is never consumed for non-session workloads. The test assertion is correct but tests the zero value rather than the intended semantic. Not worth changing.


Separation of Concerns & Modularity

Clean. All five bug fixes in 2196a6f are correctly scoped:

  • spec.go: arrival guard added to validateClient — no behavioral change to rate-based clients ✓
  • generator.go: seed cap check added inside inner loop — existing logic untouched ✓
  • session.go: budgetEnabled field isolates "unset" from "set to 0" — no interface change ✓
  • root.go/observe_cmd.go: --concurrency × --workload-spec fatal, --num-requests defaulting fixed — correctly scoped ✓
  • convert.go: totalRate==0 path concatenates without scaling — existing scaling path untouched ✓

Standards Adherence

  • R4: SessionBlueprint is still constructed in two sites (closed-loop, concurrency). The structural difference (MaxRounds vs UnlimitedRounds) justifies two sites; this was flagged and accepted in Round 2.
  • R1: No silent continue. All error paths return errors. ✓
  • R3: CLI validation for --concurrency < 0, --think-time-ms < 0, mutual exclusion with --rate and --workload-spec. ✓
  • R8-R10: No new exported mutable maps, value types used throughout, strict YAML parsing preserved. ✓
  • R2: Map iteration sorted before RNG draws (sort.Strings(sortedSessionIDs), sort.SliceStable). ✓

Performance Implications

No DES hot-path changes. All new code executes at workload-generation time (startup). ✓


Parent Issue Alignment

The PR delivers exactly what #859 requires: N virtual users, one in-flight request each, optional think time, capped by --num-requests, mutually exclusive with --rate, available on both blis run and blis observe. All behavioral contracts are verified with tests. ✓


Verdict

Approve after the one-line help text fix (the "Supports both" → "Supports three modes" update in observe_cmd.go). All prior substantive issues are fully resolved. The implementation is correct, well-tested, properly isolated, and all invariants are maintained and documented.

@sriumcp
Copy link
Copy Markdown
Collaborator

sriumcp commented Mar 27, 2026

In addition to the one line help text fix above, please also fix the following.



Comment 1 — Non-blocking: FollowUpBudget struct comment is wrong

Non-blocking: FollowUpBudget struct comment is wrong for the no-session path

In sim/workload/generator.go, the GeneratedWorkload struct comment says:

// -1 = no cap (either no concurrency clients, or maxRequests was 0/unlimited).

But when there are no sessions and no concurrency clients, GenerateWorkload returns early:

return &GeneratedWorkload{Requests: reqs}, nil

This sets FollowUpBudget to 0 (Go zero value), not -1. The test TestGenerateWorkload_NonConcurrencySpec_Unchanged confirms this by
asserting FollowUpBudget == 0.

Fix — comment only, no logic change needed:

// FollowUpBudget is the cap on follow-up requests for concurrency sessions.
// -1 = no cap (maxRequests was 0/unlimited, or only closed-loop multi-turn clients present).
// 0 = no follow-ups allowed (seeds consumed the entire budget, or no sessions at all).
// >0 = exactly that many follow-ups allowed.


Comment 2 — Blocker: ComposeSpecs no longer guards totalRate < 0

Blocker: ComposeSpecs dropped the guard for totalRate < 0

The original convert.go had:

if totalRate <= 0 {
return nil, fmt.Errorf("compose: total aggregate rate must be positive, got %f", totalRate)
}

The new code only handles totalRate == 0 (all-concurrency) and then falls through to renormalization for everything else. If totalRate
is negative — which happens if any spec in the list has AggregateRate < 0 — the weight s.AggregateRate / totalRate becomes negative,
silently producing clients with negative RateFraction. Those clients then pass validateClient (which only checks RateFraction > 0 for
rate-based clients, not < 0) and corrupt the merged spec.

Fix — add one guard for the negative case:

if totalRate == 0 {
// all-concurrency path (existing code)
...
return merged, nil
}

if totalRate < 0 {
return nil, fmt.Errorf("compose: total aggregate rate must be non-negative, got %f", totalRate)
}

// Rate-based or mixed: renormalize (existing code)
for _, s := range specs {
...
}

Add a corresponding test:

func TestComposeSpecs_NegativeTotalRate_ReturnsError(t *testing.T) {
spec := &WorkloadSpec{
Version: "2", Category: "language",
AggregateRate: -5.0,
Clients: []ClientSpec{{ID: "c1", RateFraction: 1.0, ...}},
}
_, err := ComposeSpecs([]*WorkloadSpec{spec})
if err == nil {
t.Error("expected error for negative aggregate rate")
}
}

Note: a spec with AggregateRate < 0 would be rejected by Validate() in normal usage, but ComposeSpecs accepts raw specs and should not
silently corrupt them.

- Update observe_cmd.go Long description to mention --concurrency as a
  third input path and add a concurrency example
- Fix FollowUpBudget struct comment: 0 also covers the no-sessions
  early-return path, not just budget-consumed
- Restore negative totalRate guard in ComposeSpecs that was dropped
  when adding the totalRate==0 all-concurrency path
  Test: TestComposeSpecs_NegativeTotalRate_ReturnsError

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Mar 27, 2026

All three items from the latest review addressed in 74765be:

  1. Help text fix: observe_cmd.go Long description now reads "Supports --workload-spec (YAML), --rate (distribution synthesis), or --concurrency (closed-loop virtual users) input paths." Added a --concurrency example to the examples section.

  2. FollowUpBudget comment fix: Updated the struct doc to clarify that 0 also covers the no-sessions early-return path (Go zero value), not just the budget-consumed case.

  3. Negative rate guard restored: Added totalRate < 0 check back into ComposeSpecs between the == 0 (all-concurrency) and the renormalization paths. A spec with negative AggregateRate would be caught by Validate() in normal usage, but ComposeSpecs accepts raw specs and should not silently corrupt them. Test: TestComposeSpecs_NegativeTotalRate_ReturnsError.

Full suite passes.

@namasl
Copy link
Copy Markdown
Contributor Author

namasl commented Mar 27, 2026

@claude /pr-review-toolkit:review-pr does this pr tackle the parent issue correctly? Validate correctness, ease of use, separation of concerns, adherence to our @docs/contributing/standards, invariants, behavioral discipline, behavioral tests and non-structural tests, modularity, clean api boundaries and contracts, performance implications, documentation correctness and completeness.

@claude
Copy link
Copy Markdown

claude bot commented Mar 27, 2026

Claude finished @namasl's task in 4m 38s —— View job


PR Review: feat(workload): add --concurrency flag for closed-loop single-turn workload generation

Tasks

  • Post initial comment
  • Read changed files and current branch state (commits through 74765be)
  • Review correctness & parent issue alignment
  • Review behavioral contracts & invariants
  • Review test quality (behavioral/non-structural)
  • Review separation of concerns & modularity
  • Review API boundaries & contracts
  • Review standards adherence (R1-R23, principles)
  • Review documentation correctness & completeness
  • Review performance implications
  • Synthesize findings

Fifth-Pass Review Summary

All issues from rounds 1–4 (sriumcp's 5 bugs + minor, plus the observe_cmd help text, FollowUpBudget comment, and negative rate guard) are fully resolved in commits 909b879, beb3132, 2196a6f, and 74765be. No new issues found. This PR is ready to merge.


All Prior Issues Resolved

Issue Commit Status
INV-11 missing budget_exhausted terminal state 909b879
Misleading RNG re-derivation comment 909b879
Mixed concurrency + multi-turn spec not rejected 909b879
Cohort multi-turn bypass of mixed-workload guard beb3132
Bug 1: ComposeSpecs failing for all-concurrency specs 2196a6f
Bug 2a: Seeds generated without maxRequests cap 2196a6f
Bug 2b: FollowUpBudget=0 ambiguity / budgetEnabled sentinel 2196a6f
Bug 3: Arrival validation required for concurrency clients 2196a6f
Bug 4: --num-requests default silently capping horizon-bounded runs 2196a6f
Bug 5: --concurrency silently ignored with --workload-spec 2196a6f
Magic seed offset comment 2196a6f
observe_cmd.go help text "Supports both" → three modes 74765be
FollowUpBudget struct comment covering no-sessions zero-value 74765be
ComposeSpecs dropped totalRate < 0 guard 74765be

Correctness Verification (Fifth Pass)

Stagger formula (generator.go:569-571): client.Concurrency > 1 guard is correct — for N=1, there's nothing to stagger, and the formula 0 * ThinkTimeUs / 1 = 0 would give t=0 anyway. Integer division satisfies BC-3's requirement of arrivals ∈ [0, ThinkTimeUs). ✓

Budget sentinel end-to-end (generator.go:642-649, session.go:86-88, root.go/observe_cmd.go):

Command FollowUpBudget SetFollowUpBudget called? Outcome
--concurrency 50 --num-requests 500 450 Yes (450) 500 total ✓
--concurrency 50 --num-requests 50 0 Yes (0), budgetEnabled=true → immediate exhaustion 50 total ✓
--concurrency 200 --num-requests 100 0 Yes (0) 100 seeds (capped), 0 follow-ups ✓
--concurrency 50 --horizon 60s -1 No horizon-bounded ✓
Rate-only (no sessions) 0 (zero value) No — len(Sessions)==0 guards N/A ✓

budgetEnabled guard (session.go:141, 203-205): SetFollowUpBudget(0) sets budgetEnabled=true, so the check sm.budgetEnabled && sm.followUpCount >= sm.followUpBudget fires immediately (0 ≥ 0). Off-by-one is correct: check-before, increment-after. ✓

ComposeSpecs concurrency handling (convert.go:136-160):

  • totalRate == 0: all-concurrency path concatenates clients, returns early — correct
  • totalRate < 0: guard added in 74765be — correct
  • Mixed (one concurrency + one rate spec): concurrency clients have RateFraction=0, so 0 *= weight is a no-op — correct ✓

Cohort multi-turn guard (spec.go:281-285): loops s.Cohorts and checks Reasoning.MultiTurn != nil. Cohorts have no Concurrency field (rate-based only), so the hasConcurrency check correctly stays in s.Clients. The asymmetry is correct, not a bug. ✓


BC Contract Verification (Complete)

Contract Status Test(s)
BC-1: --concurrency and --rate mutually exclusive TestValidate_ConcurrencyAndRateFraction_Rejects, CLI guard
BC-2: Exactly N seeds per client TestGenerateWorkload_ConcurrencyClient_ProducesSeedsAndBlueprints
BC-3: Seeds staggered in [0, think_time) TestGenerateWorkload_ConcurrencyClient_StaggeredArrivals, ZeroThinkTime case
BC-4: Follow-up arrival = completion + think_time End-to-end test asserts ArrivalTime == 15000
BC-5: Session ID preserved through rounds End-to-end test
BC-6: Fresh tokens, no context accumulation TestSession_NoContextAccumulation_FreshTokens
BC-7: --num-requests caps total Budget exhaustion test; TestGenerateWorkload_ConcurrencyClient_SeedsCappedByMaxRequests
BC-8: Deterministic (INV-6) TestGenerateWorkload_ConcurrencyClient_Deterministic
BC-9: Every user reaches terminal state (INV-11) All four terminal states documented and exercised
BC-10: Timeout/drop cancels session TestConcurrencyMode_TimeoutCancelsSession
BC-11: Non-concurrency backward compat TestGenerateWorkload_NonConcurrencySpec_Unchanged

INV Compliance (Final)

  • INV-6 (determinism): Partitioned RNG with documented offsets (+7919 closed-loop, +10007 concurrency). TestGenerateWorkload_ConcurrencyClient_Deterministic passes. ✓
  • INV-10 (session causality): arrivalTime = tick + bp.ThinkTimeUs is exact. ✓
  • INV-11 (session completeness): All four terminal states (completed, cancelled, horizon-interrupted, budget-exhausted) documented in invariants.md, all exercised by tests. ✓
  • INV-1 (request conservation): Budget formula maxRequests - len(allReqs) with seed cap ensures seeds + follow-ups ≤ maxRequests. ✓

Test Quality (Final)

All tests assert observable behavior — request counts, arrival times, session IDs, token lengths, budget exhaustion state. No assertions on concrete type names or internal fields. Refactor-survivable. Coverage includes:

  • 6 spec validation negatives (concurrency acceptance, mutual exclusion, negatives, aggregate_rate, mixed workload, cohort multi-turn)
  • 5 session manager tests (unlimited rounds, budget exhaustion step-by-step, SetFollowUpBudget(0), zero MaxRounds, fresh tokens)
  • 2 synthesis tests (concurrency mode, prefix propagation)
  • 9 generator tests (seed count, stagger, determinism, metadata, blueprint properties, backward compat, seeds capped, zero budget)
  • 2 end-to-end integration tests (full follow-up flow, timeout cancellation)
  • 2 compose tests (all-concurrency merge, negative rate error)

Standards Adherence

  • R1–R3: All error paths return errors; all inputs validated at CLI and spec layers. ✓
  • R4: SessionBlueprint constructed in two sites (closed-loop/concurrency) — structurally distinct blueprints justify two sites. ✓
  • R8–R10: No new exported mutable maps; value types used; strict YAML parsing unchanged. ✓
  • Output channel separation: All diagnostics via logrus (stderr). ✓

Performance Implications

No DES hot-path changes. All new code executes at workload-generation time (startup). generatePrefixTokens called once for concurrency clients (re-derivation from same seed — correct). ✓


Parent Issue Alignment

PR fully delivers #859: N virtual users, exactly one in-flight request each, optional think time, capped by --num-requests, mutually exclusive with --rate, available on both blis run and blis observe, backward-compatible. ✓


Verdict

Approve. All issues from all prior review rounds are fully addressed. The implementation is correct, test coverage is thorough and behavioral, all invariants are maintained and documented, and the feature precisely delivers what #859 describes.

@sriumcp sriumcp merged commit af1a787 into inference-sim:main Mar 27, 2026
5 checks passed
@namasl namasl deleted the feat/concurrency-flag branch March 27, 2026 22:58
vishakha-ramani pushed a commit to vishakha-ramani/inference-sim that referenced this pull request Mar 30, 2026
…rkload generation (inference-sim#861)

* feat(workload): add Concurrency and ThinkTimeUs fields to ClientSpec

Add spec-level support for closed-loop concurrency workloads. When
Concurrency > 0, the client operates in concurrency mode (mutually
exclusive with rate_fraction). ThinkTimeUs controls inter-request
delay in concurrency mode. Aggregate_rate is only required when at
least one client is rate-based (Concurrency == 0).

Part of inference-sim#859.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat(workload): add UnlimitedRounds and follow-up budget to SessionManager

Add UnlimitedRounds field to SessionBlueprint so concurrency sessions
can run past MaxRounds until budget/horizon/timeout/drop. Add global
follow-up budget tracking to SessionManager with SetFollowUpBudget()
and sessionBudgetExhausted terminal state.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat(workload): add concurrency mode to distribution synthesis

Extend DistributionParams with Concurrency and ThinkTimeMs fields and
update SynthesizeFromDistribution to produce concurrency-mode specs
(Concurrency > 0 sets client.Concurrency and ThinkTimeUs with zero
AggregateRate/RateFraction) while preserving backward-compatible rate
mode for existing callers.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat(workload): generate seed requests and session blueprints for concurrency clients

Add concurrency client handling to GenerateWorkload: for each client with
Concurrency > 0, generate N seed requests (one per virtual user) with
staggered arrival times and create N unlimited-round SessionBlueprints.
Compute FollowUpBudget = maxRequests - numSeeds for the SessionManager.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat(cli): add --concurrency and --think-time-ms flags to blis run

Wire up closed-loop concurrency mode to the CLI: register new flags,
validate mutual exclusion with --rate (BC-1), add concurrency synthesis
path in the workload if/else chain, and pass FollowUpBudget from
GeneratedWorkload to SessionManager.

Refs: inference-sim#859

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat(cli): add --concurrency and --think-time-ms flags to blis observe

Wire up closed-loop concurrency mode for the observe command, matching
the flags already added to blis run. Includes mutual exclusion with
--rate, auto-setting of --max-concurrency, and follow-up budget wiring.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* test(workload): add end-to-end concurrency mode integration tests

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix(workload): address PR review — INV-11 docs, mixed workload guard, RNG comment

- Update INV-11 in invariants.md and CLAUDE.md to document budget_exhausted
  as a fourth terminal state for concurrency-mode sessions
- Add spec validation rejecting mixed concurrency + multi-turn clients
  (follow-up budget accounting doesn't support this combination)
- Fix misleading comment in generator.go about prefix token RNG reuse
  (it re-derives from the same seed, not continuing the original stream)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix(workload): extend mixed-workload guard to check cohorts for multi-turn

The concurrency + multi-turn validation only iterated s.Clients, missing
CohortSpec entries that also carry Reasoning.MultiTurn. A cohort with
multi-turn combined with concurrency clients would bypass the guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix(workload): address 5 bugs + 1 minor from PR review round 2

Bug 1 (Critical): ComposeSpecs now handles all-concurrency specs where
totalRate==0 by skipping rate fraction renormalization and concatenating
clients directly. Test: TestComposeSpecs_AllConcurrency_MergesClients.

Bug 2 (Critical): Two coupled fixes:
- Part A: Seed generation loop now caps at maxRequests, preventing
  --concurrency 200 --num-requests 100 from producing 200 seeds.
- Part B: FollowUpBudget uses -1 as "no cap" sentinel (was 0).
  SessionManager gains budgetEnabled flag so SetFollowUpBudget(0)
  correctly means "no follow-ups" rather than "unlimited". Callers
  updated to use >= 0 check. Tests: SeedsCappedByMaxRequests,
  ZeroBudgetMeansNoFollowUps.

Bug 3 (Significant): Arrival process validation skipped for concurrency
clients (they use staggered fixed times, not arrival samplers). Users
can now omit the arrival field in YAML specs for concurrency clients.
Test: TestValidate_ConcurrencyClient_NoArrivalField_Accepted.

Bug 4 (Significant): In concurrency mode on blis run, --num-requests
default of 100 is now ignored unless the flag was explicitly set.
Without it, the run requires --horizon or explicit --num-requests.

Bug 5 (Minor): --concurrency + --workload-spec now produces a fatal
error in both blis run and blis observe, instead of silently ignoring
the concurrency flag.

Minor: Added explanatory comment for the +10007 RNG offset explaining
its relationship to +7919 and the need for distinct offsets.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix(workload): help text, FollowUpBudget comment, negative rate guard

- Update observe_cmd.go Long description to mention --concurrency as a
  third input path and add a concurrency example
- Fix FollowUpBudget struct comment: 0 also covers the no-sessions
  early-return path, not just budget-consumed
- Restore negative totalRate guard in ComposeSpecs that was dropped
  when adding the totalRate==0 all-concurrency path
  Test: TestComposeSpecs_NegativeTotalRate_ReturnsError

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
susiejojo pushed a commit that referenced this pull request Mar 30, 2026
…rkload generation (#861)

* feat(workload): add Concurrency and ThinkTimeUs fields to ClientSpec

Add spec-level support for closed-loop concurrency workloads. When
Concurrency > 0, the client operates in concurrency mode (mutually
exclusive with rate_fraction). ThinkTimeUs controls inter-request
delay in concurrency mode. Aggregate_rate is only required when at
least one client is rate-based (Concurrency == 0).

Part of #859.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat(workload): add UnlimitedRounds and follow-up budget to SessionManager

Add UnlimitedRounds field to SessionBlueprint so concurrency sessions
can run past MaxRounds until budget/horizon/timeout/drop. Add global
follow-up budget tracking to SessionManager with SetFollowUpBudget()
and sessionBudgetExhausted terminal state.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat(workload): add concurrency mode to distribution synthesis

Extend DistributionParams with Concurrency and ThinkTimeMs fields and
update SynthesizeFromDistribution to produce concurrency-mode specs
(Concurrency > 0 sets client.Concurrency and ThinkTimeUs with zero
AggregateRate/RateFraction) while preserving backward-compatible rate
mode for existing callers.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat(workload): generate seed requests and session blueprints for concurrency clients

Add concurrency client handling to GenerateWorkload: for each client with
Concurrency > 0, generate N seed requests (one per virtual user) with
staggered arrival times and create N unlimited-round SessionBlueprints.
Compute FollowUpBudget = maxRequests - numSeeds for the SessionManager.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat(cli): add --concurrency and --think-time-ms flags to blis run

Wire up closed-loop concurrency mode to the CLI: register new flags,
validate mutual exclusion with --rate (BC-1), add concurrency synthesis
path in the workload if/else chain, and pass FollowUpBudget from
GeneratedWorkload to SessionManager.

Refs: #859

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat(cli): add --concurrency and --think-time-ms flags to blis observe

Wire up closed-loop concurrency mode for the observe command, matching
the flags already added to blis run. Includes mutual exclusion with
--rate, auto-setting of --max-concurrency, and follow-up budget wiring.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* test(workload): add end-to-end concurrency mode integration tests

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix(workload): address PR review — INV-11 docs, mixed workload guard, RNG comment

- Update INV-11 in invariants.md and CLAUDE.md to document budget_exhausted
  as a fourth terminal state for concurrency-mode sessions
- Add spec validation rejecting mixed concurrency + multi-turn clients
  (follow-up budget accounting doesn't support this combination)
- Fix misleading comment in generator.go about prefix token RNG reuse
  (it re-derives from the same seed, not continuing the original stream)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix(workload): extend mixed-workload guard to check cohorts for multi-turn

The concurrency + multi-turn validation only iterated s.Clients, missing
CohortSpec entries that also carry Reasoning.MultiTurn. A cohort with
multi-turn combined with concurrency clients would bypass the guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix(workload): address 5 bugs + 1 minor from PR review round 2

Bug 1 (Critical): ComposeSpecs now handles all-concurrency specs where
totalRate==0 by skipping rate fraction renormalization and concatenating
clients directly. Test: TestComposeSpecs_AllConcurrency_MergesClients.

Bug 2 (Critical): Two coupled fixes:
- Part A: Seed generation loop now caps at maxRequests, preventing
  --concurrency 200 --num-requests 100 from producing 200 seeds.
- Part B: FollowUpBudget uses -1 as "no cap" sentinel (was 0).
  SessionManager gains budgetEnabled flag so SetFollowUpBudget(0)
  correctly means "no follow-ups" rather than "unlimited". Callers
  updated to use >= 0 check. Tests: SeedsCappedByMaxRequests,
  ZeroBudgetMeansNoFollowUps.

Bug 3 (Significant): Arrival process validation skipped for concurrency
clients (they use staggered fixed times, not arrival samplers). Users
can now omit the arrival field in YAML specs for concurrency clients.
Test: TestValidate_ConcurrencyClient_NoArrivalField_Accepted.

Bug 4 (Significant): In concurrency mode on blis run, --num-requests
default of 100 is now ignored unless the flag was explicitly set.
Without it, the run requires --horizon or explicit --num-requests.

Bug 5 (Minor): --concurrency + --workload-spec now produces a fatal
error in both blis run and blis observe, instead of silently ignoring
the concurrency flag.

Minor: Added explanatory comment for the +10007 RNG offset explaining
its relationship to +7919 and the need for distinct offsets.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix(workload): help text, FollowUpBudget comment, negative rate guard

- Update observe_cmd.go Long description to mention --concurrency as a
  third input path and add a concurrency example
- Fix FollowUpBudget struct comment: 0 also covers the no-sessions
  early-return path, not just budget-consumed
- Restore negative totalRate guard in ComposeSpecs that was dropped
  when adding the totalRate==0 all-concurrency path
  Test: TestComposeSpecs_NegativeTotalRate_ReturnsError

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
vishakha-ramani pushed a commit to vishakha-ramani/inference-sim that referenced this pull request Mar 30, 2026
…rkload generation (inference-sim#861)

* feat(workload): add Concurrency and ThinkTimeUs fields to ClientSpec

Add spec-level support for closed-loop concurrency workloads. When
Concurrency > 0, the client operates in concurrency mode (mutually
exclusive with rate_fraction). ThinkTimeUs controls inter-request
delay in concurrency mode. Aggregate_rate is only required when at
least one client is rate-based (Concurrency == 0).

Part of inference-sim#859.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat(workload): add UnlimitedRounds and follow-up budget to SessionManager

Add UnlimitedRounds field to SessionBlueprint so concurrency sessions
can run past MaxRounds until budget/horizon/timeout/drop. Add global
follow-up budget tracking to SessionManager with SetFollowUpBudget()
and sessionBudgetExhausted terminal state.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat(workload): add concurrency mode to distribution synthesis

Extend DistributionParams with Concurrency and ThinkTimeMs fields and
update SynthesizeFromDistribution to produce concurrency-mode specs
(Concurrency > 0 sets client.Concurrency and ThinkTimeUs with zero
AggregateRate/RateFraction) while preserving backward-compatible rate
mode for existing callers.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat(workload): generate seed requests and session blueprints for concurrency clients

Add concurrency client handling to GenerateWorkload: for each client with
Concurrency > 0, generate N seed requests (one per virtual user) with
staggered arrival times and create N unlimited-round SessionBlueprints.
Compute FollowUpBudget = maxRequests - numSeeds for the SessionManager.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat(cli): add --concurrency and --think-time-ms flags to blis run

Wire up closed-loop concurrency mode to the CLI: register new flags,
validate mutual exclusion with --rate (BC-1), add concurrency synthesis
path in the workload if/else chain, and pass FollowUpBudget from
GeneratedWorkload to SessionManager.

Refs: inference-sim#859

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat(cli): add --concurrency and --think-time-ms flags to blis observe

Wire up closed-loop concurrency mode for the observe command, matching
the flags already added to blis run. Includes mutual exclusion with
--rate, auto-setting of --max-concurrency, and follow-up budget wiring.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* test(workload): add end-to-end concurrency mode integration tests

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix(workload): address PR review — INV-11 docs, mixed workload guard, RNG comment

- Update INV-11 in invariants.md and CLAUDE.md to document budget_exhausted
  as a fourth terminal state for concurrency-mode sessions
- Add spec validation rejecting mixed concurrency + multi-turn clients
  (follow-up budget accounting doesn't support this combination)
- Fix misleading comment in generator.go about prefix token RNG reuse
  (it re-derives from the same seed, not continuing the original stream)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix(workload): extend mixed-workload guard to check cohorts for multi-turn

The concurrency + multi-turn validation only iterated s.Clients, missing
CohortSpec entries that also carry Reasoning.MultiTurn. A cohort with
multi-turn combined with concurrency clients would bypass the guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix(workload): address 5 bugs + 1 minor from PR review round 2

Bug 1 (Critical): ComposeSpecs now handles all-concurrency specs where
totalRate==0 by skipping rate fraction renormalization and concatenating
clients directly. Test: TestComposeSpecs_AllConcurrency_MergesClients.

Bug 2 (Critical): Two coupled fixes:
- Part A: Seed generation loop now caps at maxRequests, preventing
  --concurrency 200 --num-requests 100 from producing 200 seeds.
- Part B: FollowUpBudget uses -1 as "no cap" sentinel (was 0).
  SessionManager gains budgetEnabled flag so SetFollowUpBudget(0)
  correctly means "no follow-ups" rather than "unlimited". Callers
  updated to use >= 0 check. Tests: SeedsCappedByMaxRequests,
  ZeroBudgetMeansNoFollowUps.

Bug 3 (Significant): Arrival process validation skipped for concurrency
clients (they use staggered fixed times, not arrival samplers). Users
can now omit the arrival field in YAML specs for concurrency clients.
Test: TestValidate_ConcurrencyClient_NoArrivalField_Accepted.

Bug 4 (Significant): In concurrency mode on blis run, --num-requests
default of 100 is now ignored unless the flag was explicitly set.
Without it, the run requires --horizon or explicit --num-requests.

Bug 5 (Minor): --concurrency + --workload-spec now produces a fatal
error in both blis run and blis observe, instead of silently ignoring
the concurrency flag.

Minor: Added explanatory comment for the +10007 RNG offset explaining
its relationship to +7919 and the need for distinct offsets.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix(workload): help text, FollowUpBudget comment, negative rate guard

- Update observe_cmd.go Long description to mention --concurrency as a
  third input path and add a concurrency example
- Fix FollowUpBudget struct comment: 0 also covers the no-sessions
  early-return path, not just budget-consumed
- Restore negative totalRate guard in ComposeSpecs that was dropped
  when adding the totalRate==0 all-concurrency path
  Test: TestComposeSpecs_NegativeTotalRate_ReturnsError

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(workload): add --concurrency flag for closed-loop single-turn workload generation

2 participants