docs: fix hardcoded metrics mismatches and tutorial narrative gaps#632
docs: fix hardcoded metrics mismatches and tutorial narrative gaps#632
Conversation
5694270 to
c32573b
Compare
4a46fe8 to
e269ba7
Compare
|
@claude review PR |
|
Claude finished @sriumcp's task in 1m 37s —— View job PR Review: docs: fix hardcoded metrics mismatches and tutorial narrative gaps
Overall AssessmentGood PR — the core mission (updating stale metrics after the blackbox→roofline default switch) is well-executed. The measured metrics table in the PR description is helpful, and the tutorial rewrite with the 200 req/s scenario tells a much more compelling story than the old 20 req/s version. The added admonitions (HF fetch note, multi-JSON output, Issues Found1. cluster.md: All example excess rates are positive — no sub-saturation example (Minor)
None of these examples actually show the "sub-saturation" regime the next line references. The old version had 8 instances at excess=5.1 which was genuinely near equilibrium. Consider either: (a) adding a 4th line showing ~30 instances where excess goes negative, or (b) using a lower rate (e.g., rate=200) where 12 instances crosses the threshold (200/12 - 17 ≈ -0.3, essentially balanced). The "~8x" improvement claim also isn't self-evident from the numbers since both 4 and 8 instances are deeply overloaded — the reader can't verify it from the excess rates alone. 2. tutorial.md: Step 3 comment says "moderate headroom" for 8 instances, but body says it's overloaded (Minor)
3. tutorial.md: Capacity math in Step 3 comments could confuse readers (Nit)
4. quickstart.md:
|
) - Tutorial: rewrite capacity planning scenario with rate=200 (was rate=20), correct per-instance throughput ~17 req/s (was ~2.5), add --rate clarification, explain fitness scores, honest routing differentiation claims - workloads.md: fix stale ~57 req/s capacity → ~17 req/s for roofline mode - cluster.md: fix queue growth math using 17 req/s (was 57.4 req/s) - kv-cache.md: fix 2048-token prefill time ~97ms (was ~43ms) - routing.md: fix stale ~9ms step time claim - quickstart.md: add HF fetch note, multi-JSON output format warning - README.md: add HF fetch note, convert command descriptions, meaningful cluster example load, split features into Core/Advanced, collapse project structure behind <details> tag Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
Addressed all review comments:
|
|
@claude Are all the review comments addressed? See comment above: #632 (comment) |
|
Claude finished @sriumcp's task in 45s —— View job Review comment verificationAll 5 review comments have been addressed. Here's the verification:
LGTM — all review feedback has been addressed and the changes look correct. |
Summary
Fixes documentation metrics that drifted after the default latency model switched from blackbox (LLaMA 3.1 8B) to roofline (Qwen3-14B). All numbers in this PR were measured by running the exact commands shown in the docs.
--rateclarification<details>Measured metrics (Qwen3-14B / H100 / TP=1, roofline mode)
Test plan
go build ./...passesgo test ./...passes (all packages OK)Fixes #550
🤖 Generated with Claude Code