Refactor result storage from CSV to JSON in Redis #237
Conversation
Enable stateless_http=True so the MCP server holds no in-memory sessions, allowing horizontal scaling without session affinity. Replace clientInfo-based widget detection with a User-Agent ContextVar fallback (claude-code blocklist). Add GCPBackendPolicy + ClientIP session affinity as defense-in-depth. Co-Authored-By: Claude Opus 4.6 <[email protected]>
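The User-Agent ContextVar fallback described above can be sketched as follows. This is a hypothetical reconstruction, not the repo's actual code; the names `set_user_agent`, `widgets_enabled`, and the middleware hook point are assumptions, though `_NO_WIDGET_UA_SUBSTRINGS` and the claude-code blocklist are mentioned in the review below.

```python
from contextvars import ContextVar

# Hypothetical sketch of the User-Agent fallback; names are assumptions.
_user_agent: ContextVar[str] = ContextVar("user_agent", default="")
_NO_WIDGET_UA_SUBSTRINGS = {"claude-code"}  # clients that cannot render widgets


def set_user_agent(ua: str) -> None:
    # Would be called from HTTP middleware on each incoming request; the
    # ContextVar isolates the value per request without server-side sessions.
    _user_agent.set(ua.lower())


def widgets_enabled() -> bool:
    # Fall back to the User-Agent when clientInfo-based detection is unavailable.
    ua = _user_agent.get()
    return not any(s in ua for s in _NO_WIDGET_UA_SUBSTRINGS)
```

Because a ContextVar carries the value through the request's call stack without any shared session state, this pattern is compatible with stateless horizontal scaling.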
CSV round-trip (df.to_csv → Redis → pd.read_csv) destroyed nested dict
columns like `research`, requiring fragile workarounds (_restore_dict_columns
with ast.literal_eval). Switching to JSON storage eliminates the problem
entirely and makes the JSON download path faster (no CSV→DataFrame→JSON
conversion needed).
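The round-trip problem described above is easy to reproduce. A minimal sketch (column names illustrative, matching the `research` example from the commit message):

```python
import io
import json

import pandas as pd

# A result row with a nested dict column, as described above.
df = pd.DataFrame([{"id": 1, "research": {"source": "arxiv", "score": 0.9}}])

# CSV round trip: the nested dict comes back as its string repr, so the
# structure is lost unless you re-parse it (e.g. with ast.literal_eval).
back = pd.read_csv(io.StringIO(df.to_csv(index=False)))
assert isinstance(back.loc[0, "research"], str)  # dict structure destroyed

# JSON records round trip: nested dicts survive intact, no workaround needed.
records = json.loads(json.dumps(df.to_dict(orient="records")))
assert isinstance(records[0]["research"], dict)
```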
- Rename store_result_csv/get_result_csv/result_csv_exists → JSON equivalents
- Redis key pattern: result:{id}:csv → result:{id}:json
- Store via df.to_dict(orient="records") → json.dumps() directly
- JSON download returns raw string from Redis (zero parsing)
- CSV download generated on-the-fly from stored JSON
- Remove _restore_dict_columns() and research._all widget fallback
- Bump cache size limit from 50MB to 80MB (JSON is ~1.5x larger)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
@claude code review
Claude finished @RafaelPo's task in 1m 46s

Code Review: CSV to JSON Storage Refactor + Stateless HTTP Mode
This PR has two logically distinct changes across 2 commits: (1) stateless HTTP mode with User-Agent detection, and (2) CSV → JSON result storage in Redis.
Both are well-implemented. Here's the detailed review:

Looks good

- JSON storage eliminates data loss — The core motivation (nested dict columns like `research`) …
- Clean Redis key migration — Changing the key from `result:{id}:csv` to `result:{id}:json` …
- Size limit correctly adjusted — The 50MB → 80MB bump matches JSON being ~1.5x larger than CSV.
- Widget template fixes — …
- The User-Agent ContextVar pattern — Clean propagation of the HTTP User-Agent via a ContextVar …
- Test coverage — Tests are thorough and correctly updated. The lifecycle integration tests in …

Suggestions

1. CSV on-the-fly generation loads the full JSON into memory:

```python
records: list[dict] = json.loads(json_text)
csv_text = pd.DataFrame(records).to_csv(index=False)
```

This parses the full JSON, builds a DataFrame, and generates the CSV all in memory. For a result near the 80M char limit, peak memory could be ~3-4x that (~240-320MB). The old code had a …

2. Sanitize-once-at-store — Previously it was called per-page. Now it sanitizes all records upfront, which is correct (consistent data in Redis) but does mean iterating the full dataset on every store. For very large results this is a minor increase in CPU, but the simplicity is worth it.

3. Two concerns in one PR — The stateless HTTP mode (commit …) …

4. Minor:

```python
_NO_WIDGET_UA_SUBSTRINGS = {"claude-code"}
```

This set literal is reconstructed on every invocation of …

Verdict

Clean, well-motivated refactor with good test coverage. The JSON storage path is simpler and more correct than the CSV path it replaces. The stateless HTTP changes are well-documented in the ADR. Ship it.
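One possible mitigation for the memory concern in suggestion 1 is to build the CSV in row slices rather than one full DataFrame. A hedged sketch (the helper name and chunk size are assumptions; note that `json.loads` still parses the whole payload, so only the DataFrame/CSV stage is chunked):

```python
import json

import pandas as pd


def iter_csv_chunks(json_text: str, chunk_rows: int = 10_000):
    # Parse once, then convert to CSV in slices of chunk_rows rows so only
    # one small DataFrame and CSV fragment are alive at a time.
    records = json.loads(json_text)
    for start in range(0, len(records), chunk_rows):
        frame = pd.DataFrame(records[start:start + chunk_rows])
        # Emit the header only with the first chunk.
        yield frame.to_csv(index=False, header=(start == 0))
```

A streaming HTTP response could consume this generator directly instead of materializing the full CSV string.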
Allows spinning up N mcp-server replicas behind round-robin nginx to verify stateless HTTP mode locally before deploying to GKE. Co-Authored-By: Claude Opus 4.6 <[email protected]>
The overlay needs unauthenticated mode for local testing since there is no OAuth provider. ALLOW_NO_AUTH=1 also allows binding 0.0.0.0 inside the container so nginx can reach the backends. Co-Authored-By: Claude Opus 4.6 <[email protected]>
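A minimal nginx fragment for the round-robin setup described above might look like this. Service names, ports, and replica count are assumptions for illustration, not taken from the repo's overlay.

```nginx
# Hypothetical local multi-replica overlay: nginx defaults to round-robin,
# which deliberately exercises the no-session-affinity path.
upstream mcp_backends {
    server mcp-server-1:8000;
    server mcp-server-2:8000;
    server mcp-server-3:8000;
}

server {
    listen 8080;
    location / {
        proxy_pass http://mcp_backends;
    }
}
```

Because stateless_http=True means any replica can serve any request, responses should be correct regardless of which backend nginx picks.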
@claude code review
Claude finished @RafaelPo's task in 2m 20s

Code Review: CSV to JSON Storage Refactor + Stateless HTTP Mode
Two logically independent changes across 4 commits: (1) stateless HTTP mode with User-Agent detection, (2) CSV → JSON storage in Redis. Both are well-implemented.

Looks good

- JSON storage eliminates data loss — The core motivation (nested dict columns like `research`) …
- Clean Redis key migration — Switching from `result:{id}:csv` to `result:{id}:json` …
- Sanitize-once-at-store pattern — …
- ContextVar for User-Agent — Clean propagation pattern in …
- Widget template fixes — …
- Deploy infra is defense-in-depth — Session affinity (GCPBackendPolicy + ClientIP) is correctly guarded by …
- Test coverage — Thorough. All store/retrieve tests migrated from CSV to JSON. Integration tests in …

Issues

1. Stale docstring:

```python
"""REST endpoint to download task results as CSV."""
```

This endpoint now serves both CSV and JSON formats. Minor, but will confuse future readers.

2. JSON download missing `Content-Disposition` — CSV downloads include a `Content-Disposition` header; JSON responses do not, so browsers render them inline instead of offering a save dialog.

3. CSV on-the-fly generation has no size guard:

```python
records: list[dict] = json.loads(json_text)
csv_text = pd.DataFrame(records).to_csv(index=False)
```

The old code had an explicit …

Verdict

Clean, well-motivated refactor. The JSON path is both simpler and more correct than the CSV path it replaced. Ship it — the issues above are non-blocking.
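Issue 3 could be addressed by restoring an explicit limit check before conversion. A sketch under stated assumptions: the function name and `max_chars` parameter are hypothetical, and the 80M default mirrors the cache limit mentioned elsewhere in this PR, not a value read from the code.

```python
import json

import pandas as pd

MAX_JSON_CHARS = 80_000_000  # assumed to match the Redis cache limit


def json_to_csv(json_text: str, max_chars: int = MAX_JSON_CHARS) -> str:
    # Refuse conversion before parsing, so an oversized payload fails fast
    # instead of ballooning memory during DataFrame construction.
    if len(json_text) > max_chars:
        raise ValueError("result too large for on-the-fly CSV conversion")
    records: list[dict] = json.loads(json_text)
    return pd.DataFrame(records).to_csv(index=False)
```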
Address code review feedback: - Update api_download docstring to mention both CSV and JSON formats - Add Content-Disposition header to JSON responses so browsers trigger a save dialog instead of rendering inline Co-Authored-By: Claude Opus 4.6 <[email protected]>
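The fixed behavior could be sketched framework-agnostically as below. The helper name, signature, and filename pattern are hypothetical; only the raw-JSON passthrough, on-the-fly CSV, and the `Content-Disposition` header on both formats come from this PR.

```python
import json

import pandas as pd


def build_download_response(json_text: str, fmt: str, task_id: str):
    """Return (body, headers) for a result download in the requested format."""
    if fmt == "json":
        body = json_text  # raw string from Redis, zero parsing
        ctype, ext = "application/json", "json"
    else:
        records = json.loads(json_text)  # CSV generated on the fly
        body = pd.DataFrame(records).to_csv(index=False)
        ctype, ext = "text/csv", "csv"
    headers = {
        "Content-Type": ctype,
        # attachment on both formats so browsers offer a save dialog
        # instead of rendering JSON inline (the review feedback above)
        "Content-Disposition": f'attachment; filename="result-{task_id}.{ext}"',
    }
    return body, headers
```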
Implementation details are captured in code comments and the multi-replica-test skill. The standalone doc is no longer needed. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Summary

- Nested dict columns (like `research`) now survive storage intact
- Removes the `_restore_dict_columns()` workaround and the `research._all` widget fallback — no longer needed

Test plan

- `uv run pytest tests/`

🤖 Generated with Claude Code