
Refactor result storage from CSV to JSON in Redis #237

Merged
RafaelPo merged 7 commits into main from refactor/csv-to-json-storage on Feb 26, 2026

Conversation

@RafaelPo
Contributor

Summary

  • Switches Redis result storage from CSV to JSON, eliminating the lossy CSV round-trip that destroyed nested dict columns (like research)
  • JSON download path now returns raw string from Redis (zero parsing); CSV downloads generated on-the-fly
  • Removes _restore_dict_columns() workaround and research._all widget fallback — no longer needed

Test plan

  • All 372 tests pass (uv run pytest tests/)
  • Deploy to staging, run a task from Claude.ai — verify widget renders and research popover shows clean text
  • Verify CSV download still works (generated on-the-fly from JSON)

🤖 Generated with Claude Code

RafaelPo and others added 2 commits February 26, 2026 15:12
Enable stateless_http=True so the MCP server holds no in-memory sessions,
allowing horizontal scaling without session affinity. Replace clientInfo-based
widget detection with a User-Agent ContextVar fallback (claude-code blocklist).
Add GCPBackendPolicy + ClientIP session affinity as defense-in-depth.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
CSV round-trip (df.to_csv → Redis → pd.read_csv) destroyed nested dict
columns like `research`, requiring fragile workarounds (_restore_dict_columns
with ast.literal_eval). Switching to JSON storage eliminates the problem
entirely and makes the JSON download path faster (no CSV→DataFrame→JSON
conversion needed).

- Rename store_result_csv/get_result_csv/result_csv_exists → JSON equivalents
- Redis key pattern: result:{id}:csv → result:{id}:json
- Store via df.to_dict(orient="records") → json.dumps() directly
- JSON download returns raw string from Redis (zero parsing)
- CSV download generated on-the-fly from stored JSON
- Remove _restore_dict_columns() and research._all widget fallback
- Bump cache size limit from 50MB to 80MB (JSON is ~1.5x larger)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@RafaelPo
Contributor Author

@claude code review

@github-actions
Contributor

github-actions bot commented Feb 26, 2026

Claude finished @RafaelPo's task in 1m 46s


Code Review: CSV to JSON Storage Refactor + Stateless HTTP Mode

  • Read all changed source files
  • Read all changed test files
  • Analyze changes and provide review feedback

This PR has two logically distinct changes across 2 commits:

  1. Stateless HTTP mode with User-Agent detection for multi-replica scaling
  2. CSV → JSON storage in Redis, eliminating the lossy CSV round-trip

Both are well-implemented. Here's the detailed review:


Looks good

JSON storage eliminates data loss — The core motivation (nested dict columns like research getting destroyed by CSV round-trip) is well addressed. Storing JSON natively in Redis means zero parsing on the JSON download path (routes.py:303-308) and faithful preservation of nested structures. The new test_research_dict_survives_json_roundtrip test validates this directly.

Clean Redis key migration — Changing the key from result:{task_id}:csv to result:{task_id}:json means old CSV data won't be accidentally read as JSON after deployment. In-flight results stored under the old key will simply 404 and fall back to the API — correct fail-safe behavior given the 1-hour TTL.

Size limit correctly adjusted — MAX_JSON_CACHE_CHARS increased from 50M → 80M to account for JSON being ~1.5x larger than CSV for the same data. A comment explains the rationale.

Widget template fixes — The flatWithResearch changes in templates.py correctly handle the fact that research is now a real object (not a flattened CSV string): the delete display.research prevents [object Object] from appearing in the table, and the column filter addition (k!=="research") hides the raw nested column.

User-Agent ContextVar pattern — Clean propagation of HTTP User-Agent via contextvars.ContextVar, following the same pattern as the MCP SDK's auth_context_var. The three-tier widget detection is well-reasoned and documented.

Test coverage — Tests are thorough and correctly updated. The lifecycle integration tests in test_http_integration.py cover the full mint→download flow.


Suggestions

1. CSV on-the-fly generation loads full JSON into memory — routes.py:311-312

records: list[dict] = json.loads(json_text)
csv_text = pd.DataFrame(records).to_csv(index=False)

This parses the full JSON, builds a DataFrame, and generates CSV all in memory. For a result near the 80M char limit, peak memory could be ~3-4x that (~240-320MB). The old code had a max_json_size guard (removed in this PR) specifically for the reverse direction. The 80M cap on storage provides an upper bound, so this is likely fine in practice, but worth keeping an eye on in production. If memory becomes a concern, a streaming CSV writer over the JSON array would avoid the full materialization.
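A streaming variant along the lines the review suggests could look like this sketch, which drops the DataFrame and the single large CSV string from peak memory (it still calls `json.loads` once; a fully streaming parse would need an incremental JSON parser such as ijson):

```python
import csv
import io
import json
from typing import Iterator

def json_records_to_csv_chunks(json_text: str, chunk_rows: int = 1000) -> Iterator[str]:
    """Yield CSV text chunk by chunk instead of materializing a full DataFrame."""
    records = json.loads(json_text)
    if not records:
        return
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()), extrasaction="ignore")
    writer.writeheader()
    for i, row in enumerate(records, 1):
        writer.writerow(row)
        if i % chunk_rows == 0:  # flush a chunk and reuse the buffer
            yield buf.getvalue()
            buf.seek(0)
            buf.truncate(0)
    if buf.tell():
        yield buf.getvalue()
```

An HTTP framework's streaming response type could then iterate these chunks directly.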

2. _sanitize_records is now called on the full dataset before slicing — result_store.py:281

Previously it was called per-page. Now it sanitizes all records upfront, which is correct (consistent data in Redis) but does mean iterating the full dataset on every store. For very large results this is a minor increase in CPU, but the simplicity is worth it.

3. Two concerns in one PR — The stateless HTTP mode (commit 3891f22) and the CSV→JSON refactor (commit 98f5eb9) are independent features. If either needs to be reverted, the other goes with it. Consider splitting into separate PRs in the future for cleaner rollback semantics. Not blocking for this PR since both are well-tested.

4. Minor: _NO_WIDGET_UA_SUBSTRINGS is rebuilt on every call — tool_helpers.py:180

_NO_WIDGET_UA_SUBSTRINGS = {"claude-code"}

This set literal is reconstructed on every invocation of _widgets_from_user_agent(). It's cheap for one element, but could be hoisted to module level if the set grows. Not actionable now.


Verdict

Clean, well-motivated refactor with good test coverage. The JSON storage path is simpler and more correct than the CSV path it replaces. The stateless HTTP changes are well-documented in the ADR. Ship it.

RafaelPo and others added 2 commits February 26, 2026 16:33
Allows spinning up N mcp-server replicas behind round-robin nginx
to verify stateless HTTP mode locally before deploying to GKE.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The overlay needs unauthenticated mode for local testing since there
is no OAuth provider. ALLOW_NO_AUTH=1 also allows binding 0.0.0.0
inside the container so nginx can reach the backends.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@RafaelPo
Contributor Author

@claude code review

@github-actions
Contributor

github-actions bot commented Feb 26, 2026

Claude finished @RafaelPo's task in 2m 20s


Code Review: CSV to JSON Storage Refactor + Stateless HTTP Mode

  • Read all changed source files
  • Read all changed test files
  • Analyze deployment/infra changes
  • Provide review feedback

Two logically independent changes across 4 commits: (1) stateless HTTP mode with User-Agent detection, (2) CSV → JSON storage in Redis. Both are well-implemented.


Looks good

JSON storage eliminates data loss — The core motivation (nested dict columns like research getting destroyed by CSV round-trip) is cleanly addressed. Storing JSON natively means zero parsing on the download path (routes.py:302-308) and faithful preservation of nested structures. The test_research_dict_survives_json_roundtrip test validates the key invariant.

Clean Redis key migration — Switching from result:{task_id}:csv to result:{task_id}:json means old data won't be misinterpreted. In-flight results under the old key will 404 and fall back to the API — correct behavior given the 1-hour TTL.

Sanitize-once-at-store pattern — _sanitize_records is now called once in try_store_result (result_store.py:281) before writing to Redis, rather than per-page on reads. This is more correct (consistent data in Redis) and means try_cached_result can trust what it reads.

ContextVar for User-Agent — Clean propagation pattern in http_config.py:41-48 with proper reset() in the finally block (http_config.py:224-225). Follows the same design as the MCP SDK's auth_context_var.

Widget template fixes — flatWithResearch in templates.py:187-196 correctly handles research as a real object: delete display.research prevents [object Object] leaking into the flattened display, and the column filter k!=="research" at line 206 hides any remaining raw column. The != null guard (line 189) is the right check since it covers both null and undefined.

Deploy infra is defense-in-depth — Session affinity (GCPBackendPolicy + ClientIP) is correctly guarded by enabled flags and documented as optional since the server is stateless. The docker-compose.multi.yaml correctly inherits the backend network from the base compose file.

Test coverage — Thorough. All store/retrieve tests migrated from CSV to JSON. Integration tests in test_http_integration.py cover the full mint→download lifecycle.


Issues

1. Stale docstring: api_download still says "CSV" — routes.py:249

"""REST endpoint to download task results as CSV.

The endpoint now serves both CSV and JSON formats — minor, but the stale docstring will confuse future readers.


2. JSON download missing Content-Disposition header — routes.py:304-308

CSV downloads include Content-Disposition: attachment; filename="results_xxx.csv" which triggers a browser save dialog. JSON downloads don't. If a user ever hits the JSON download URL directly in a browser, it'll render inline instead of downloading. Probably fine since the widget fetches JSON programmatically, but worth a deliberate decision.
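If the team decides to add the header, the fix is a small one. A framework-agnostic sketch (the helper name and the `fmt` parameter are hypothetical, chosen to mirror the CSV path's filename convention):

```python
def download_headers(task_id: str, fmt: str) -> dict[str, str]:
    """Response headers for a result download; hypothetical helper name."""
    media_type = "application/json" if fmt == "json" else "text/csv"
    return {
        "Content-Type": media_type,
        # Mirrors the CSV path: forces a browser save dialog instead of inline rendering.
        "Content-Disposition": f'attachment; filename="results_{task_id}.{fmt}"',
    }
```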

3. CSV on-the-fly generation has no size guard — routes.py:311-312

records: list[dict] = json.loads(json_text)
csv_text = pd.DataFrame(records).to_csv(index=False)

The old code had an explicit max_json_size guard to prevent memory exhaustion during format conversion. This was removed because the direction flipped (JSON→CSV instead of CSV→JSON), but the concern is similar: for an 80M JSON string, json.loads + DataFrame + to_csv can peak at ~3-4x memory. The 80M storage cap provides an implicit upper bound (~320MB peak), which is within the 2Gi pod memory limit — so this is likely fine in practice, but worth noting. If CSV downloads of large results become common, a streaming CSV writer would avoid the full materialization.


Verdict

Clean, well-motivated refactor. The JSON path is both simpler and more correct than the CSV path it replaced. Ship it — the issues above are non-blocking.

RafaelPo and others added 3 commits February 26, 2026 17:22
Address code review feedback:
- Update api_download docstring to mention both CSV and JSON formats
- Add Content-Disposition header to JSON responses so browsers
  trigger a save dialog instead of rendering inline

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Implementation details are captured in code comments and the
multi-replica-test skill. The standalone doc is no longer needed.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@RafaelPo RafaelPo merged commit c73160f into main Feb 26, 2026
5 checks passed
@RafaelPo RafaelPo deleted the refactor/csv-to-json-storage branch February 26, 2026 17:39
