feat: add agent-based solution generation via Claude Agent SDK by andylizf · Pull Request #104 · FrontierCS/Frontier-CS

andylizf · 2026-04-16T05:18:02Z

Summary

Add agent-based solution generation pipeline using Claude Agent SDK
Agent models identified by -agent suffix (e.g., claude-sonnet-4-5-agent)
Integrated into existing generate_solutions.py — same CLI, just pass an agent model name
Agent gets problem statement only, must self-test (no test data, no checker, no interactor)
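A minimal sketch of how the -agent suffix convention might be detected. The helper names `is_agent_model` and `base_model_name` are hypothetical and are not taken from src/frontier_cs/models.py; only the suffix itself comes from the summary above.

```python
AGENT_SUFFIX = "-agent"  # suffix named in the PR summary


def is_agent_model(model: str) -> bool:
    """Return True when the model name selects agent mode (hypothetical helper)."""
    return model.endswith(AGENT_SUFFIX)


def base_model_name(model: str) -> str:
    """Strip the agent suffix so prefix/provider detection sees the plain name."""
    return model[: -len(AGENT_SUFFIX)] if is_agent_model(model) else model
```

Under these assumptions, `base_model_name("claude-sonnet-4-5-agent")` yields `"claude-sonnet-4-5"`, and names without the suffix pass through unchanged.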

Files

src/frontier_cs/gen/agent_interface.py — core agent lifecycle: prompt construction, SDK invocation, streaming, transcript logging, timeout/cost control, solution extraction
src/frontier_cs/gen/agent_constants.py — prompt templates, helper shell scripts, CLAUDE.md content
src/frontier_cs/models.py — -agent model suffix handling in prefix/provider detection
algorithmic/scripts/generate_solutions.py — agent mode integration
tests/test_agent_interface.py — 18 tests

Test plan

pytest tests/test_agent_interface.py — 18/18 pass
End-to-end run on a few problems with actual agent

Add agent model support to the solution generation pipeline:
- Detect -agent suffix models and store problem_dir in GenerationTask
- Add --agent-timeout and --agent-cost-limit CLI arguments
- Branch execute_task to call generate_agent_solution for agent models
- Save .meta.json alongside generated .cpp solutions
- Add import json for metadata serialization
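The CLI and metadata changes in this commit could look roughly like the sketch below. Only the flag names --agent-timeout and --agent-cost-limit and the .cpp/.meta.json pairing come from the commit message. The `--model` flag, the default values, the units (seconds, USD), the metadata file naming, and the helper `save_solution` are assumptions made for illustration.

```python
import argparse
import json
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate solutions (sketch)")
    parser.add_argument("--model", required=True)  # hypothetical flag
    # Defaults and units below are placeholders, not the PR's actual values.
    parser.add_argument("--agent-timeout", type=float, default=1800.0,
                        help="wall-clock limit per agent run (assumed seconds)")
    parser.add_argument("--agent-cost-limit", type=float, default=5.0,
                        help="spend cap per agent run (assumed USD)")
    return parser


def save_solution(out_dir: Path, name: str, code: str, meta: dict) -> Path:
    """Write <name>.cpp and a sibling <name>.meta.json (naming is an assumption)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    cpp_path = out_dir / f"{name}.cpp"
    cpp_path.write_text(code, encoding="utf-8")
    meta_path = out_dir / f"{name}.meta.json"
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return cpp_path
```

argparse maps `--agent-timeout` to `args.agent_timeout`, so the task runner can pass both limits straight through to the agent call.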

- Copy problem dir to temp directory so agent doesn't pollute originals
- Makes concurrent runs on same problem safe
- Track token usage from streaming message_delta events (only reliable source when timeout kills run before ResultMessage arrives)
- Clean up temp dir after extraction
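The copy-then-clean-up pattern described above can be sketched with the standard library alone. The context manager name `isolated_workspace` and the `agent_ws_` prefix are hypothetical; the PR does not show how agent_interface.py structures this.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def isolated_workspace(problem_dir: Path) -> Iterator[Path]:
    """Yield a private copy of problem_dir and delete it afterwards."""
    # Each call gets its own temp root, so concurrent runs never share files.
    tmp_root = Path(tempfile.mkdtemp(prefix="agent_ws_"))
    try:
        workdir = tmp_root / problem_dir.name
        shutil.copytree(problem_dir, workdir)
        yield workdir
    finally:
        # Runs even if the agent times out or raises.
        shutil.rmtree(tmp_root, ignore_errors=True)
```

The solution would need to be extracted inside the `with` block, since the copy is gone once the block exits.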

… for agent eval Build dynamic agent prompts from problem config (time/memory limits, subtask counts, interactive vs standard). Write test_all.sh and run_interactive.sh into agent workdir. Embed small sample I/O directly in prompt. Add CLAUDE.md with solving strategy guidance.

Parity mode (--parity flag) strips all test data, helper scripts, checker, and interactor from the agent workspace, matching the Harbor adapter setup where agents must self-test via brute-force cross-validation (对拍).
Changes:
- agent_interface.py: parity-aware prompt, workspace setup, CLAUDE.md, _get_infra_git_hash(), and enriched build_metadata (timestamp, parity flag)
- generate_solutions.py: --parity CLI argument
- tests: parity prompt validation (standard + interactive)
- docs: solutions repo separation plan (infra_git_hash in meta.json)
- .gitignore: exclude .claude/ directory
- pyproject.toml: add pytest dev dependency

Prompt (initial message) is now lean — only problem-specific info (path, type, limits). CLAUDE.md carries persistent guidance that survives context compaction: self-testing methodology, workflow steps, common mistakes, retreat strategy.
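A lean initial prompt of this kind might be assembled as in the sketch below. The `ProblemConfig` fields, the `build_prompt` function, and the line wording are all hypothetical; the real templates live in agent_constants.py and are not shown in this PR page.

```python
from dataclasses import dataclass


@dataclass
class ProblemConfig:
    # Field names are assumptions; the actual problem config schema is not shown.
    statement_path: str
    interactive: bool
    time_limit_s: float
    memory_limit_mb: int


def build_prompt(cfg: ProblemConfig) -> str:
    """Return only problem-specific facts; persistent guidance stays in CLAUDE.md."""
    kind = "interactive" if cfg.interactive else "standard"
    lines = [
        f"Problem statement: {cfg.statement_path}",
        f"Type: {kind}",
        f"Time limit: {cfg.time_limit_s:g} s",
        f"Memory limit: {cfg.memory_limit_mb} MB",
    ]
    return "\n".join(lines)
```

Keeping the prompt this small fits the stated design: anything the agent must remember across context compaction goes into CLAUDE.md rather than the first message.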

andylizf added 10 commits April 6, 2026 11:40

feat: add claude-agent-sdk dependency for agent eval

509f5cb

feat: handle -agent model suffix in model prefix and provider detection

c8ee4aa

feat: add agent_interface.py — core agent runner with logging and ext…

f208dfa

…raction

revert: remove infra_git_hash and timestamp from build_metadata

dd0a633

These belong to the solutions repo separation effort, which is docs-only for now. Removed _get_infra_git_hash(), subprocess import, and the infra_git_hash/timestamp/parity fields from build_metadata().

fix: make parity mode the default and remove solutions-repo-separatio…

e95fb8f

…n doc Agent always runs without test data — no --parity flag needed. The solutions repo separation plan is not ready to commit.

refactor: extract prompt templates and scripts to agent_constants.py

1549c53

Move all large string constants (prompt templates, shell scripts, CLAUDE.md content) out of agent_interface.py into a dedicated constants module.

andylizf changed the title ~~feat: agent eval with parity mode for Harbor alignment~~ feat: add agent-based solution generation via Claude Agent SDK Apr 16, 2026

andylizf added 2 commits April 16, 2026 17:08

refactor: soften scoring and retreat guidance to give agents more jud…

b31db0a

…gment room

andylizf wants to merge 12 commits into main from feat/agent-eval-algorithmic
