-
Learning a Continual-Learning AI Researcher on Frontier-CS
What if an AI could learn how to improve itself from every experiment it has ever run? We explore this question by using ALMA to automatically learn continual-learning AI researchers on Frontier-CS.
-
Evaluating Evolving Agent Systems at Scale with Frontier-CS
Evolving agent systems are advancing fast, but evaluation hasn't kept up. We show how Frontier-CS enables comprehensive, large-scale benchmarking of evolving agents, moving beyond small case studies to comparison at scale.
-
LLMs Defeated by Open-ended Problems
Modern LLMs claim superhuman algorithmic abilities, but what happens when there is no strict verifier? We analyze how multi-turn 'optimization' on Frontier-CS exposes the cognitive ceiling and catastrophic failures of LLMs in open-ended problem solving.
-
Evaluating the Hardest CS Problems in the Age of LLMs
Frontier-CS scores solutions on a continuous scale across heterogeneous hardware. This post explains the evaluation architecture behind the leaderboard: hash-based resume, resource-grouped clusters, pinned environments, and the challenges ahead for agentic submissions.
-
Frontier-CS 1.0 Release
We are releasing Frontier-CS 1.0, a major update to our open-ended computer science benchmark. This release expands Frontier-CS to 240 tasks across both the algorithmic and research tracks. We also introduce a new Elo-based leaderboard, along with full execution traces of model solutions to enable deeper analysis and reproducibility.