Our database of benchmark results tracks the performance of leading AI models on challenging tasks. It includes results from benchmarks evaluated internally by Epoch AI as well as data collected from external sources. Explore trends in AI capabilities across time, by benchmark, or by model.
We added APEX-Agents, ARC-AGI-2, and HLE to the Epoch Capabilities Index. GPT-5.4 Pro now leads the index, narrowly ahead of Gemini 3.1 Pro.
GPT-5.4 Pro set a new record on FrontierMath, scoring 50% on Tiers 1–3 and 38% on Tier 4. We also evaluated it on FrontierMath: Open Problems.
We released FrontierMath: Open Problems, which tests AI on unsolved math research problems.