Second Batch Announcement
This document (PDF) describes our plan for a second batch of problems, which will be created, tested, and graded from March to June 2026. This batch will be designed as a formal benchmark. It will also include a separate round of informal community experimentation, in which a set of problems is made available to the interested public, with solutions provided after a few days, followed by an open discussion.
Timeline
Problem Selection
March – May 2026
Mathematicians across fields submit unpublished problems with proofs of at most 8 pages. All submissions undergo a first round of refereeing, and 10 problems are selected for the benchmark plus a separate set for community experimentation.
Benchmark Testing
Late May – Early June 2026
The editorial board tests leading AI systems via API. Each system gets one shot per question with no additional interaction. Additional systems meeting performance and autonomy criteria may be included, subject to availability of funds and to logistical constraints.
Benchmark Grading
June 2026
Human mathematicians referee AI solutions blind, rating each as essentially flawless, publishable with minor revisions, requiring major revisions, or rejected. Results, including referee reports and editorial reasoning, will be published online.
Informal Community Experimentation
June 2026
A separate set of problems is released to the public after the formal benchmark results. Solutions are posted after a few days, followed by open discussion on Zulip.
Participate in the Benchmark
Commercial or non-commercial entities with AI systems demonstrating performance comparable to leading public models may apply to have their systems included in testing. Logistical constraints will limit the number of such submissions that we can include in this batch. See Section 3 of the March 14 announcement for details.
Deadline to express interest: April 14, 2026. Contact [email protected].
If you are a mathematician interested in contributing problems to the second batch, please reach out at [email protected].