Second Batch | First Proof Project

Second Batch Announcement

This document (PDF) describes our plan for a second batch of problems, which will be created, tested, and graded from March to June 2026. This batch will be designed as a formal benchmark. It will also include a separate round of informal community experimentation, in which a set of problems is made available to the interested public, with solutions provided after a few days, followed by an open discussion.

Timeline

Problem Selection

March – May 2026

Mathematicians across fields submit unpublished problems with proofs of at most 8 pages. All submissions undergo a first round of refereeing, and 10 problems are selected for the benchmark plus a separate set for community experimentation.

Benchmark Testing

Late May – Early June 2026

The editorial board tests leading AI systems via API. Each system gets one shot per question with no additional interaction. Additional systems meeting performance and autonomy criteria may be included, subject to availability of funds and to logistical constraints.
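As a rough illustration of the "one shot per question" rule, here is a minimal sketch of a test loop. It is not the editorial board's actual harness: the names `System`, `Attempt`, and `run_one_shot` are hypothetical, and each system is modeled as a plain function from a problem statement to an answer rather than as a real API client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: a system is any callable mapping a problem statement to an answer.
# In practice this would wrap a vendor API call; that part is not modeled here.
System = Callable[[str], str]


@dataclass(frozen=True)
class Attempt:
    system: str
    problem_id: str
    answer: str


def run_one_shot(systems: Dict[str, System], problems: Dict[str, str]) -> List[Attempt]:
    """Query every system exactly once per problem and record the answers."""
    attempts: List[Attempt] = []
    for system_name, ask in systems.items():
        for problem_id, statement in problems.items():
            # Exactly one call per (system, problem): no retries, no follow-up turns.
            attempts.append(Attempt(system_name, problem_id, ask(statement)))
    return attempts
```

The design point is that the loop has no retry or feedback path, so whatever a system returns on its single call is the answer that goes to the referees.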

Benchmark Grading

June 2026

Human mathematicians referee AI solutions blind, rating each as essentially flawless, publishable with minor revisions, requiring major revisions, or rejected. Results, including referee reports and editorial reasoning, will be published online.
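The four-level rating scale and the blind refereeing step could be represented along the following lines. This is a sketch under assumptions, not the project's procedure: it assumes "blind" means referees do not see which system produced a solution, and the names `Rating` and `blind` and the `submission-N` labels are hypothetical.

```python
import random
from enum import Enum
from typing import Dict, Tuple


class Rating(Enum):
    # The four outcomes named in the announcement.
    ESSENTIALLY_FLAWLESS = "essentially flawless"
    MINOR_REVISIONS = "publishable with minor revisions"
    MAJOR_REVISIONS = "requiring major revisions"
    REJECTED = "rejected"


def blind(solutions: Dict[str, str], seed: int = 0) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (anonymous id -> solution text, anonymous id -> system name).

    Assumption: referees see only the first mapping; editors keep the second
    as the key for un-blinding once the reports are in.
    """
    names = sorted(solutions)
    random.Random(seed).shuffle(names)  # shuffle so labels do not follow name order
    blinded: Dict[str, str] = {}
    key: Dict[str, str] = {}
    for i, name in enumerate(names, start=1):
        anon = f"submission-{i}"
        blinded[anon] = solutions[name]
        key[anon] = name
    return blinded, key
```

A referee's report would then pair an anonymous id with a `Rating` value, and the key is consulted only when results are compiled.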

Informal Community Experimentation

June 2026

A separate set of problems is released to the public after the formal benchmark results are published. Solutions are posted a few days later, followed by open discussion on Zulip.

Participate in the Benchmark

Commercial or non-commercial entities with AI systems demonstrating performance comparable to leading public models may apply to have their systems included in testing. Logistical constraints will limit the number of such submissions that we can include in this batch. See Section 3 of the March 14 announcement for details.

Deadline to express interest: April 14, 2026. Contact [email protected].

If you are a mathematician interested in contributing problems to the second batch, please reach out at [email protected].