This repository contains the code for the paper: https://arxiv.org/abs/2505.14479.
Authors: Oren Sultan, Eitan Stern, Dafna Shahaf, The Hebrew University of Jerusalem, Israel.
The code is implemented in Python 3.10.
To get started:
conda env create -f environment.yml
conda activate FormalGeo_env -
formalgeo/create_problems_proofs_similarity_dataset.pyThe creation of the abstracted dataset (Section 3.1) -
formalgeo/similar_proofs_model.pyThe regressor model training (Section 3.2). Run with argument --run-model-pred 1 to run the model. -
formalgeo/similar_proofs_retrieval.pyRetrieve analogies for our analogy-based method, or random problems for the base model baseline (see Section 4). This code is used in LLM_analogy_based_solver.py. -
formalgeo/LLM_analogy_based_solver.pyRun the LLM solver, default base model is o1 (set in argument "model"). (whether with analogy-based / base model settings, should be set in "variant" argument). You should set your key for OpenAI model. You should set the problem you want to solve in chosen_problems_by_level. The output proof file and log will be generate in /results folder -
formalgeo/geometric_verifier.pyOur SMT verifier based on z3 library (see Section 3.4) -
formalgeo/prompt/geometry_similar_problems_prompt.txtThe prompt given to the LLM solver (see Figure 6) -
formalgeo/verifier/verifier.pyOur code for the verifier to support tier 1 errors, based on FormalGeo codebase (see Section 3.4) -
formalgeo/experiments/exp_new.pyOur code to run the plots for the results of the experiments (see Section 5)
This directory contains the FormalGeo7K dataset:
-
formalgeo7k_v1/gdl/theorem_GDL.json
The Geometry Definition Language (GDL) theorem dictionary used across the dataset. -
formalgeo7k_v1/problems/
A collection of problem JSON files, each representing an individual problem in the FormalGeo7K dataset.
This directory contains:
-
Evaluation Data
Subfolders grouped by difficulty level (level_1tolevel_5), each containing problems and their corresponding proof outputs evaluated in our experiments. -
Figures Included in the Paper
correct_proofs_ablation.png– Figure 3: Ablation study on correct proofsretries_and_runs.png– Figure 4: Effect of retries and multiple runstier_error_distribution.png– Figure 5: Distribution of errors across difficulty tiersanalogy_stability.png– Figure 9: Stability analysis of the analogy-based method
You should manually overwrite the following files in your conda environment:
-
Replace
~/miniforge3/envs/FormalGeo_env/lib/python3.10/site-packages/formalgeo/solver/interactive.py
withsrc/formalgeo/solver/interactive.py -
Replace
~/miniforge3/envs/FormalGeo_env/lib/python3.10/site-packages/formalgeo/tools/output.py
withsrc/formalgeo/tools/output.py
This ensures that the solver and output behavior matches the modified logic described in the paper and used in our experiments.
@article{sultan2025towards, title={Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach}, author={Sultan, Oren and Stern, Eitan and Shahaf, Dafna}, journal={arXiv preprint arXiv:2505.14479}, year={2025} }
For inquiries, please send an email to [email protected].