REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning

Ziju Shen, Naohao Huang, Fanyi Yang, Yutong Wang, Guoxiong Gao, Tianyi Xu, Jiedong Jiang, Wanyi He, Pu Yang, Mengzhou Sun, Haocheng Ju, Peihao Wu, Bryan Dai, Bin Dong
arXiv May 27, 2025

Abstract

Nowadays, formal theorem provers have made monumental progress on high-school and competition-level mathematics, but few of them generalize to more advanced mathematics. In this paper, we present REAL-Prover, a new open-source stepwise theorem prover for Lean 4 to push this boundary. This prover, based on our fine-tuned large language model (REAL-Prover-v1) and integrated with a retrieval system (Leansearch-PS), notably boosts performance on solving college-level mathematics problems. To train REAL-Prover-v1, we construct a comprehensive dataset of 1.5M formal-informal proof pairs from Mathlib4, covering diverse mathematical domains. We also develop a novel retrieval-augmented generation (RAG) framework that enhances the prover's ability to handle complex mathematical reasoning by retrieving relevant theorems and lemmas. Our experimental results demonstrate that REAL-Prover significantly outperforms existing theorem provers on college-level mathematics problems, achieving state-of-the-art performance on multiple benchmarks.