REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning
Abstract
Nowadays, formal theorem provers have made monumental progress on high-school and competition-level mathematics, but few of them generalize to more advanced mathematics. In this paper, we present REAL-Prover, a new open-source stepwise theorem prover for Lean 4 to push this boundary. This prover, based on our fine-tuned large language model (REAL-Prover-v1) and integrated with a retrieval system (Leansearch-PS), notably boosts performance on solving college-level mathematics problems. To train REAL-Prover-v1, we construct a comprehensive dataset of 1.5M formal-informal proof pairs from Mathlib4, covering diverse mathematical domains. We also develop a novel retrieval-augmented generation (RAG) framework that enhances the prover's ability to handle complex mathematical reasoning by retrieving relevant theorems and lemmas. Our experimental results demonstrate that REAL-Prover significantly outperforms existing theorem provers on college-level mathematics problems, achieving state-of-the-art performance on multiple benchmarks.