2025/04/05
Update: Self-host Lean State Search is available!2025/03/07
Update: LeanSearchClient now supports query Lean State Search from within Lean.2025/03/05
Update: Lean State Search is now publicly available!
Lean State Search is an innovative search engine powered by a pretrained model, specifically designed to help mathematicians and Lean4 users efficiently search Mathlib theorems using proof states. It is developed by the AI4Math team at Renmin University of China.
This repository contains the frontend and backend code for the application. You can setup your own server for deployment of Lean State Search.
This project uses Nix for development dependency management. We recommend installing nix
using the script provided by zero-to-nix.
After installing nix
, run the following command to enter development environment.
nix develop
You should first download our model from huggingface. Run the following command:
git lfs install
git clone https://huggingface.co/ruc-ai4math/LeanStateSearch2025.3
You also need to fill in the necessary environment variables in the .env
file. First, copy the .env.example
to .env
:
cp .env.example .env
Here is an example project structure:
LeanStateSearch
├── LeanStateSearch2025.3
├── data
├── protos
├── scripts
├── lean-state-search
├── state-search-be
├── Makefile
├── lib.nix
├── flake.lock
├── compose.yaml
├── flake.nix
├── shell.nix
├── .env
└── README.md
The MODEL_NAME_OR_PATH
environment variable should be the absolute path to the model.
We recommend using docker for deployment. Run the following command to build the images of frontend and backend:
make build-images
Then run the command to start the service:
make init-service
After starting the basic services, you need to prepare the data yourself. We provide premises extracted from v4.16.0
of Mathlib as an example. You can use existing tools like ntp-toolkit to extract data. Run the following command to upload data to the postgresql database.
cd state-search-be
poetry run python scripts/upload_theorem.py --data-path ../data/premise4.16.0.jsonl --rev v4.16.0
Then run the following command to create vector store. It will take some time depending on the hardware of your computer.
# In state-search-be/
poetry run python scripts/create_vector_store.py --rev v4.16.0
Now your application can work properly in http://localhost:${FRONTEND_PORT}
.
The code for training the model from scratch and reproducing the experimental results of our paper can be found here.
The application can be accessed through API requests. You can refer to LeanSearchClient as an example. The format is as follows:
https://localhost:${FRONTEND_PORT}/api/search?query=${Your query}&results=${result number}&rev=${revision}
Comming soon.
@misc{tao2025assistingmathematicalformalizationlearningbased,
title={Assisting Mathematical Formalization with A Learning-based Premise Retriever},
author={Yicheng Tao and Haotian Liu and Shanwen Wang and Hongteng Xu},
year={2025},
eprint={2501.13959},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.13959},
}