Skip to content

ruc-ai4math/LeanStateSearch

Repository files navigation

Lean State Search

arXiv Hugging Face Try Demo

🔥 News

  • 2025/04/05 Update: Self-host Lean State Search is available!
  • 2025/03/07 Update: LeanSearchClient now supports query Lean State Search from within Lean.
  • 2025/03/05 Update: Lean State Search is now publicly available!

🧩 About

Lean State Search is an innovative search engine powered by a pretrained model, specifically designed to help mathematicians and Lean4 users efficiently search Mathlib theorems using proof states. It is developed by the AI4Math team at Renmin University of China.

This repository contains the frontend and backend code for the application. You can setup your own server for deployment of Lean State Search.

⚒️ Installation

This project uses Nix for development dependency management. We recommend installing nix using the script provided by zero-to-nix.

After installing nix, run the following command to enter development environment.

nix develop

You should first download our model from huggingface. Run the following command:

git lfs install
git clone https://huggingface.co/ruc-ai4math/LeanStateSearch2025.3

You also need to fill in the necessary environment variables in the .env file. First, copy the .env.example to .env:

cp .env.example .env

Here is an example project structure:

LeanStateSearch
├── LeanStateSearch2025.3
├── data
├── protos
├── scripts
├── lean-state-search
├── state-search-be
├── Makefile
├── lib.nix
├── flake.lock
├── compose.yaml
├── flake.nix
├── shell.nix
├── .env
└── README.md

The MODEL_NAME_OR_PATH environment variable should be the absolute path to the model.

We recommend using docker for deployment. Run the following command to build the images of frontend and backend:

make build-images

Then run the command to start the service:

make init-service

After starting the basic services, you need to prepare the data yourself. We provide premises extracted from v4.16.0 of Mathlib as an example. You can use existing tools like ntp-toolkit to extract data. Run the following command to upload data to the postgresql database.

cd state-search-be
poetry run python scripts/upload_theorem.py --data-path ../data/premise4.16.0.jsonl --rev v4.16.0

Then run the following command to create vector store. It will take some time depending on the hardware of your computer.

# In state-search-be/
poetry run python scripts/create_vector_store.py --rev v4.16.0

Now your application can work properly in http://localhost:${FRONTEND_PORT}.

📖 References

Training from scratch

The code for training the model from scratch and reproducing the experimental results of our paper can be found here.

Call from API

The application can be accessed through API requests. You can refer to LeanSearchClient as an example. The format is as follows:

https://localhost:${FRONTEND_PORT}/api/search?query=${Your query}&results=${result number}&rev=${revision}

Contribute to this project

Comming soon.

📌 Citation

@misc{tao2025assistingmathematicalformalizationlearningbased,
      title={Assisting Mathematical Formalization with A Learning-based Premise Retriever},
      author={Yicheng Tao and Haotian Liu and Shanwen Wang and Hongteng Xu},
      year={2025},
      eprint={2501.13959},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.13959},
}

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published