This project automates linking arXiv papers with their companion code repositories and updating the MaRDI Knowledge Graph (KG). It takes metadata from PapersWithCode, matches it against arXiv entries in the MaRDI KG, and updates the matching items using the MaRDI Client.
It is implemented as a Prefect workflow and can be run standalone, on a local Prefect server, or on Prefect Cloud.
- Downloads the latest PapersWithCode JSON dump
- Searches for the corresponding arXiv entries in the MaRDI KG
- Updates matching MaRDI KG items with the companion code repository information
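
The matching step in the list above can be sketched as follows. This is a minimal illustration, not the project's actual code: the field names `paper_arxiv_id` and `repo_url` are assumptions about the layout of the PapersWithCode dump, and `load_dump` and `repos_by_arxiv_id` are hypothetical helper names. The lookup in the MaRDI KG and the update through the MaRDI Client are not shown.

```python
import gzip
import json
from collections import defaultdict


def load_dump(path):
    """Read a JSON dump (gzip-compressed if the name ends in .gz) into a list."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def repos_by_arxiv_id(entries):
    """Group repository URLs by arXiv ID.

    Entries without an arXiv ID or without a repository URL are skipped,
    and duplicate URLs for the same paper are kept only once.
    """
    repos = defaultdict(list)
    for entry in entries:
        arxiv_id = entry.get("paper_arxiv_id")  # assumed field name
        repo_url = entry.get("repo_url")        # assumed field name
        if arxiv_id and repo_url and repo_url not in repos[arxiv_id]:
            repos[arxiv_id].append(repo_url)
    return dict(repos)


# Small in-memory sample standing in for the downloaded dump.
sample = [
    {"paper_arxiv_id": "2101.00001", "repo_url": "https://github.com/example/a"},
    {"paper_arxiv_id": "2101.00001", "repo_url": "https://github.com/example/b"},
    {"paper_arxiv_id": None, "repo_url": "https://github.com/example/c"},
]
links = repos_by_arxiv_id(sample)
```

Keying the result by arXiv ID means each KG item needs only one lookup, even when several repositories point to the same paper.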
- Clone the Git repository
- Create a virtual environment
- Install the dependencies:
pip install -r requirements.txt
- Optional: set up a lakeFS instance to store the local database between runs
- Create secrets file (see below)
- Run
python workflow_main.py
Hint: The Prefect server is installed automatically inside your Python virtual environment when you install the dependencies.
- Connect the server to your local environment:
prefect config set PREFECT_API_URL=http://127.0.0.1:4200/api
- Start the server:
prefect server start
- Create the secrets on the Prefect server as Block secrets (this is only needed once)
- Deploy the workflow using:
python workflow_deploy_local.py
- Run the workflow either from the CLI:
prefect deployment run 'process-papers/process_papers'
- Or from the web UI: Deployments -> Run the workflow
- Create a Prefect Cloud account
- Create a work pool in the cloud web UI
- Create an API key in the cloud web UI
- Connect your local environment (within your virtual Python environment):
prefect cloud login -k APIKEY
prefect cloud login
- Create the secrets on the Prefect server as Block secrets
- Run
python workflow_deploy_cloud.py
- Go to the web UI -> Deployments -> Run the workflow
You need these key/value pairs, either in a local secrets.config file (for local execution) or as Block secrets on the Prefect server (for server-based execution):
- mardi-kg-user=xxx
- mardi-kg-password=xxx
- lakefs-user=xxx
- lakefs-password=xxx
Again, the lakeFS configuration is optional.
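
As a rough illustration of how such a file could be read, the sketch below assumes that secrets.config holds one `key=value` pair per line, as the list above suggests. The project may use a different format or parser; `parse_secrets` and `lakefs_enabled` are hypothetical names introduced here.

```python
REQUIRED_KEYS = ("mardi-kg-user", "mardi-kg-password")
OPTIONAL_KEYS = ("lakefs-user", "lakefs-password")  # lakeFS is optional


def parse_secrets(text):
    """Parse key=value lines into a dict and check the required keys.

    Blank lines and lines starting with '#' are ignored (an assumption,
    not a documented feature of the real file format).
    """
    secrets = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Expected key=value, got: {raw_line!r}")
        secrets[key.strip()] = value.strip()
    missing = [k for k in REQUIRED_KEYS if k not in secrets]
    if missing:
        raise ValueError("Missing required secrets: " + ", ".join(missing))
    return secrets


def lakefs_enabled(secrets):
    """Return True only if both lakeFS credentials are present and non-empty."""
    return all(secrets.get(k) for k in OPTIONAL_KEYS)


# Example with placeholder values and no lakeFS credentials.
example = "mardi-kg-user=alice\nmardi-kg-password=s3cret\n"
secrets = parse_secrets(example)
```

Treating the lakeFS keys as optional lets the same check serve both setups: a missing MaRDI credential fails early, while missing lakeFS credentials simply disable that feature.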