Developed with the software and tools below.
🛑 This application is currently designed to interact with and harvest metadata from HAL linked to the database.
🛑 A lighter version of the application is under development; it will allow anyone to create their own application without a connection to HAL.
The process begins with a database of scholarly PDF files whose content needs to be extracted and processed.
The PDFs are sent to GROBID, a tool that extracts structured data (such as bibliographic information) from scholarly PDFs. GROBID processes the PDFs and outputs XML (TEI) files. This step is crucial because it turns the documents into machine-readable information.
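As a rough illustration of what "machine-readable" means here, the sketch below reads a title and the body paragraphs out of a TEI document with Python's standard library. The sample document is a hypothetical, heavily trimmed stand-in for a real GROBID result, and `read_tei` is an illustrative helper, not part of this project.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by GROBID's XML output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily trimmed sample standing in for a real GROBID result.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title level="a" type="main">An Example Paper</title>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div><p>We used NumPy for the analysis.</p></div>
    </body>
  </text>
</TEI>"""


def read_tei(xml_text):
    """Return the main title and the body paragraphs of a TEI document."""
    root = ET.fromstring(xml_text)
    title = root.findtext(
        ".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS
    )
    # itertext() flattens inline markup (references, formulas) into plain text.
    paragraphs = [
        "".join(p.itertext()).strip()
        for p in root.iterfind(".//tei:body//tei:p", TEI_NS)
    ]
    return {"title": title, "paragraphs": paragraphs}
```

Calling `read_tei(SAMPLE_TEI)` returns a dictionary with the title and a one-element paragraph list; a real GROBID file would be read from disk and passed in the same way.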
After GROBID, the extracted data (possibly enriched or supplemented) is passed to SOFTCITE, which generates JSON outputs. SOFTCITE analyzes the PDF files for software mentions and related information such as citations and references.
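To give an idea of how such JSON output could be consumed, here is a minimal sketch that counts software names. The field names (`mentions`, `software-name`, `rawForm`, `normalizedForm`) are assumptions about the JSON layout and should be checked against the files actually produced; the sample data and `count_software` are hypothetical.

```python
import json
from collections import Counter

# Hypothetical sample; the field names are assumptions, not a documented schema.
SAMPLE_JSON = """
{
  "mentions": [
    {"software-name": {"rawForm": "NumPy", "normalizedForm": "NumPy"},
     "version": {"rawForm": "1.26"}},
    {"software-name": {"rawForm": "numpy ", "normalizedForm": "NumPy"}},
    {"software-name": {"rawForm": "GROBID", "normalizedForm": "GROBID"}}
  ]
}
"""


def count_software(json_text):
    """Count mentions per software name, preferring the normalized form."""
    data = json.loads(json_text)
    counts = Counter()
    for mention in data.get("mentions", []):
        name = mention.get("software-name", {})
        # Fall back to the raw form when no normalized form is present.
        label = name.get("normalizedForm") or name.get("rawForm", "").strip()
        if label:
            counts[label] += 1
    return counts
```

Grouping by the normalized form keeps spelling variants of the same tool (here `NumPy` and `numpy `) under a single entry.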
The extracted data (XML and JSON) is then passed to SOFTware-Sync, a tool that synchronizes both outputs into a single XML file.
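The idea of folding JSON mentions into an XML document can be sketched as follows. This is not the SOFTware-Sync implementation: the element names `softwareMentions` and `software`, the JSON field names, and the sample inputs are all hypothetical and only show the general shape of such a merge.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical inputs standing in for one XML file and one JSON file.
SAMPLE_XML = "<doc><title>An Example Paper</title></doc>"
SAMPLE_MENTIONS = '{"mentions": [{"software-name": {"rawForm": "NumPy"}}]}'


def merge_outputs(xml_text, json_text):
    """Append the JSON mentions to the XML tree and return one XML string."""
    root = ET.fromstring(xml_text)
    mentions = json.loads(json_text).get("mentions", [])
    # Hypothetical container element; not the real SOFTware-Sync schema.
    container = ET.SubElement(root, "softwareMentions")
    for mention in mentions:
        name = mention.get("software-name", {}).get("rawForm")
        if name:
            item = ET.SubElement(container, "software")
            item.text = name
    return ET.tostring(root, encoding="unicode")
```

`merge_outputs(SAMPLE_XML, SAMPLE_MENTIONS)` returns the original document with one extra `softwareMentions` element holding a `software` child per mention.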
SOFTware-Viz is responsible for visualizing the processed data. It likely takes the synchronized data from SOFTware-Sync and transforms it into visual outputs or dashboards.
The processed data is stored in ArangoDB, a multi-model NoSQL database, which manages the structured data. This database serves as the main storage for the extracted information and mentions.
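For orientation, storing one record through ArangoDB's HTTP API could look like the sketch below, using only the standard library. The collection name `mentions` and the document fields are hypothetical; the host, port, and database name are taken from the setup steps in this README. The function only builds the request, so nothing is sent until `urllib.request.urlopen` is called on it.

```python
import json
import urllib.request

ARANGO_URL = "http://localhost:8529"  # port published by the docker run step
DB_NAME = "SOF-viz"                   # database name used in this README


def build_insert_request(collection, document):
    """Build (without sending) the HTTP request that stores one document."""
    url = f"{ARANGO_URL}/_db/{DB_NAME}/_api/document/{collection}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical collection and fields, for illustration only.
request = build_insert_request("mentions", {"software": "NumPy"})
# To actually send it: urllib.request.urlopen(request)
```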
Flask is a web framework used for developing web applications. Flask interacts with both SOFTware-Viz (for visualizations) and ArangoDB (for retrieving data).
- Clone the repository:
git clone ../
- Change to the project directory:
cd ./SOFTware-viz
- Create a virtualenv:
python -m venv env
- Pull the ArangoDB Docker image:
docker pull arangodb/arangodb:3.11.6
- Launch the Docker container:
docker run -p 8529:8529 -e ARANGO_NO_AUTH=1 arangodb/arangodb:3.11.6
- Create the database "SOF-viz":
Open http://localhost:8529/ in a browser and manually create a database named "SOF-viz".
- Activate the virtualenv:
source env/bin/activate
- Install the dependencies:
pip install -r requirement.txt
- Launch the app:
python run.py
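The manual database creation step can also be scripted. The sketch below is an optional alternative, not part of this project: it assumes the container started with `ARANGO_NO_AUTH=1` is reachable on `localhost:8529` and uses ArangoDB's HTTP API to create a database. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request

ARANGO_URL = "http://localhost:8529"  # port published by the docker run step


def build_create_db_request(name):
    """Build (without sending) the HTTP request that creates a database."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{ARANGO_URL}/_api/database",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def create_database(name):
    """Send the request; return True if created, False if it already existed."""
    try:
        with urllib.request.urlopen(build_create_db_request(name)):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 409:  # ArangoDB reports a duplicate name as a conflict
            return False
        raise
```

With the container running, `create_database("SOF-viz")` replaces the step done by hand in the web interface and is safe to call again on later runs.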
Run the app with the command below (the database is created automatically, on the first launch only):
python run.py
This project is protected under the SELECT-A-LICENSE License. For more details, refer to the LICENSE file.