Samuel-Scalbert/SOFTware-Sync

SOFTWARE-SYNC

Efficient Python Tools for Enhanced Software Sync

Overview

Purpose:

The SOFTware-Sync application processes sets of XML or PDF files generated by GROBID, alongside JSON result files from SOFTCITE, to produce either enhanced XML files or CSV files summarizing software mentions.

Input File Types:

GROBID Outputs: XML or PDF files containing structured information extracted from scholarly documents.

SOFTCITE Outputs: JSON files containing the results of software citation detection.

Output Options:

Enhanced XML Files: XML files augmented with details of every software mentioned in the input documents.

CSV Summary Files: CSV files listing every software mention detected across the input documents, along with relevant metadata.
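
As a rough illustration of the CSV summary output, the sketch below counts software names across parsed JSON documents and writes one row per software. It assumes each document has a top-level `mentions` list whose entries carry a `software-name` object with a `normalizedForm` field. The function names `summarize_mentions` and `write_summary_csv` and the column names are hypothetical and are not taken from SOFTware-Sync itself.

```python
import csv
import io
from collections import Counter


def summarize_mentions(documents):
    """Count how often each software name appears across parsed JSON documents.

    Assumed document shape (not confirmed by this README):
    {"mentions": [{"software-name": {"normalizedForm": "NumPy"}}, ...]}
    """
    counts = Counter()
    for doc in documents:
        for mention in doc.get("mentions", []):
            name = mention.get("software-name", {}).get("normalizedForm")
            if name:
                counts[name] += 1
    return counts


def write_summary_csv(counts, handle):
    """Write one row per software, most frequently mentioned first."""
    writer = csv.writer(handle)
    writer.writerow(["software", "mentions"])  # hypothetical column names
    # Sort by descending count, then alphabetically for a stable order.
    for name, total in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([name, total])


if __name__ == "__main__":
    docs = [
        {"mentions": [{"software-name": {"normalizedForm": "NumPy"}},
                      {"software-name": {"normalizedForm": "GROBID"}}]},
        {"mentions": [{"software-name": {"normalizedForm": "NumPy"}}]},
    ]
    buffer = io.StringIO()
    write_summary_csv(summarize_mentions(docs), buffer)
    print(buffer.getvalue())
```

The actual CSV produced by `--csv-creator` may contain different columns; this only shows the general shape of such a summary.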

Installation

From source

  1. Clone the SOFTware-Sync repository:
git clone https://github.com/Samuel-Scalbert/SOFTware-Sync
  2. Change to the project directory:
cd SOFTware-Sync
  3. Install the dependencies:
pip install -r requirements.txt

Features

Available options for SOFTware-Sync:

    1. --enhance-dir : Enhance multiple XML files in a directory by associating them with corresponding JSON files.
       Usage: python main.py --enhance-dir <dir_xml_path> <dir_json_path>

    2. --enhance-file : Enhance a single XML file by associating it with a JSON file.
       Usage: python main.py --enhance-file <xml_path> <json_path>
       
       Available options: --project, --only-mention

    3. --builder : Build XML files by combining GROBID TEI XML files with metadata XML files.
       Usage: python main.py --builder <xml_path_grobid> <xml_path_meta>

    4. --check-XML-META : Check the number of XML files available against the number of metadata XML files.
       Usage: python main.py --check-XML-META <xml_path_grobid> <xml_path_meta>

    5. --check-XML-JSON : Check the number of XML files available against the number of JSON files.
       Usage: python main.py --check-XML-JSON <xml_path> <json_path>

    6. --csv-creator : Create a CSV file listing each software and its number of mentions.
       Usage: python main.py --csv-creator <json_path>

    7. --mentions-checker : Check for empty JSON mentions files.
       Usage: python main.py --mentions-checker <json_path>

    8. --download-halid : Download files using their HAL IDs.
       Usage: python main.py --download-halid <csv_path>

    9. --help, -h : Display this message.
       Usage: python main.py --help
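
To give an idea of what a check such as `--mentions-checker` involves, here is a minimal sketch that scans a directory for JSON files containing no mentions. It assumes each file holds an object with a top-level `mentions` list. The function name `find_empty_mention_files` is hypothetical and this is not the project's actual implementation.

```python
import json
import tempfile
from pathlib import Path


def find_empty_mention_files(json_dir):
    """Return the sorted names of JSON files that contain no mentions.

    Assumes each file holds an object with a top-level "mentions" list;
    a missing key is treated the same as an empty list.
    """
    empty = []
    for path in sorted(Path(json_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if not data.get("mentions"):
            empty.append(path.name)
    return empty


if __name__ == "__main__":
    # Build two throwaway files so the sketch runs without real data.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.json").write_text('{"mentions": []}', encoding="utf-8")
        Path(tmp, "b.json").write_text(
            '{"mentions": [{"software-name": {"rawForm": "R"}}]}',
            encoding="utf-8",
        )
        print(find_empty_mention_files(tmp))
```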

Usage

From source

Run SOFTware-Sync using the command below:

python main.py

Run SOFTware-Sync to enhance multiple XML files in a directory by associating them with corresponding JSON files:

python main.py --enhance-dir ../data/XML/ ../data/JSON/
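
Associating each XML file with its JSON file requires some matching rule between the two directories. The sketch below pairs files that share the same base name, which is an assumption: the README does not state the naming convention SOFTware-Sync relies on, and real GROBID or SOFTCITE outputs may carry extra suffixes. The function name `pair_by_stem` is hypothetical.

```python
import tempfile
from pathlib import Path


def pair_by_stem(xml_dir, json_dir):
    """Match XML and JSON files that share a base name (assumed convention).

    Returns (pairs, unmatched_xml): pairs is a list of (xml_path, json_path)
    tuples sorted by file name; unmatched_xml lists XML files with no JSON.
    """
    json_by_stem = {p.stem: p for p in Path(json_dir).glob("*.json")}
    pairs, unmatched = [], []
    for xml_path in sorted(Path(xml_dir).glob("*.xml")):
        json_path = json_by_stem.get(xml_path.stem)
        if json_path is None:
            unmatched.append(xml_path)
        else:
            pairs.append((xml_path, json_path))
    return pairs, unmatched


if __name__ == "__main__":
    # Create small placeholder directories so the sketch runs on its own.
    with tempfile.TemporaryDirectory() as tmp:
        xml_dir = Path(tmp, "XML")
        json_dir = Path(tmp, "JSON")
        xml_dir.mkdir()
        json_dir.mkdir()
        for name in ("doc1.xml", "doc2.xml"):
            (xml_dir / name).write_text("<TEI/>", encoding="utf-8")
        (json_dir / "doc1.json").write_text("{}", encoding="utf-8")

        pairs, unmatched = pair_by_stem(xml_dir, json_dir)
        print([(x.name, j.name) for x, j in pairs])
        print([x.name for x in unmatched])
```

Reporting the unmatched files is the same kind of consistency check that `--check-XML-JSON` is described as performing on file counts.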

License

This project is protected under the SELECT-A-LICENSE License. For more details, refer to the LICENSE file.
