Official repository for the paper "Not as simple as we thought: a rigorous examination of data aggregation in materials informatics"

fedeotto/data-aggregation-mi

DataAggregationMI

This repository contains the code and data accompanying our paper 'Not as simple as we thought: a rigorous examination of data aggregation in materials informatics' (https://pubs.rsc.org/en/content/articlelanding/2024/dd/d3dd00207a).

Installation

  1. Clone this repository:
    git clone https://github.com/FedeOtto/DataIntegrationMI
    
  2. Create a new conda environment from daggrmi_env.yml:
    conda env create -f daggrmi_env.yml
    
  3. Activate the new environment:
    conda activate daggrmi
    

Data

All datasets used in this work are stored in the datasets folder. MPDS data can be obtained by running the retrieve_mpds.py script, provided you have access to the MPDS API; for more information, visit https://mpds.io/developer/. Without MPDS access, the data aggregation examples can still be reproduced using the Materials Project (mp) and AFLOW (aflow) data assessed in this work.
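
To check which data files are available locally before running anything, a minimal sketch like the following may help. It assumes only that the datasets folder sits at the repository root; the helper name list_datasets is hypothetical and not part of this repository, and no particular file names or formats are assumed.

```python
from pathlib import Path


def list_datasets(folder="datasets"):
    """Return the sorted relative paths of all files under `folder`.

    Hypothetical helper (not part of this repository). Returns an empty
    list if the folder does not exist.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    # Walk the folder recursively and keep regular files only.
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Run from the repository root so that "datasets" resolves correctly.
    for name in list_datasets():
        print(name)
```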

Run experiments

The scripts in this repository reproduce specific figures and analyses from the main paper. In particular:

  • 2_datasets.py reproduces Fig. 1.
  • 1_imbalance.py reproduces Fig. 2.
  • 3_AB_augment.py reproduces the results presented in Table 2.
  • 6_self_augment.py reproduces the plots in Fig. 3.
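
The scripts listed above could be launched one after another with a small driver such as the sketch below. It assumes that each script runs without command-line arguments from the repository root and inside the activated daggrmi environment; neither assumption is confirmed by this README. The names EXPERIMENTS and run_all are hypothetical, and the default dry run only prints the commands.

```python
import subprocess
import sys

# Script names as listed in this README, mapped to what each reproduces.
EXPERIMENTS = {
    "2_datasets.py": "Fig. 1",
    "1_imbalance.py": "Fig. 2",
    "3_AB_augment.py": "Table 2",
    "6_self_augment.py": "Fig. 3",
}


def run_all(repo_root=".", dry_run=True):
    """Build (and optionally execute) one command per experiment script.

    Assumption: the scripts take no arguments and are run from repo_root.
    Returns the list of commands so it can be inspected.
    """
    commands = []
    for script, target in EXPERIMENTS.items():
        # Use the current interpreter so the active conda environment applies.
        cmd = [sys.executable, script]
        commands.append(cmd)
        print(f"{target}: {' '.join(cmd)}")
        if not dry_run:
            # check=True stops at the first script that exits with an error.
            subprocess.run(cmd, check=True, cwd=repo_root)
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling run_all(dry_run=False) would actually execute the scripts; some of them may take a long time, so running a single script directly (for example, python 1_imbalance.py) is a reasonable first check.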
