applect-data-project

This repository includes scripts supporting the AppleCT Datasets manuscript, henceforth referred to as the submission.

AppleCT Datasets

In the submission, we present three parallel-beam tomographic datasets of 94 apples with internal defects along with defect label information. The datasets are prepared for development and testing of data-driven, learning-based image reconstruction, segmentation and post-processing methods.

The three versions are a noiseless simulation (Dataset A); simulation with added Gaussian noise (Dataset B), and with X-ray scattering (Dataset C). The datasets are based on real 3D X-ray CT data and their subsequent volume reconstructions. The ground truth images, generated based on the volume reconstructions, are also available through this project, see the links in Dataset access.

Apples contain various defects, and if ignored, the data split may naturally introduce label bias. We reduce label bias by formulating the data split as an optimization problem, which we solve using two methods: a simple heuristic algorithm and through mixed integer quadratic programming. This ensures the datasets can be split into test, training or validation subsets with the label bias eliminated. We refer the readers to the submission for the technical details.

The datasets are suitable for image reconstruction, segmentation, automatic defect detection, and studying the effects (as well as testing the elimination) of label bias in machine learning.

Codes

Scripts are arranged into subfolders depending on their use.

Dataset_Generation: This folder contains scripts for generating Datasets A and B.
Scattering: This folder contains scripts for generating Dataset C.
bias_elimination: This contains the two bias elimination algorithms introduced in the submission, both in Python and MATLAB syntax. The input files apple_defects_full.csv and apple_defects_partial.csv are available via the Zenodo page for the Datasets A-C (see link below).
Technical_Validation: This contains the scripts used to obtain the results in Technical Validation in the submission.

Dataset access

The datasets and the ground truth reconstructions are made available via Zenodo.

Contributions or suggestions

External contributions to the codes are currently not allowed. However, we welcome any feedback or suggestions for making the scripts more user-friendly, please email the corresponding authors listed in the submission.

Publication and licensing

The submission is currently under review for Nature Scientific Data. A second version of the submission will be added to Arxiv shortly. When referring to these codes, please cite to the latest version of the submission on Arxiv.

The codes are released under the MIT License; the datasets under Creative Commons Attribution 4.0 International.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
Dataset_Generation		Dataset_Generation
Scattering		Scattering
Technical_Validation		Technical_Validation
bias_elimination		bias_elimination
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

applect-data-project

AppleCT Datasets

Codes

Dataset access

Contributions or suggestions

Publication and licensing

About

Releases

Packages

Contributors 2

Languages

License

cicwi/applect-dataset-project

Folders and files

Latest commit

History

Repository files navigation

applect-data-project

AppleCT Datasets

Codes

Dataset access

Contributions or suggestions

Publication and licensing

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages