This document provides recommendations for releasing experiment artifacts to aid the research community.
To put discussion into action, the templates directory contains a base README.md template that will aid the discovery and ingestion of experimental artifacts. Please consider using the README.md template for your artifact repositories.
List any prerequisite dependencies your experiment has to aid in its reproducibility. Some recommendations to consider:
- Assume little, if any, background on the technologies your artifact relies on; this increases the likelihood of re-use.
- Keep your instructions clear and concise so that researchers looking to re-use your artifacts can do so without your assistance.
- If feasible, provide a fully reproducible version of your artifact by leveraging a system such as Docker: publish a Docker image on Docker Hub and include the Dockerfile used to re-create it (a minimal sketch follows this list).
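As a rough illustration, here is a minimal Dockerfile sketch; the base image, file names, and entry point are hypothetical placeholders rather than a prescribed setup:

```dockerfile
# Minimal sketch of a Dockerfile for a reproducible experiment image.
# The base image, file names, and entry point below are hypothetical;
# substitute the actual dependencies and scripts your experiment uses.
FROM python:3.11-slim

WORKDIR /experiment

# Pin dependencies so the environment can be re-created exactly.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the experiment code and supporting scripts into the image.
COPY . .

# Run the experiment end to end by default.
CMD ["python", "run_experiment.py"]
```

The image can then be built with `docker build -t <user>/<experiment> .` and published with `docker push`, so others can pull and run it without rebuilding your environment by hand.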
Share your artifacts in the form in which they are most useful. Consider items like the following as you share them with other researchers:
- Completeness of the artifacts. Make sure to include everything that is necessary to validate your experiment.
- Artifacts are most useful when shared in a common or well-known format, which aids re-use by other researchers. While proprietary formats may sometimes be required, point out any tools that provide access to that data to lower the learning curve for potential users.
Consider including artifacts like the following (one possible repository layout is sketched after this list):
- Datasets
- Code
- Configurations
- Experiment setup tools
- System images (e.g., Docker images)
- Testbed-specific initialization scripts
- Publications related to this artifact
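As one hypothetical example (all directory names are placeholders), a repository containing these artifacts might be laid out as follows:

```
your-artifact-repo/
├── README.md      # built from the template in the templates directory
├── Dockerfile     # re-creates the experiment environment
├── data/          # datasets, or pointers to externally hosted data
├── src/           # experiment code
├── config/        # configuration files
├── scripts/       # setup and testbed-specific initialization scripts
└── docs/          # related publications and notes
```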
Experiments may have subtleties that could not be explained in the research paper due to space constraints. Including the code you used for setup, collection, reformatting, or analysis, along with a description of what that code does and where it fits in the experimental pipeline, can be extremely helpful to your fellow researchers.
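Even a short header comment stating a script's pipeline stage, inputs, and outputs goes a long way. A minimal hypothetical sketch in Python (all file, script, and field names below are placeholders):

```python
"""Hypothetical example of documenting where a script fits in the pipeline.

Stage 2 of 3: reformatting. Converts the raw collector output
(data/raw/*.csv, produced by collect.py in stage 1) into the tidy
format consumed by the analysis script in stage 3.
"""
import csv
import pathlib

RAW_DIR = pathlib.Path("data/raw")        # output of the collection stage
OUT_FILE = pathlib.Path("data/tidy.csv")  # input to the analysis stage

def main() -> None:
    rows = []
    for raw in sorted(RAW_DIR.glob("*.csv")):
        with raw.open(newline="") as f:
            for record in csv.DictReader(f):
                # Keep only the fields the analysis stage needs.
                rows.append({"trial": record["trial"],
                             "latency_ms": record["latency_ms"]})
    with OUT_FILE.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["trial", "latency_ms"])
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    main()
```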
If your work falls within a specific scientific discipline, it may be useful to include additional discipline-specific items. For example:
- For ML-based efforts, consider checking out the ML recommendations here.
It can be useful to provide some of your key results as part of sharing your artifacts. Consider including a table of those results and graphs showing them, along with the specific commands used to generate those results from the shared artifacts.
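As a hypothetical sketch of such a regeneration script (the file, column, and figure names are placeholders, and matplotlib is assumed to be available):

```python
# Hypothetical sketch: regenerate a results figure from shared artifacts,
# so readers can verify it against the published graph.
# Assumes the analysis stage exported results/summary.csv.
import csv
import matplotlib.pyplot as plt

with open("results/summary.csv", newline="") as f:
    rows = list(csv.DictReader(f))

x = [float(r["load"]) for r in rows]
y = [float(r["latency_ms"]) for r in rows]

plt.plot(x, y, marker="o")
plt.xlabel("Offered load")
plt.ylabel("Latency (ms)")
plt.savefig("latency_vs_load.png")  # compare against the published figure
```

Documenting the exact invocation (e.g., `python plot_results.py`) next to the published table or graph lets readers check their regenerated output against yours.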
The following storage options and tools are drawn from the ML recommendations mentioned above, modified to suit the needs of this document:
| Service | Versioning | Storage | Bandwidth | Notes |
|---------|------------|---------|-----------|-------|
| Zenodo | Yes | 50GB | Free | DOI; provides long-term preservation |
| GitHub Releases | Yes | 2GB file limit | Free | |
| OneDrive | Yes | 2GB (free) / 1TB (with Office 365) | Free | |
| Google Drive | Yes | 15GB | Free | |
| Dropbox | Yes | 2GB (paid unlimited) | Free | |
| AWS S3 | Yes | Paid only | Paid | |
Tools that can help manage and version artifacts across such services include:
- DAGsHub - a way to track experiments, version data, models & pipelines, using Git
- RClone - provides unified access to many different cloud storage providers
- dvc - an open-source version control system designed for machine learning projects (see the sketch after this list)
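As a hedged example of how consumers might then retrieve a specific released version of a dataset, dvc exposes a small Python API; the repository URL, file path, and tag below are placeholders:

```python
# Hypothetical sketch: read a specific, tagged version of a dataset that
# was published with dvc, without cloning the whole repository.
# The repo URL, file path, and revision below are placeholders.
import dvc.api

with dvc.api.open(
    "data/measurements.csv",  # path tracked by dvc in that repo
    repo="https://github.com/example/artifact-repo",
    rev="v1.0",               # git tag identifying the release
) as f:
    header = f.readline().strip()
    print(header)
```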
If you'd like to contribute to these best practices, please open an issue on this GitHub repository or submit a pull request. Also, please check out the NSF SEARCCH project.
All content in this repository is licensed under the MIT license.