Datatractor Yard: Metadata Extractor Registry

A place to develop and discuss the Datatractor Yard (formerly the MaRDA Extractors WG registry). The idea is to collect various file formats used in materials science and chemistry, describe them with metadata, and provide links to software projects that can parse them.

By providing this data through a web API, it is hoped that users can discover new extractors more easily, and that metadata standards can be developed for extractor output so that schemas can proliferate throughout the field.

The state of the main branch is deployed to https://yard.datatractor.org/, with API docs (and built-in client) accessible at https://yard.datatractor.org/redoc.
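As a rough illustration of exploring the API programmatically, the sketch below lists the routes described by an OpenAPI document. It assumes the deployed app serves its schema at /openapi.json, which is a common default for apps that expose a /redoc page but is not confirmed by this README. The function names list_api_paths and fetch_openapi are hypothetical helpers, not part of the yard codebase.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_api_paths(openapi_doc):
    """Return sorted (METHOD, path) pairs found in an OpenAPI document."""
    pairs = []
    for path, item in openapi_doc.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                pairs.append((key.upper(), path))
    return sorted(pairs)


def fetch_openapi(base_url="https://yard.datatractor.org", timeout=10):
    """Download the OpenAPI document.

    ASSUMPTION: the schema lives at <base_url>/openapi.json; adjust the
    path if the deployment serves it elsewhere.
    """
    with urlopen(f"{base_url}/openapi.json", timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    for method, path in list_api_paths(fetch_openapi()):
        print(method, path)
```

Running the script prints one line per route. The authoritative description of each route remains the /redoc page linked above.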

For more information, see the preprint:

Datatractor: Metadata, automation, and registries for extractor interoperability in the chemical and materials sciences
Matthew L. Evans, Gian-Marco Rignanese, David Elbert & Peter Kraus
arXiv:2410.18839 (2024)

Contributing

You are welcome to contribute file type and extractor entries to this registry by opening a pull request. Please see the contributing guidelines for detailed steps. The data in a pull request is validated after submission and added to the deployed database once the pull request is merged.

Development

Clone the repository with its submodules and install the dependencies in a fresh Python virtual environment:

git clone git@github.com:datatractor/yard --recurse-submodules
cd yard
pip install -r requirements.txt

Use invoke and the tasks in tasks.py to generate pydantic models for all schemas defined in the schema repo:

invoke regenerate-models

From the repository root directory, launch the server with uvicorn:

uvicorn yard.app:app

then navigate to http://localhost:8000 (uvicorn's default port) to test.
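A quick way to confirm that the development server is responding, without opening a browser, is a small smoke check like the sketch below. The helper name is_up is hypothetical and not part of the yard codebase. Pass it whichever address uvicorn reports on startup.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=2.0):
    """Return True if a GET request to `url` succeeds with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses all land here.
        return False
```

For example, `is_up("http://localhost:<port>/redoc")` should return True once the server has started, assuming the local app exposes the same /redoc page as the deployed instance.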

Deployment

The registry app can be deployed with the provided Dockerfile. After cloning the repository (with submodules, following the instructions above), the image can be built for a given schema version by running

docker build . -t datatractor-yard

and then launched with

docker run -p 8080:8080 --env PORT=8080 datatractor-yard

or an equivalent command.

Registry Maintainers

Matthew Evans, @ml-evs
Peter Kraus, @PeterKraus
