Create efficient datasets for training Multimodal Foundation Models

aleSuglia/emma-datasets

 
 

EMMA: Datasets

Python 3.9 PyTorch Poetry
pre-commit style: black wemake-python-styleguide

Continuous Integration Tests

Important

If you have questions or find bugs, you can contact us through our organisation's discussions.


To use this package in your project, install it by running:

poetry add git+https://github.com/emma-simbot/datasets.git

You can then import from emma_datasets or run commands through the CLI with:

python -m emma_datasets

Writing code and running things

When running commands for emma_datasets, you can append --help to get more information on the commands and any arguments available to you.
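The same help lookup can also be scripted from Python. The sketch below assumes only what is stated above, namely that python -m emma_datasets accepts --help. The helper names build_help_command and show_help are hypothetical and not part of the package, and nothing is executed unless emma_datasets is installed in the current environment.

```python
import importlib.util
import subprocess
import sys


def build_help_command(*subcommand: str) -> list[str]:
    """Build the argv for `python -m emma_datasets [subcommand ...] --help`.

    Hypothetical helper: any subcommand names are supplied by the caller,
    none are assumed to exist here.
    """
    return [sys.executable, "-m", "emma_datasets", *subcommand, "--help"]


def show_help(*subcommand: str) -> None:
    """Print the CLI help text, or a hint if the package is not installed."""
    if importlib.util.find_spec("emma_datasets") is None:
        print("emma_datasets is not installed in this environment")
        return
    # check=False: let the CLI report its own errors instead of raising here.
    subprocess.run(build_help_command(*subcommand), check=False)


if __name__ == "__main__":
    show_help()
```

Using sys.executable rather than a bare python keeps the call inside the interpreter of the active Poetry environment.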

Project structure

This project is organised very similarly to the structure of the Lightning-Hydra-Template, to facilitate reproducible research code.

  • scripts — sh scripts to run experiments
  • notebooks — Jupyter notebooks for analysis and exploration
  • storage — data for training/inference (and maybe use symlinks to point to other parts of the file system)
  • tests — pytest scripts to verify the code
  • src/emma_datasets — where the main code lives
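The storage entry above mentions using symlinks to point at other parts of the file system. One way this might be set up is sketched below with the standard library only. The function name link_storage is hypothetical and not part of emma_datasets, and the default link name simply mirrors the storage directory listed above.

```python
from pathlib import Path


def link_storage(target: Path, link: Path = Path("storage")) -> Path:
    """Make `link` a symlink to `target`, leaving any existing path untouched.

    Hypothetical helper for illustration; not part of emma_datasets.
    """
    target = target.expanduser().resolve()
    if not target.is_dir():
        raise NotADirectoryError(f"{target} is not a directory")
    if link.is_symlink() or link.exists():
        # Never overwrite an existing directory or link.
        return link
    link.symlink_to(target, target_is_directory=True)
    return link
```

Calling link_storage with the directory that actually holds the data, from the repository root, would then make storage resolve to that location. On Windows, creating symlinks may require additional privileges.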

How-to guides

For more detail on how to use this library, check out the following specific pages on:
