Tools to create torch datasets for the PLAsTiCC classification challenge
- Create a folder
- Run `./scripts/download_plasticc_from_zenodo.sh myfolder` (TODO: check md5sums!)
- Decompress the csv's, e.g. `7z x plasticc_train_metadata.csv.gz`
- If you plan to use lazy loading (explanation below), generate light curve torch tensors with `python src/plasticc_create_lightcurves.py myfolder`
- Use the `get_plasticc_datasets` function in `src/plasticc_dataset_torch.py` to create a torch dataset from the decompressed csv's. You can choose between two loading modes (see the usage sketch below):
  - Eager loading: the dataset keeps all light curves in RAM. Data loaders are faster, but memory usage is high
  - Lazy loading: the dataset reads light curves from disk on demand. Data loaders are slower, but memory usage is low
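
The snippet below is a minimal usage sketch, not taken from the repo's documentation: the import path, the `eager` keyword, and the single returned dataset are assumptions, so check the actual signature of `get_plasticc_datasets` in `src/plasticc_dataset_torch.py` before relying on it.

```python
# Hypothetical usage sketch: the import path, the `eager` keyword and the
# single returned dataset are assumptions, not the documented API.
from torch.utils.data import DataLoader

# Assumes the repo root is on the Python path (e.g. running from the repo root)
from src.plasticc_dataset_torch import get_plasticc_datasets

data_folder = "myfolder"  # folder with the decompressed Zenodo csv's

# Assumed switch between eager and lazy loading; the real argument name may differ
dataset = get_plasticc_datasets(data_folder, eager=False)

# Standard torch DataLoader on top of the dataset; variable-length light
# curves typically need a custom collate_fn before they can be batched
loader = DataLoader(dataset, batch_size=32, shuffle=True)

batch = next(iter(loader))
print(type(batch))
```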
Check the `dataset_eager_vs_lazy` notebook for a comparison between these two options.
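
As a rough, self-contained illustration of the difference (a toy sketch, not the repo's implementation), the two strategies essentially differ in where `torch.load` happens:

```python
# Toy illustration of eager vs lazy loading for a torch Dataset.
# Uses synthetic .pt files in a temporary folder; NOT the repo's code.
import tempfile
from pathlib import Path

import torch
from torch.utils.data import Dataset

# Create a few fake "light curve" tensors on disk
tmpdir = Path(tempfile.mkdtemp())
paths = []
for i in range(5):
    path = tmpdir / f"lc_{i}.pt"
    torch.save(torch.randn(100, 3), path)  # fake light curve with 3 columns
    paths.append(path)


class EagerDataset(Dataset):
    """Loads every light curve into RAM at construction time."""

    def __init__(self, paths):
        self.data = [torch.load(p) for p in paths]  # all tensors live in memory

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]  # fast: no disk access here


class LazyDataset(Dataset):
    """Keeps only file paths; reads each light curve from disk on demand."""

    def __init__(self, paths):
        self.paths = list(paths)  # small memory footprint

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        return torch.load(self.paths[idx])  # slower: one disk read per item


print(EagerDataset(paths)[0].shape, LazyDataset(paths)[0].shape)
```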
TODO: Evaluate dask