Check handling of `Dataloader` in eval task #113

kvantricht · 2024-10-07T18:10:18Z

We have to be really careful when running the eval task like this:

https://github.com/WorldCereal/presto-worldcereal/blob/ce3fae1bb1054ba0f8c60edb7b0a0edc76dbf3b2/presto/eval.py#L252:L277

We normally iterate manually through the dataset, like this:

for i in range(len(ds)):
    ... = ds[i]

which also works when we have duplicated indices, for example for balancing. However, when just iterating through a `DataLoader` object, it does not seem like we iterate through all indices.
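To make the manual pattern concrete, here is a minimal pure-Python sketch. It is not the actual `WorldCerealBase` code: `ToyDataset`, its `indices` list, and the choice to have `__len__` count the duplicated indices are all assumptions made for illustration.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset that balances classes by repeating indices."""

    def __init__(self, samples, indices):
        self.samples = samples  # underlying unique samples
        self.indices = indices  # positions into `samples`, duplicates allowed

    def __len__(self):
        # Assumption: the length counts duplicated indices, not unique samples.
        return len(self.indices)

    def __getitem__(self, i):
        # Map the i-th position through `indices` to the underlying sample.
        return self.samples[self.indices[i]]


ds = ToyDataset(samples=["a", "b", "c"], indices=[0, 1, 1, 2, 2, 2])

# Manual iteration, as in the snippet above: every entry of `indices` is visited.
visited = [ds[i] for i in range(len(ds))]
print(visited)
```

If `__len__` instead returned the number of unique samples, any consumer that walks `range(len(ds))` would stop early and miss part of `indices`.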

kvantricht · 2024-10-07T18:14:37Z

In fact, it seems this case could easily be tackled by adding our own `__iter__` method to the `WorldCerealBase` dataset, so that initializing a normal `DataLoader` just works as expected. What do you think @gabrieltseng @cbutsko?

def __iter__(self):
    for idx in self.indices:
        yield self.__getitem__(idx)

Unless the whole idea is not to use duplicated indices when just using a `DataLoader`? However, I want to be able to duplicate indices to allow augmentation while still making use of large batch sizes and multiprocessing through `DataLoader`.
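A minimal sketch of how such an `__iter__` compares with position-based access, assuming that `__getitem__` takes a raw sample index (as the snippet above implies) and that `__len__` returns the number of unique samples. `ToyBase` and `fetch_by_position` are hypothetical names, not part of presto-worldcereal or PyTorch. One caveat worth verifying: a standard `torch.utils.data.DataLoader` over a map-style dataset draws positions from a sampler over `range(len(dataset))` and calls `__getitem__`; `__iter__` is only used for `IterableDataset` subclasses. `fetch_by_position` mimics that position-based access in plain Python so the two paths can be compared.

```python
class ToyBase:
    """Hypothetical dataset; not the real WorldCerealBase."""

    def __init__(self, samples, indices):
        self.samples = samples
        self.indices = indices  # raw sample indices, duplicates allowed

    def __len__(self):
        # Assumption: length is the number of unique samples.
        return len(self.samples)

    def __getitem__(self, idx):
        # Assumption: `idx` is a raw index into `samples`.
        return self.samples[idx]

    def __iter__(self):
        # The proposal from this comment: walk `indices`, duplicates included.
        for idx in self.indices:
            yield self[idx]


def fetch_by_position(dataset):
    """Mimic position-based access: request 0..len(dataset)-1 via __getitem__."""
    return [dataset[i] for i in range(len(dataset))]


ds = ToyBase(samples=["a", "b", "c"], indices=[0, 1, 1, 2, 2, 2])

via_iter = list(ds)                   # goes through __iter__
via_position = fetch_by_position(ds)  # ignores __iter__ and `indices`

print(via_iter)
print(via_position)
```

Under these assumptions the two paths disagree: `__iter__` yields the duplicated entries, while position-based access returns each unique sample once. Comparing the number of samples a loader yields against `len(ds.indices)` would be one way to detect this.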

cbutsko · 2024-11-22T09:57:53Z

@kvantricht to check this again.
Should we add it to tests as well?

kvantricht added the `bug` (Something isn't working) label Oct 7, 2024

kvantricht assigned cbutsko and gabrieltseng Oct 7, 2024
