Check handling of Dataloader in eval task #113

Open
kvantricht opened this issue Oct 7, 2024 · 2 comments

@kvantricht
Contributor

We have to be really careful when running the eval task like this:

https://github.com/WorldCereal/presto-worldcereal/blob/ce3fae1bb1054ba0f8c60edb7b0a0edc76dbf3b2/presto/eval.py#L252:L277

We normally iterate manually through the dataset, like this:

for i in range(len(ds)):
    sample = ds[i]  # whatever the dataset returns for position i

This also works when indices are duplicated, for example for class balancing. However, when we simply iterate over a DataLoader object, it does not seem to visit all indices.
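To make the manual loop concrete, here is a minimal sketch with a toy dataset. `ToyIndexedDataset`, its `rows` and `indices` attributes, and the position-based `__getitem__` are all assumptions made for illustration, not the actual `WorldCerealBase` implementation.

```python
from collections import Counter


class ToyIndexedDataset:
    """Hypothetical stand-in for a map-style dataset with a duplicated index list."""

    def __init__(self, rows, indices):
        self.rows = rows        # underlying samples
        self.indices = indices  # positions into rows; entries may repeat for balancing

    def __len__(self):
        # Assumption: the length counts duplicates, so range(len(ds)) covers every entry.
        return len(self.indices)

    def __getitem__(self, i):
        # Assumption: position i is resolved through the index list.
        return self.rows[self.indices[i]]


ds = ToyIndexedDataset(rows=["a", "b", "c"], indices=[0, 0, 1, 2, 2, 2])

# The manual loop from above: every entry of indices is visited, duplicates included.
visited = [ds[i] for i in range(len(ds))]
counts = Counter(visited)
print(visited)
print(counts)
```

If the real dataset behaves like this toy, comparing `counts` against what a loader yields would show whether duplicated entries are being skipped.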

@kvantricht kvantricht added the bug Something isn't working label Oct 7, 2024
@kvantricht
Contributor Author

kvantricht commented Oct 7, 2024

In fact, it seems this case could easily be handled by adding our own __iter__ method to the WorldCerealBase dataset, so that initializing a normal DataLoader just works as expected. What do you think, @gabrieltseng @cbutsko?

def __iter__(self):
    # Visit every entry of self.indices, duplicates included.
    for idx in self.indices:
        yield self[idx]

Unless the whole idea is not to use duplicated indices when only a DataLoader is used? I do want to be able to duplicate indices for augmentation while still benefiting from the large batch sizes and multiprocessing that a DataLoader provides.
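A minimal sketch of how the proposed `__iter__` would behave, using a toy class rather than the real `WorldCerealBase`. `ToyDataset`, `rows`, and the `in_batches` helper are hypothetical names, and `__getitem__` is assumed to take the values stored in `indices` directly, as the snippet above implies. One caveat worth checking: as far as the PyTorch documentation describes it, a DataLoader over a map-style dataset draws positions from a sampler and calls `__getitem__`, and only uses `__iter__` for an `IterableDataset`, so the plain batching helper here stands in for the loader instead of importing torch.

```python
from itertools import islice


class ToyDataset:
    """Hypothetical stand-in; not the real WorldCerealBase."""

    def __init__(self, rows, indices):
        self.rows = rows
        self.indices = indices  # row ids, possibly duplicated

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, idx):
        # Assumption: idx is a row id, as in the proposed __iter__.
        return self.rows[idx]

    def __iter__(self):
        # The proposed method: visit every entry of indices, duplicates included.
        for idx in self.indices:
            yield self[idx]


def in_batches(iterable, batch_size):
    """Group an iterable into lists of at most batch_size items."""
    it = iter(iterable)
    while batch := list(islice(it, batch_size)):
        yield batch


ds = ToyDataset(rows=["a", "b", "c"], indices=[0, 0, 1, 2, 2, 2])

# Plain iteration goes through __iter__, so duplicated entries are kept.
via_iter = list(in_batches(ds, batch_size=4))
print(via_iter)
```

Whether this carries over to a real DataLoader depends on how the loader reaches the dataset, so it is probably worth confirming on the actual class before relying on it.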

@cbutsko

cbutsko commented Nov 22, 2024

@kvantricht to check this again.
Should we add it to tests as well?
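One possible shape for such a test, as a sketch only: count how often each sample id comes out of the loader and compare that with the dataset's index list. The helper names `visit_counts` and `missing_visits` are hypothetical, and the sketch assumes each batch can be reduced to a list of sample ids, which may not match what the real batches contain.

```python
from collections import Counter


def visit_counts(batches):
    """Count how often each sample id appears across an iterable of batches."""
    counts = Counter()
    for batch in batches:
        counts.update(batch)
    return counts


def missing_visits(batches, expected_indices):
    """Return the ids (with multiplicity) that were expected but not visited."""
    return Counter(expected_indices) - visit_counts(batches)


expected = [0, 0, 1, 2, 2, 2]   # index list with duplicates, e.g. after balancing
complete = [[0, 0, 1], [2, 2, 2]]
deduplicated = [[0, 1, 2]]      # what a loader that drops duplicates would yield

print(missing_visits(complete, expected))
print(missing_visits(deduplicated, expected))
```

A test could then assert that `missing_visits` returns an empty `Counter` for the batches produced in the eval task.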
