We normally iterate manually through the dataset, like this:
```python
for i in range(len(ds)):
    sample = ds[i]
```
which also works when we have duplicated indices, for example for class balancing. However, when just iterating through a DataLoader object, it does not seem like we iterate through all indices.
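For illustration, here is a minimal sketch of the mismatch. WorldCerealBase's internals aren't reproduced here, so `ToyDataset` and its `indices` attribute are assumptions, but the pattern is the same: the default DataLoader sampler draws from `range(len(ds))` and never consults the dataset's own index list, so duplicated entries are silently skipped.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical stand-in for WorldCerealBase: samples are addressed
    through a separate index list that may contain duplicates."""

    def __init__(self):
        self.data = torch.arange(3)
        # index 1 is duplicated, e.g. for class balancing
        self.indices = [0, 1, 1, 2]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

ds = ToyDataset()

# Manual loop over ds.indices: 4 samples, duplicates included
manual = [ds[i].item() for i in ds.indices]                   # [0, 1, 1, 2]

# The default DataLoader sampler iterates over range(len(ds)),
# so the duplicated index is never visited
loaded = [x.item() for x in DataLoader(ds, batch_size=None)]  # [0, 1, 2]
```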
In fact, it seems such a case could easily be tackled by adding our own __iter__ method to the WorldCerealBase dataset, so that initializing a normal DataLoader would just work as expected. What do you think @gabrieltseng @cbutsko?
```python
def __iter__(self):
    # Yield samples in the order given by self.indices,
    # including any duplicated entries
    for idx in self.indices:
        yield self.__getitem__(idx)
```
Unless the whole idea is not to use duplicated indices when just using a DataLoader? However, I want to be able to duplicate indices to allow augmentation while still making use of large batch sizes and multiprocessing via the DataLoader.
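One caveat worth checking (this is general PyTorch behavior, not specific to this repo): a DataLoader only uses a dataset's __iter__ when the class subclasses torch.utils.data.IterableDataset; for a plain map-style Dataset it keeps drawing indices from range(len(ds)) via __getitem__, so the __iter__ above might be silently ignored. An alternative sketch that keeps batching and multiprocessing is to hand the duplicated index list to the DataLoader as its sampler (the `sampler` argument accepts any iterable of indices that implements __len__):

```python
from torch.utils.data import DataLoader

# ds is assumed to expose a (possibly duplicated) index list, as above
loader = DataLoader(
    ds,
    sampler=ds.indices,  # overrides the default range(len(ds)) sampler
    batch_size=32,
    num_workers=4,       # multiprocessing still works as usual
)

for batch in loader:
    ...                  # every entry of ds.indices is visited, duplicates included
```

That way nothing in WorldCerealBase needs to change for the manual loop, and the DataLoader path sees exactly the same indices.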
We have to be really careful when running the eval task like this:
https://github.com/WorldCereal/presto-worldcereal/blob/ce3fae1bb1054ba0f8c60edb7b0a0edc76dbf3b2/presto/eval.py#L252:L277