'synth_set' used twice #92

Open
fschmid56 opened this issue Apr 28, 2024 · 4 comments

@fschmid56

fschmid56 commented Apr 28, 2024

Hi, I was looking through the code for the DCASE'24 Task 4 baseline system and noticed the following lines in the file train_pretrained.py:

strong_full_set = torch.utils.data.ConcatDataset([strong_set, synth_set])
tot_train_data = [maestro_real_train, synth_set, strong_full_set, weak_set, unlabeled_set]
train_dataset = torch.utils.data.ConcatDataset(tot_train_data)

According to this, 'synth_set' is used twice. Is there a specific reason for this?
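To make the duplication concrete, here is a minimal sketch that uses plain Python lists as stand-ins for the datasets. The sizes are hypothetical, list concatenation replaces `torch.utils.data.ConcatDataset`, and the other three datasets are left out, so it only illustrates the structure of the lines quoted above, not the baseline itself.

```python
from collections import Counter

# Hypothetical stand-ins for the real datasets: lists of (source, index) tags.
# The sizes are illustrative only and do not match the real data.
strong_set = [("strong", i) for i in range(3)]
synth_set = [("synth", i) for i in range(5)]

# Same structure as the quoted lines, with list concatenation standing in
# for torch.utils.data.ConcatDataset (other datasets omitted for brevity).
strong_full_set = strong_set + synth_set
tot_train_data = [synth_set, strong_full_set]

# Count how often each sample appears across all entries of tot_train_data.
occurrences = Counter(item for ds in tot_train_data for item in ds)
synth_counts = {occurrences[item] for item in synth_set}
strong_counts = {occurrences[item] for item in strong_set}

print("synth occurrences:", synth_counts)
print("strong occurrences:", strong_counts)
```

Every synthetic sample is reachable through two entries of the list, while every strong sample is reachable through only one.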

@popcornell
Collaborator

Hi,

Thanks for the question.
I think it was done only to "upsample" the synthetic training data within each epoch.
It is very similar to the past recipe, which used 12 samples per batch for the synthetic training data, except that the 12 has been split into 6 (synth only) and 6 (synth + strong).

In general, the recipe is very sensitive to the batch size and to the proportion of each dataset.
This is certainly not optimal, but it worked well in our experiments.

@JanekEbb do you know more maybe ?
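As a rough check of the "6 and 6+strong" split, the sketch below estimates how many synthetic samples land in a batch on average. It assumes uniform sampling inside the mixed slot and uses the dataset sizes quoted elsewhere in this thread (10000 synthetic, 3470 strong); the function name is hypothetical and not part of the baseline.

```python
def expected_synth_per_batch(synth_slot, mixed_slot, n_synth, n_strong):
    """Expected number of synthetic samples per batch.

    synth_slot: samples per batch drawn from synth_set alone.
    mixed_slot: samples per batch drawn from the synth + strong concatenation.
    Assumes uniform sampling within the mixed slot (an assumption, not
    something confirmed in the thread).
    """
    return synth_slot + mixed_slot * n_synth / (n_synth + n_strong)


# 6 synth-only slots plus 6 slots shared between synth and strong.
expected = expected_synth_per_batch(6, 6, 10000, 3470)
print(round(expected, 2))
```

Under these assumptions the average comes out a little below the 12 synthetic samples per batch of the past recipe, with the remainder of the mixed slot going to the strong set.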

@fschmid56
Author

Thanks for the explanation!

@JanekEbb
Collaborator

Actually, I'd say that leads to strong_set (the strong AudioSet portion) being underrepresented in training. Currently, strong_set makes up only 6/64*3470/(10000+3470)≈2.6% of the training data, if I am not mistaken. We may want to fix that.

Thanks for pointing that out Florian!
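The estimate above can be reproduced with a small helper. This is only a sketch: the function name is hypothetical, the total batch size is left as a parameter, and uniform sampling inside the synth + strong slot is assumed.

```python
def strong_share(mixed_slot, total_batch, n_strong, n_synth):
    """Fraction of each batch expected to come from strong_set.

    mixed_slot: samples per batch drawn from the synth + strong concatenation.
    total_batch: total samples per batch across all datasets.
    Assumes uniform sampling within the mixed slot.
    """
    return mixed_slot / total_batch * n_strong / (n_strong + n_synth)


# Numbers exactly as written in the comment above.
share_64 = strong_share(6, 64, 3470, 10000)
# Same expression with a total batch of 60 (an assumption for comparison).
share_60 = strong_share(6, 60, 3470, 10000)

print(f"{share_64:.1%}")
print(f"{share_60:.1%}")
```

Evaluated as written, with a total of 64, the expression gives roughly 2.4%; a total of 60 gives the quoted 2.6%. The thread does not say which total was intended, but either way the strong set accounts for only a few percent of each batch.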

@popcornell
Collaborator

After many tries, it seems to me that the best configuration is the current one, with the strong and synth sets concatenated.
The strong labels do not seem to help in my case.


3 participants