
Added implementation for MAE [work in progress - help appreciated] #19

Draft
ariguiba wants to merge 2 commits into master
Conversation

@ariguiba commented Dec 9, 2024

Hi 👋 I'm new here and trying to learn more about SSL techniques. This dataset seemed like a great place to start!

First try at implementing a masked autoencoder
The idea was to extend the SimCLR approach and implement a masked autoencoder (MAE) for self-supervised learning. For now the masking mechanism is turned off, so this is effectively a plain reconstruction autoencoder.

I followed a similar architecture design to SimCLR, using only CNN and linear layers, and similar hyperparameters.
The main setup is as follows (a code sketch follows the list):

  • mask_ratio: 0%
  • Encoder architecture (decoder: symmetric)
    • CNN: num_channels: 25, kernel: 5, stride: 2
    • CNN: num_channels: 25, kernel: 3, stride: 2
    • Linear projection head: 12 neurons (optional)
  • Learning rate: 0.01
  • Num epochs: 100
  • Batch size: 100
  • Optimizer: Adam
  • Loss function: MSE
  • Seed: 42
  • Training on both train + test data
  • t-SNE: on all data, in latent space
  • kNN & linReg: fit on train data, evaluated on test data, in latent space
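
To make this concrete, the model boils down to roughly the following (a minimal PyTorch sketch; the 1×28×28 input shape and the padding values are my assumptions here, adjust to the actual data):

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Reconstruction autoencoder with a SimCLR-style conv encoder.

    Shapes assume 1-channel 28x28 inputs; adjust for the actual dataset.
    """
    def __init__(self, latent_dim=12):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 25, kernel_size=5, stride=2, padding=2),   # 28 -> 14
            nn.ReLU(),
            nn.Conv2d(25, 25, kernel_size=3, stride=2, padding=1),  # 14 -> 7
            nn.ReLU(),
        )
        self.project = nn.Linear(25 * 7 * 7, latent_dim)   # optional projection head
        self.unproject = nn.Linear(latent_dim, 25 * 7 * 7)
        self.decoder = nn.Sequential(                       # symmetric decoder
            nn.ConvTranspose2d(25, 25, kernel_size=3, stride=2,
                               padding=1, output_padding=1),  # 7 -> 14
            nn.ReLU(),
            nn.ConvTranspose2d(25, 1, kernel_size=5, stride=2,
                               padding=2, output_padding=1),  # 14 -> 28
        )

    def encode(self, x):
        h = self.encoder(x)
        return self.project(h.flatten(start_dim=1))

    def forward(self, x):
        z = self.encode(x)
        h = self.unproject(z).view(-1, 25, 7, 7)
        return self.decoder(h)

model = ConvAutoencoder(latent_dim=12)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.MSELoss()  # reconstruction loss
```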

⚠️ Current issues:

  • Linear regression accuracy rises above the baseline but doesn't improve further with training
  • kNN accuracy drops significantly below the baseline
  • No visible cluster structure in the latent space (see the t-SNE projection)
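
For reference, the latent-space evaluation looks roughly like this (a scikit-learn sketch; random placeholder latents stand in for the real `model.encode()` outputs, and the linear probe is written as logistic regression since I'm reporting accuracy):

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Placeholder latents + labels; in practice these come from model.encode()
rng = np.random.default_rng(42)
Z_train, y_train = rng.normal(size=(1000, 12)), rng.integers(0, 10, 1000)
Z_test, y_test = rng.normal(size=(200, 12)), rng.integers(0, 10, 200)

# kNN probe: fit on train latents, score on test latents
knn = KNeighborsClassifier(n_neighbors=10).fit(Z_train, y_train)
print("kNN accuracy:", knn.score(Z_test, y_test))

# Linear probe
lin = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
print("linear accuracy:", lin.score(Z_test, y_test))

# t-SNE on all latents for the cluster plot
emb = TSNE(n_components=2).fit_transform(np.vstack([Z_train, Z_test]))
```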

My intuition: autoencoders are optimized for reconstruction rather than similarity, which would explain the absence of class clusters. Still, I would expect some structure in the latent space (clusters of features, say), and that structure becomes essential once this is extended to a masked autoencoder.

Any insights? Opinions on this?
Has anyone tried it before?

Any feedback is appreciated 🎉

@dkobak (Collaborator) commented Dec 9, 2024

I haven't tried it.

I haven't looked at your code in detail, but it seems your network is not really training: the loss per batch is constant over epochs, and the reconstruction quality is very poor...

@ariguiba (Author) commented Dec 9, 2024

> I haven't tried it.
>
> I haven't looked at your code in detail, but it seems your network is not really training: the loss per batch is constant over epochs, and the reconstruction quality is very poor...

Yes, that's also my worry: the model isn't learning at all, and I don't understand why. I've tried smaller and bigger models (each time adding 1-2 conv layers or playing around with the projector), but that didn't have any effect.
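
One sanity check I still want to run is overfitting a single fixed batch: a working autoencoder should drive the MSE to near zero there, so if it doesn't, the bug is in the model/optimizer wiring rather than in model capacity. A sketch, reusing the `ConvAutoencoder` from above with random data standing in (also worth trying Adam's default lr of 1e-3, since 0.01 can be too high):

```python
import torch

# Overfit one fixed batch; the loss should approach zero within a few hundred steps.
x = torch.randn(100, 1, 28, 28)  # random stand-in for one real batch
model = ConvAutoencoder(latent_dim=12)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.MSELoss()

for step in range(500):
    optimizer.zero_grad()
    loss = criterion(model(x), x)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(step, loss.item())
```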

Thank you in advance if you check it out!
