The loss here actually looks rather strange; can you give us some info about the data size and batch size? The model type and dimensions also need to be chosen carefully based on the data you have, so depending on how you sweep these, you could very much overfit. Imagine a data size of 10: a dimension > 9 would not work, and a model with a receptive field > 10 on data of length < 10 would not work either. 24 hidden units also seems quite small, but again that depends on the input size. Can you give us some more details? A visualization of the embedding would be helpful as well. I would set temp=1 to start.
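To make that concrete, here is a minimal sketch assuming the scikit-learn-style CEBRA estimator; all parameter values below are illustrative placeholders to sweep against your own data size, not recommendations:

```python
# Minimal sketch, assuming the scikit-learn-style CEBRA API.
# Values are illustrative; match them to your data, as discussed above.
from cebra import CEBRA

model = CEBRA(
    model_architecture="offset10-model",  # receptive field must fit your data length
    batch_size=512,                       # worth reporting alongside your data size
    num_hidden_units=32,                  # 24 may be small; depends on input size
    output_dimension=3,                   # keep well below the number of samples
    temperature=1,                        # fixed starting point, as suggested
    temperature_mode="constant",
    max_iterations=10000,
    verbose=True,
)
```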
-
No problem :) happy to help! Okay, so "either lines or unstructured clouds/spheres" means that it's collapsing for sure. If you don't use a 1D label to start, but rather train unsupervised (CEBRA-Time), can you get embeddings with more structure? :D
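A hedged sketch of what that unsupervised run could look like (CEBRA-Time is selected by fitting without labels; `neural_data` is a placeholder for your recording):

```python
# Sketch of an unsupervised CEBRA-Time run: fit() is called with the neural
# data only, so no behavioral label drives the contrastive sampling.
import numpy as np
from cebra import CEBRA

neural_data = np.random.randn(37200, 120).astype("float32")  # placeholder shape

time_model = CEBRA(
    model_architecture="offset10-model",
    conditional="time",      # time-contrastive (CEBRA-Time) sampling
    temperature=1,
    max_iterations=10000,
)
time_model.fit(neural_data)                    # no labels -> unsupervised
embedding = time_model.transform(neural_data)
```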
-
Hi,
Trying CEBRA for the first time, basically trying to optimize decoding of a continuous variable. I tried a random search over the following hyperparameters and am consistently getting negative decoding R^2 (e.g. -0.3, i.e. the MSE is significantly higher than the label variance).
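(For reference, assuming an sklearn-style R^2: the score goes negative exactly when the model's MSE exceeds the variance of the labels, i.e. it does worse than always predicting the label mean.)

```python
# Toy illustration of negative R^2 with sklearn's r2_score: predicting the
# label mean gives R^2 = 0, and anything worse goes negative.
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([0.0, 1.0, 2.0, 3.0])
y_mean = np.full_like(y_true, y_true.mean())  # constant mean prediction
y_bad = np.array([3.0, 2.0, 1.0, 0.0])        # anti-correlated predictions

print(r2_score(y_true, y_mean))  # 0.0
print(r2_score(y_true, y_bad))   # -3.0 (MSE is 4x the label variance)
```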
I also tried to include `temperature` as a hyperparameter (with `temperature_mode='constant'`), but that somehow shortens the embedding from 37,200 samples to 9,300, so the decoder training failed with: `ValueError: Invalid shape: y and X must have the same number of samples, got y:37200 and X:9300.`
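(For completeness, the quick shape check I run before decoder training; `model`, `neural_data`, and `labels` are placeholders for my actual objects:)

```python
# Hypothetical sanity check: the embedding from transform() should have one
# row per label before any decoder is fit on (embedding, labels).
embedding = model.transform(neural_data)
assert embedding.shape[0] == labels.shape[0], (
    f"y has {labels.shape[0]} samples but X has {embedding.shape[0]}"
)
```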
Note: baseline models (e.g. XGBoost on single time points with no temporal context) perform reasonably well on this task (R^2 typically between 0.05 and 0.25).
Any ideas???
Thanks!