Replies: 2 comments
-
Hi @asafbenj, thanks for your question. Let me reply inline:
This is used for "aligning" the data across sessions with respect to a label. It seems like you do not want to use this feature, though?
This is possible and can make sense depending on your research question. The only issue might be that there are sharp breaks in the data due to concatenation. If each session is quite long, this might be negligible, though.
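To make the "sharp breaks" point concrete, here is a minimal sketch of the concatenation workaround (the session lengths and feature arrays are made up for illustration). Keeping the session start offsets around makes it easy to locate the seams later, e.g. to check how much of the data sits near a boundary:

```python
import numpy as np

# Hypothetical per-session behavioral feature arrays (time x 36 features).
rng = np.random.default_rng(0)
sessions = [rng.normal(size=(n, 36)) for n in (1000, 1500, 800)]

# Concatenate into one "big session" and record where each session starts,
# so the sharp breaks at the seams can be located later (e.g. for plotting,
# or for excluding sample pairs that straddle a session boundary).
concatenated = np.concatenate(sessions, axis=0)
boundaries = np.cumsum([0] + [len(s) for s in sessions])
```

If each session is long, the fraction of samples near a seam is small, which is exactly why the breaks may be negligible in practice.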
Yes, this can be sensible. At your data size, since this uses CEBRA-Time under the hood rather than CEBRA-Behavior on a quite large dataset, it is expected that this runs much faster.
Happy to discuss more. Could you maybe expand a bit more on the details of your analysis, your analysis goals, etc.? If you are interested in analysing differences over time while obtaining a shared feature space, this approach is perfectly reasonable. I would aim to label the embedding spaces somehow, though --- at this number of points, it is expected that a lot of the space will be covered, and it would make sense to think about ways to visualize the data in an insightful way (or run decoding, etc.). Any additional comments?
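One simple way to label the embedding, sketched below with NumPy (the embedding and session offsets are hypothetical stand-ins): tag every time point with its session of origin, then subsample for plotting, as you already did with every 100th row.

```python
import numpy as np

# Hypothetical embedding of the concatenated data (time x output_dimension)
# plus the session start offsets from the concatenation step.
rng = np.random.default_rng(1)
embedding = rng.normal(size=(3300, 3))
boundaries = np.array([0, 1000, 2500, 3300])

# Label every time point with its session index, then subsample every
# 100th row so a scatter plot stays readable at millions of points.
session_id = np.searchsorted(boundaries, np.arange(len(embedding)),
                             side="right") - 1
sub = slice(None, None, 100)
points, labels = embedding[sub], session_id[sub]
# points/labels can now be passed to e.g. matplotlib's scatter with c=labels.
```

Coloring by mouse or by day instead of by session index works the same way and may be more informative for your question about differences over time.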
-
Regarding the temperature, you might want to consider setting it to even lower values (without the auto-temp mode).
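For intuition on why lower values matter, here is a toy sketch of the temperature's role in an InfoNCE-style contrastive loss (toy similarity values; this is not CEBRA's actual loss implementation): similarities are divided by the temperature before the softmax, so small temperatures make the distribution over negatives much more peaked.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax.
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy similarities: one positive pair vs. two negatives.
sims = np.array([0.9, 0.5, 0.1])

p_high = softmax(sims / 1.0)   # tau = 1.0: moderately peaked
p_low = softmax(sims / 0.1)    # tau = 0.1: sharply peaked
print(f"p(positive) at tau=1.0: {p_high[0]:.3f}")
print(f"p(positive) at tau=0.1: {p_low[0]:.3f}")
```

A sharper distribution penalizes hard negatives more strongly, which can tighten the embedding; too low a temperature, though, can make training unstable, so it is worth scanning a few fixed values.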
-
Hi,
I have a multi-session dataset: multiple mice are recorded, each on multiple days. I'm trying to learn an embedding of the behavior across multiple sessions, but CEBRA seems to require a label for multi-session training. Since the behavioral features are always the same, I figured it might make sense to concatenate the sessions into one big session, although the behavior of the different mice/days differs considerably. This runs smoothly (suspiciously fast, actually, compared to a single-session run), but does it seem sensible? Is there a better workaround? Should I add features encoding some metadata about the sessions? Any other issues I should consider?
The data comes out to ~7M time-points, with 36 features.
Here are the plots for batch_size = 2**12, output_dimension = 35, num_hidden_units = 64, temperature_mode="auto":
and with temperature set to 1:
taking every 100th row of the embedding:
Also, in a previous discussion you said the loss in the auto-temp mode seems weird, and that setting it to 1 would be better. However, if I run a grid search on this, the auto mode would win by far, right? Should I just leave the temperature out of the grid search?
Thanks!