Is CEBRA suited to spike-train data matrices representing short, isolated trials as input? #191
Replies: 2 comments
-
Hi @JorgeVindelAlfageme , thanks for using CEBRA. Since you are using multi-session training, you are specifying a label to align the embeddings to, which is a difference from e.g. PCA and GPFA. For (1), which options did you use for training the CEBRA embedding? Could you share a code snippet? For (2), it might be interesting to also try CEBRA-Time embeddings and compare how they look. If the individual recordings are only 3 s each (so I am assuming something like 300 samples, or more?), it will also make a difference whether you feed these matrices as separate sessions to the multi-session solver, or learn one embedding over a continuous time course (if that is possible).
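The two data layouts contrasted above can be sketched as follows. This is a minimal sketch, assuming the sklearn-style CEBRA estimator (`conditional`, multi-session `fit` on lists, `transform` with `session_id`); all shapes, iteration counts, and the synthetic data are invented for illustration, and the CEBRA calls are guarded in case the library is not installed:

```python
import numpy as np

# Illustrative data (assumption): 10 trials of 3 s, 300 samples per trial,
# 40 neurons, for a single animal.
rng = np.random.default_rng(1)
sessions = [rng.poisson(1.5, size=(300, 40)).astype(float) for _ in range(10)]
time_label = np.arange(300, dtype=float).reshape(-1, 1)
labels = [time_label.copy() for _ in sessions]

# Option A: feed the trial matrices as separate sessions (multi-session solver).
# Option B: learn one embedding over a single continuous time course by
# concatenating the trials (only sensible if the recording really is continuous;
# stacking isolated trials fabricates discontinuities at trial boundaries).
continuous = np.concatenate(sessions, axis=0)  # shape (3000, 40)

try:
    from cebra import CEBRA

    # Option A: multi-session CEBRA-Behavior aligned to the time label.
    multi = CEBRA(conditional="time_delta", max_iterations=10,
                  batch_size=128, output_dimension=3)
    multi.fit(sessions, labels)
    emb_trial0 = multi.transform(sessions[0], session_id=0)

    # Option B: single CEBRA-Time embedding over the continuous course.
    timed = CEBRA(conditional="time", max_iterations=10,
                  batch_size=128, output_dimension=3)
    timed.fit(continuous)
    emb_all = timed.transform(continuous)
except Exception:
    pass  # cebra may be unavailable here; the data layout is the point
```

Comparing `emb_trial0` against the corresponding slice of `emb_all` is one way to see how much the session split matters.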
-
Hi, Steffen. Thanks for your answer.
The rest of the hyperparameters remain at their defaults. Note: with respect to setting "conditional = None": since I'm using CEBRA multi-session, I need to input a list containing my auxiliary variables as vectors. I'm using a vector representing the time of each bin as the auxiliary variable, and this vector is repeated as many times as there are trial data matrices. That is, it is a constant vector representing time, and after setting "conditional" to "None", I understand that time becomes the sole auxiliary variable used while executing CEBRA multi-session. Maybe I am wrong, so it is good that I have told you this. As I said, I've tried different combinations of the hyperparameters listed above. Nevertheless, I always obtain the same kind of representation: a constant trajectory, with all trajectories ending up overlapping each other.
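For reference, the label setup described above (one constant within-trial time vector, repeated once per trial matrix) can be written out like this. The shapes are illustrative assumptions (3 s trials at 100 ms bins, so 30 bins; 40 neurons; 3 trials):

```python
import numpy as np

# Illustrative shapes (assumptions): 30 time bins per 3 s trial, 40 neurons,
# 3 trials for one mouse.
n_bins, n_neurons, n_trials = 30, 40, 3
rng = np.random.default_rng(0)

# One binned spike-count matrix per trial: (time bins, neurons).
trials = [rng.poisson(2.0, size=(n_bins, n_neurons)).astype(float)
          for _ in range(n_trials)]

# The same within-trial time vector, repeated once per trial matrix, is the
# single continuous auxiliary variable for multi-session training.
time_in_trial = np.arange(n_bins, dtype=float).reshape(-1, 1) * 0.1  # seconds
labels = [time_in_trial.copy() for _ in range(n_trials)]

# Each label vector must have one row per sample in its trial matrix.
for X, y in zip(trials, labels):
    assert X.shape[0] == y.shape[0]
```

Because every trial gets an identical label vector, time within the trial is the only structure the labels carry, which may be part of why the trajectories collapse onto each other.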
Now that we've talked about this, I need to ask you:
I) Do you think it would be a good approach to train a single model in CEBRA multi-session mode using every trial from all individuals as input, so that the number of trials increases to around 300 and the model can output a representation for each of them, showing their intrinsic variability? Is there anything wrong with setting "conditional = None" and using time vectors as the only auxiliary variables in CEBRA multi-session mode?
II) If the approach in I) seems adequate, how would you train the model? Would you call the "fit" method repeatedly, or use the "partial_fit" method instead? What is the difference between the two? A chained training approach seems like the only way to avoid overloading the GPU when the dataset is relatively large, right?
III) Do you think it is better to concatenate all matrices into a single data matrix and use that as input for CEBRA-Time? How would you use CEBRA-Time, as you mentioned in (2), with this dataset if CEBRA multi-session is not being used?
IV) Does CEBRA ever depend on data standardization prior to training, such as the Gaussian filter that is applied before running PCA?
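For questions III) and IV), the two preprocessing ideas being asked about (concatenating trials into one matrix for CEBRA-Time, and Gaussian smoothing or standardization of the binned counts) can be sketched as below. This is a hedged illustration, not a CEBRA requirement; the trial shapes and smoothing width are invented:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Illustrative trials (assumption): 5 trials of 30 bins x 40 neurons each.
rng = np.random.default_rng(2)
trials = [rng.poisson(2.0, size=(30, 40)).astype(float) for _ in range(5)]

# (III) One continuous input matrix for CEBRA-Time: stack trials along time.
# Note this fabricates discontinuities at the trial boundaries.
stacked = np.vstack(trials)  # shape (150, 40)

# (IV) Optional smoothing along the time axis, as is common before PCA/GPFA.
smoothed = gaussian_filter1d(stacked, sigma=2.0, axis=0)

# Simple per-neuron z-scoring is another common, optional standardization;
# it is easy to compare embeddings trained with and without it.
z = (smoothed - smoothed.mean(axis=0)) / (smoothed.std(axis=0) + 1e-8)
```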
-
Hello:
I'm trying to use CEBRA on multiple datasets, each coming from a different individual (mouse). Each dataset was recorded from prefrontal cortex and consists of binned spike counts stored in multiple matrices. Each mouse showed the same behaviour multiple times over the experimental session, so for each mouse we have recorded as many matrices of binned spike counts as times this behaviour was identified. However, the embeddings are expected to change over the course of the experimental session. The duration of the behaviour represented by a single matrix is the same across all matrices, i.e., 3 seconds in total, which is a short period of time.
I have already used two different dimensionality reduction techniques (DRTs), PCA and GPFA, to check the resulting embeddings. Both agree in their results, showing variation between trials but following a shared pattern, so the data analysis and processing appear to be correct.
I want to use CEBRA to generate a model for each mouse. Each CEBRA model would be trained on a set of matrices representing the binned spike counts over time. In this case, I think that CEBRA multi-session mode is a good option. A different "session_id" can be used when transforming inputs into embeddings if CEBRA multi-session mode is chosen (as you previously indicated in #49).
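The intended per-mouse, multi-session call pattern might look like the sketch below. It assumes the sklearn-style CEBRA estimator and the "session_id" argument to "transform" mentioned above; the data layout, mouse names, and hyperparameters are hypothetical, and the CEBRA calls are guarded in case the library is unavailable:

```python
import numpy as np

# Hypothetical layout (assumption): for each mouse, a list of binned
# spike-count matrices, one per identified 3 s behaviour (30 bins x 40 neurons).
rng = np.random.default_rng(3)
mice = {f"mouse_{i}": [rng.poisson(2.0, size=(30, 40)).astype(float)
                       for _ in range(4)]
        for i in range(2)}
time_label = (np.arange(30, dtype=float) * 0.1).reshape(-1, 1)  # seconds

embeddings = {}
try:
    from cebra import CEBRA
    for mouse, trials in mice.items():
        # One model per mouse, trained on all of that mouse's trials at once.
        model = CEBRA(conditional="time_delta", max_iterations=10,
                      batch_size=16, output_dimension=3)
        model.fit(trials, [time_label.copy() for _ in trials])
        # One session_id per trial matrix when producing embeddings.
        embeddings[mouse] = [model.transform(X, session_id=k)
                             for k, X in enumerate(trials)]
except Exception:
    pass  # cebra may be unavailable here; the call pattern is the point
```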
In a previous discussion about CEBRA usage (the same discussion as before: #49), you talked about how to handle epoched data that is not continuous over time. In that discussion, you also mentioned a CEBRA notebook about producing results with this kind of data, even though those trials were chained over time (https://cebra.ai/docs/demo_notebooks/Demo_primate_reaching_mse_loss.html).
From that notebook, we can learn that a specific CEBRA model architecture can be used to obtain adequate embeddings, and that other hyperparameters can influence the kind of result obtained.
Taking everything into account, my questions are:
Is CEBRA the correct tool for obtaining embeddings from this kind of dataset? I used two other DRTs, and their results on my datasets coincide, but I don't know how to use CEBRA in this case. Every time I use it, regardless of the hyperparameters I choose, I only get a set of embeddings that is practically the same across trials, even though this is not supposed to happen based on the experimental and computational evidence.
If CEBRA can be used in this scenario, what is the correct way of using it? Which hyperparameters need to be changed?
Thank you kindly.