CEBRA Interpretability #143
Closed
xanderladd started this conversation in General
Replies: 1 comment 3 replies
-
Thanks for this tool and for the support you give on GitHub!
I am able to produce embeddings of neural data that have a clear interpretation in terms of behavior, but I am wondering whether there is a way to do something like PCA loadings in order to understand the contribution of an individual neuron to the embeddings. I know this is a tall order because the weights of the neural network are not intuitively interpretable. Is this worth thinking about, or should I just treat the positive/negative-sample neural nets as black boxes?
Some ideas I had for this were:
Ultimately, what would be great to know is some way of quantifying a single neuron's (or a subset of neurons') contribution to the embeddings. I don't expect to make strong causal arguments this way, but correlational evidence for further hypothesis generation would be valuable.
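To make the idea concrete, here is one version of the kind of per-neuron score I have in mind: keep the trained model fixed, replace one neuron's activity with its mean, and measure how far the embedding moves. This is only a minimal sketch, assuming synthetic placeholder data and the sklearn-style `cebra.CEBRA` estimator with a single-time-bin architecture; it is my rough idea, not an established CEBRA workflow.

```python
import numpy as np
import cebra

rng = np.random.default_rng(0)
neural = rng.standard_normal((2000, 40)).astype("float32")   # placeholder recordings: (time, neurons)
behavior = rng.random((2000, 1)).astype("float32")           # placeholder continuous behavioral variable

# 'offset1-model' keeps one embedding row per time point, which keeps shapes simple.
model = cebra.CEBRA(model_architecture="offset1-model", output_dimension=3,
                    max_iterations=500, batch_size=512)
model.fit(neural, behavior)
baseline = model.transform(neural)

scores = np.zeros(neural.shape[1])
for i in range(neural.shape[1]):
    ablated = neural.copy()
    ablated[:, i] = neural[:, i].mean()            # remove neuron i's variability (mean ablation)
    displaced = model.transform(ablated)
    # average displacement of the embedding when neuron i is ablated
    scores[i] = np.linalg.norm(displaced - baseline, axis=1).mean()

ranking = np.argsort(scores)[::-1]                 # neurons whose ablation moves the embedding most
```

A caveat with single-neuron ablation is that redundant neurons can mask each other, so ablating small subsets of neurons together may be worth trying as well.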
-
Hi @xanderladd, great question! Please have a look at this earlier discussion on the topic; we are actively working on it. Here is our latest work, which we will integrate into CEBRA: https://sslneurips23.github.io/paper_pdfs/paper_80.pdf
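Until attribution tooling along those lines is integrated into CEBRA, one rough, purely correlational stand-in is permutation importance computed through `transform`: shuffle one neuron's activity in time, re-embed, and measure how much a simple decoder of behavior from the embedding degrades. The sketch below uses synthetic placeholder data and a scikit-learn k-NN decoder; it is not the gradient-based attribution method described in the linked paper.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.neighbors import KNeighborsRegressor
import cebra

rng = np.random.default_rng(0)
neural = rng.standard_normal((2000, 40)).astype("float32")   # placeholder recordings: (time, neurons)
behavior = rng.random((2000, 1)).astype("float32")           # placeholder continuous behavioral variable

# 'offset1-model' keeps one embedding row per time point, matching the label length.
model = cebra.CEBRA(model_architecture="offset1-model", output_dimension=3,
                    max_iterations=500, batch_size=512)
model.fit(neural, behavior)
embedding = model.transform(neural)

# Decode behavior from the unperturbed embedding to get a baseline score.
decoder = KNeighborsRegressor(n_neighbors=25).fit(embedding, behavior.ravel())
baseline_r2 = r2_score(behavior.ravel(), decoder.predict(embedding))

importance = np.zeros(neural.shape[1])
for i in range(neural.shape[1]):
    shuffled = neural.copy()
    shuffled[:, i] = rng.permutation(shuffled[:, i])          # destroy neuron i's temporal structure
    r2 = r2_score(behavior.ravel(), decoder.predict(model.transform(shuffled)))
    importance[i] = baseline_r2 - r2                          # drop in decodability attributed to neuron i
```

Because the decoder is fit once on the unperturbed embedding, a large drop for neuron i suggests that the part of the embedding geometry supporting behavioral decoding depends on that neuron's activity; small or negative values mostly reflect noise.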