Hi,
Does it make sense to optimize hyperparameters for the contrastive loss separately from decoding? That is, is the embedding from the model with the lowest contrastive loss expected to be the best input for decoding, or should we optimize them jointly, even if that is highly inefficient (e.g. with lots of unlabeled data and only a little labeled data)?
Thanks!
-
We optimize them separately; I would not go on to decoding until one has a consistent encoding model (i.e. running the same parameters many times gives you the same latents).
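
To make that workflow concrete, here is a minimal sketch, not the project's actual API: train the encoder several times with the same hyperparameters but different seeds, check that the resulting latents agree up to a linear transform, and only then fit a decoder on the labeled subset. The encoder below is a stand-in (scikit-learn's FastICA) rather than a real contrastive model, the data are synthetic, and the 0.9 consistency threshold is an arbitrary illustrative choice.

```python
# Sketch: check run-to-run consistency of latents before moving on to decoding.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: X plays the role of recorded features, y a label to decode.
latent_true = rng.normal(size=(2000, 3))
X = np.tanh(latent_true @ rng.normal(size=(3, 50))) + 0.1 * rng.normal(size=(2000, 50))
y = latent_true[:, 0]  # label depends on one underlying latent dimension

def fit_encoder(X, seed, n_components=3):
    """Stand-in for an encoding model trained with fixed hyperparameters and a given seed."""
    return FastICA(n_components=n_components, random_state=seed, max_iter=1000).fit_transform(X)

def consistency_r2(z_a, z_b):
    """R^2 of predicting one run's latents from another's after a linear alignment,
    since latents are typically only identified up to a linear transform."""
    return LinearRegression().fit(z_a, z_b).score(z_a, z_b)

# 1) Train the encoder several times with identical hyperparameters.
latents = [fit_encoder(X, seed=s) for s in range(3)]

# 2) Check run-to-run consistency of the embeddings.
scores = [consistency_r2(latents[i], latents[j])
          for i in range(3) for j in range(3) if i != j]
print(f"mean run-to-run consistency R^2: {np.mean(scores):.3f}")

# 3) Only once consistency is high, fit and evaluate a decoder on the labeled data.
if np.mean(scores) > 0.9:  # illustrative threshold, not a recommended value
    z = latents[0]
    z_tr, z_te, y_tr, y_te = train_test_split(z, y, test_size=0.2, random_state=0)
    decoder = Ridge(alpha=1.0).fit(z_tr, y_tr)
    print(f"decoding R^2: {decoder.score(z_te, y_te):.3f}")
```

This keeps the two stages decoupled, as suggested above: hyperparameters are selected on the unlabeled contrastive objective plus the consistency check, and the (smaller) labeled set is only used afterwards to fit and evaluate the decoder.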