Hi,
Does it make sense to optimize hyperparameters for the contrastive loss separately from decoding? That is, is the embedding from the model with the lowest contrastive loss expected to be the best input for decoding, or should we optimize them jointly, even if that is highly inefficient (e.g. with lots of unlabeled data and only a little labeled data)?
Thanks!
-
We optimize them separately; I would not go on to decoding until one has a consistent encoding model (i.e. running the same parameters many times gives you the same latents).
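
To make that workflow concrete, here is a minimal sketch, not the project's actual API: train the encoder several times with the same hyperparameters but different seeds, check that the resulting latents agree up to a linear transform, and only then fit a decoder on the labeled subset. The encoder below is a stand-in (scikit-learn's FastICA) rather than a real contrastive model, the data are synthetic, and the 0.9 consistency threshold is an arbitrary illustrative choice.

```python
# Sketch: check run-to-run consistency of latents before moving on to decoding.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: X plays the role of recorded features, y a label to decode.
latent_true = rng.normal(size=(2000, 3))
X = np.tanh(latent_true @ rng.normal(size=(3, 50))) + 0.1 * rng.normal(size=(2000, 50))
y = latent_true[:, 0]  # label depends on one underlying latent dimension

def fit_encoder(X, seed, n_components=3):
    """Stand-in for an encoding model trained with fixed hyperparameters and a given seed."""
    return FastICA(n_components=n_components, random_state=seed, max_iter=1000).fit_transform(X)

def consistency_r2(z_a, z_b):
    """R^2 of predicting one run's latents from another's after a linear alignment,
    since latents are typically only identified up to a linear transform."""
    return LinearRegression().fit(z_a, z_b).score(z_a, z_b)

# 1) Train the encoder several times with identical hyperparameters.
latents = [fit_encoder(X, seed=s) for s in range(3)]

# 2) Check run-to-run consistency of the embeddings.
scores = [consistency_r2(latents[i], latents[j])
          for i in range(3) for j in range(3) if i != j]
print(f"mean run-to-run consistency R^2: {np.mean(scores):.3f}")

# 3) Only once consistency is high, fit and evaluate a decoder on the labeled data.
if np.mean(scores) > 0.9:  # illustrative threshold, not a recommended value
    z = latents[0]
    z_tr, z_te, y_tr, y_te = train_test_split(z, y, test_size=0.2, random_state=0)
    decoder = Ridge(alpha=1.0).fit(z_tr, y_tr)
    print(f"decoding R^2: {decoder.score(z_te, y_te):.3f}")
```

This keeps the two stages decoupled, as suggested above: hyperparameters are selected on the unlabeled contrastive objective plus the consistency check, and the (smaller) labeled set is only used afterwards to fit and evaluate the decoder.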