Add code for RAFT #47

Open · wants to merge 1 commit into master
Conversation

umbertov (Contributor)

Hi, I recently read a paper ( Run Away From your Teacher: Understanding BYOL by a Novel Self-Supervised Approach ) that expands the work on self-supervised algorithms and provides a new perspective on the BYOL methodology. Reading the paper, I thought it would be quite easy to implement their proposal in a usable form, using this repository as a base.

If anyone seeing this PR has read the paper in question, or wants to take a look at it, you can check the alignment_loss and cross_model_loss functions, as well as the forward method in the RAFT class, and verify that the implementation agrees with the description in the paper. Everything looks in order to me, but some confirmation never hurts. A rough sketch of the structure is included right below.
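
For anyone who'd rather not open the diff first, here is roughly the structure I'm describing. This is a simplified sketch, not the code in the PR: the byol_loss helper here is my stand-in for the renamed loss_fn, and the real functions operate on the projections produced inside forward.

```python
import torch.nn.functional as F

def byol_loss(x, y):
    # BYOL-style distance between two batches of projection vectors,
    # equivalent to 2 - 2 * cosine_similarity(x, y)
    x = F.normalize(x, dim=-1, p=2)
    y = F.normalize(y, dim=-1, p=2)
    return 2 - 2 * (x * y).sum(dim=-1)

def alignment_loss(online_proj_one, online_proj_two):
    # distance between the online projections of the two augmented views
    # of the same image: the term RAFT minimizes
    return byol_loss(online_proj_one, online_proj_two).mean()

def cross_model_loss(online_proj_one, online_proj_two,
                     target_proj_one, target_proj_two):
    # distance between the online and target projections of the same view,
    # averaged over the two views: the term RAFT maximizes
    loss_one = byol_loss(online_proj_one, target_proj_one).mean()
    loss_two = byol_loss(online_proj_two, target_proj_two).mean()
    return 0.5 * (loss_one + loss_two)

def raft_loss(online_proj_one, online_proj_two,
              target_proj_one, target_proj_two):
    # term to minimize minus term to maximize
    return (alignment_loss(online_proj_one, online_proj_two)
            - cross_model_loss(online_proj_one, online_proj_two,
                               target_proj_one, target_proj_two))
```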

The only slight concern I have about this PR is that it may be beyond the scope of this repository, which would then start to resemble a collection of BYOL-related paper implementations. @lucidrains, the decision is up to you, but since 99% of the code is shared between this RAFT implementation and the BYOL one, I decided to open a PR instead of copy-pasting all your code into a brand new project.

Another notable implementation detail is that I changed loss_fn's name to byol_loss, in order to avoid confusion and keep the naming consistent with raft_loss.

What follows is the commit message; GitHub pasted it here automatically, so I might as well leave it:

Add implementation for the algorithm described by the paper
"Run Away From your Teacher: Understanding BYOL by a Novel
Self-Supervised Approach" ( https://arxiv.org/abs/2011.10944 )

The RAFT class is essentially a copy-paste of the BYOL class,
with slight changes to the forward method, which computes a
different loss function, making use of the new raft_loss
function. RAFT's loss is the difference of two losses: an
"alignment loss" between the projection of two different
augmented views of the same image, to be minimized, and the
"cross-model loss", to be maximized, which is the distance
between the online and target representation of the same input,
averaged over the two different views.
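
To make the "slight changes to the forward method" concrete, the flow is roughly as follows. Again a sketch, written as a free function rather than a method, assuming the same online/target wiring as the existing BYOL class (the target network being the EMA copy of the online one), and reusing the raft_loss sketched above.

```python
import torch

def raft_forward(online_encoder, target_encoder, augment1, augment2, x):
    # target_encoder is assumed to be the EMA copy of online_encoder,
    # exactly as in the BYOL class; it receives no gradients.
    image_one, image_two = augment1(x), augment2(x)

    # online projections: gradients flow through these
    online_proj_one = online_encoder(image_one)
    online_proj_two = online_encoder(image_two)

    # target projections: detached from the computation graph
    with torch.no_grad():
        target_proj_one = target_encoder(image_one).detach()
        target_proj_two = target_encoder(image_two).detach()

    # raft_loss as sketched earlier in this description
    return raft_loss(online_proj_one, online_proj_two,
                     target_proj_one, target_proj_two)
```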

Tell me what you think :)

umbertov (Contributor, Author) commented Dec 23, 2020

A weird thing I noticed: since we're supposed to minimize the alignment loss and maximize the cross-model loss, the total loss is quite prone to being negative. Both losses are positive quantities, and their difference is negative whenever the term to maximize exceeds the term to minimize (the difference in question is thing_to_minimize - thing_to_maximize).

Does anyone more experienced than me have a trick to deal with this? Should I consider it a problem at all? Intuitively, I think it is, because the optimizer tries to minimize the loss rather than drive it towards zero as we'd like, so the loss would just approach negative infinity as training goes on.

Since the paper's authors actually define the RAFT loss as a linear combination of the two losses, I just scaled the cross-model loss by 0.5 in my code, which suffices to keep the loss positive, but I don't think hard-coding a magic number is a universal solution.

Another trick could be to just take the absolute value, or an even power, of the loss, but I don't know whether that is mathematically sound.
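
Concretely, the workaround amounts to something like this, reusing the alignment_loss and cross_model_loss sketched in the description; the 0.5 is exactly the magic number in question, exposed here as an argument:

```python
def weighted_raft_loss(online_proj_one, online_proj_two,
                       target_proj_one, target_proj_two,
                       cross_model_weight=0.5):
    # linear combination of the two terms, as in the paper;
    # cross_model_weight = 0.5 is the hard-coded value mentioned above,
    # exposed as an argument instead
    align = alignment_loss(online_proj_one, online_proj_two)
    cross = cross_model_loss(online_proj_one, online_proj_two,
                             target_proj_one, target_proj_two)
    return align - cross_model_weight * cross
```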


chingisooinar commented Feb 15, 2022

@umbertov hello, I think there's no need to create separate methods for the alignment and cross-model loss calculations. As far as I understand, the original loss_fn can be used for both. Anyway, thank you for sharing, and correct me if I'm wrong.
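
For reference, assuming loss_fn is the normalized-MSE helper from the BYOL code (equivalent to 2 - 2 * cosine similarity), the whole RAFT loss could indeed be written with it directly. A rough sketch, not the code in the diff:

```python
import torch.nn.functional as F

def loss_fn(x, y):
    # the existing BYOL helper: normalized MSE, i.e. 2 - 2 * cosine similarity
    x = F.normalize(x, dim=-1, p=2)
    y = F.normalize(y, dim=-1, p=2)
    return 2 - 2 * (x * y).sum(dim=-1)

def raft_loss(online_proj_one, online_proj_two,
              target_proj_one, target_proj_two):
    # alignment term: loss_fn across the two online views
    align = loss_fn(online_proj_one, online_proj_two).mean()
    # cross-model term: loss_fn across online/target pairs, averaged over views
    cross = 0.5 * (loss_fn(online_proj_one, target_proj_one).mean()
                   + loss_fn(online_proj_two, target_proj_two).mean())
    return align - cross
```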
