Add code for RAFT #47

Open · wants to merge 1 commit into master
Conversation

umbertov (Contributor)

Hi, I recently read a paper ( Run Away From your Teacher: Understanding BYOL by a Novel Self-Supervised Approach ) that expands the work on self-supervised algorithms and provides a new perspective on the BYOL methodology. Reading the paper, I thought it would be quite easy to implement their proposal in a usable form, using this repository as a base.

If anyone seeing this PR has read the paper in question, or wants to take a look at it, you can check the alignment_loss and cross_model_loss functions, as well as the forward method in the RAFT class, and verify that the implementation agrees with the description in the paper. Everything looks in order to me, but some confirmation never hurts. A rough sketch of the structure is included right below.
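
For anyone who'd rather not open the diff first, here is roughly the structure I'm describing. This is a simplified sketch, not the code in the PR: the byol_loss helper here is my stand-in for the renamed loss_fn, and the real functions operate on the projections produced inside forward.

```python
import torch.nn.functional as F

def byol_loss(x, y):
    # BYOL-style distance between two batches of projection vectors,
    # equivalent to 2 - 2 * cosine_similarity(x, y)
    x = F.normalize(x, dim=-1, p=2)
    y = F.normalize(y, dim=-1, p=2)
    return 2 - 2 * (x * y).sum(dim=-1)

def alignment_loss(online_proj_one, online_proj_two):
    # distance between the online projections of the two augmented views
    # of the same image: the term RAFT minimizes
    return byol_loss(online_proj_one, online_proj_two).mean()

def cross_model_loss(online_proj_one, online_proj_two,
                     target_proj_one, target_proj_two):
    # distance between the online and target projections of the same view,
    # averaged over the two views: the term RAFT maximizes
    loss_one = byol_loss(online_proj_one, target_proj_one).mean()
    loss_two = byol_loss(online_proj_two, target_proj_two).mean()
    return 0.5 * (loss_one + loss_two)

def raft_loss(online_proj_one, online_proj_two,
              target_proj_one, target_proj_two):
    # term to minimize minus term to maximize
    return (alignment_loss(online_proj_one, online_proj_two)
            - cross_model_loss(online_proj_one, online_proj_two,
                               target_proj_one, target_proj_two))
```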

The only slight concern I have about this PR is that it may be beyond the scope of this repository, which would then start to resemble a collection of BYOL-related paper implementations. @lucidrains, the decision is up to you, but since 99% of the code is shared between this RAFT implementation and the BYOL one, I decided to open a PR instead of copy-pasting all your code into a brand new project.

Another notable implementation detail is that I changed loss_fn's name to byol_loss, in order to avoid confusion and keep the naming consistent with raft_loss.

What follows is the commit message; GitHub pasted it here automatically, so I might as well leave it:

Add implementation for the algorithm described by the paper
"Run Away From your Teacher: Understanding BYOL by a Novel
Self-Supervised Approach" ( https://arxiv.org/abs/2011.10944 )

The RAFT class is essentially a copy-paste of the BYOL class,
with slight changes to the forward method, which computes a
different loss function, making use of the new raft_loss
function. RAFT's loss is the difference of two losses: an
"alignment loss" between the projection of two different
augmented views of the same image, to be minimized, and the
"cross-model loss", to be maximized, which is the distance
between the online and target representation of the same input,
averaged over the two different views.
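
To make the "slight changes to the forward method" concrete, the flow is roughly as follows. Again a sketch, written as a free function rather than a method, assuming the same online/target wiring as the existing BYOL class (the target network being the EMA copy of the online one), and reusing the raft_loss sketched above.

```python
import torch

def raft_forward(online_encoder, target_encoder, augment1, augment2, x):
    # target_encoder is assumed to be the EMA copy of online_encoder,
    # exactly as in the BYOL class; it receives no gradients.
    image_one, image_two = augment1(x), augment2(x)

    # online projections: gradients flow through these
    online_proj_one = online_encoder(image_one)
    online_proj_two = online_encoder(image_two)

    # target projections: detached from the computation graph
    with torch.no_grad():
        target_proj_one = target_encoder(image_one).detach()
        target_proj_two = target_encoder(image_two).detach()

    # raft_loss as sketched earlier in this description
    return raft_loss(online_proj_one, online_proj_two,
                     target_proj_one, target_proj_two)
```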

Tell me what you think :)

umbertov (Contributor, Author) commented Dec 23, 2020

A weird thing I noticed: since we're supposed to minimize the alignment loss and maximize the cross-model loss, the total loss is quite prone to being negative. Both losses are positive quantities, and their difference is negative whenever the term to maximize exceeds the term to minimize (the difference in question is thing_to_minimize - thing_to_maximize).

Does anyone more experienced than me have a trick to deal with this? Should I consider it a problem at all? Intuitively, I think it is, because the optimizer tries to minimize the loss rather than drive it towards zero as we'd like, so the loss would just approach negative infinity as training goes on.

Since the paper's authors actually define the RAFT loss as a linear combination of the two losses, I just scaled the cross-model loss by 0.5 in my code, which suffices to keep the loss positive, but I don't think hard-coding a magic number is a universal solution.

Another trick could be to just take the absolute value, or an even power, of the loss, but I don't know whether that is mathematically sound.
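
Concretely, the workaround amounts to something like this, reusing the alignment_loss and cross_model_loss sketched in the description; the 0.5 is exactly the magic number in question, exposed here as an argument:

```python
def weighted_raft_loss(online_proj_one, online_proj_two,
                       target_proj_one, target_proj_two,
                       cross_model_weight=0.5):
    # linear combination of the two terms, as in the paper;
    # cross_model_weight = 0.5 is the hard-coded value mentioned above,
    # exposed as an argument instead
    align = alignment_loss(online_proj_one, online_proj_two)
    cross = cross_model_loss(online_proj_one, online_proj_two,
                             target_proj_one, target_proj_two)
    return align - cross_model_weight * cross
```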


chingisooinar commented Feb 15, 2022

@umbertov hello, I think there's no need to create separate methods for the alignment and cross-model loss calculations. As far as I understand, the original loss_fn can be used for both. Anyway, thank you for sharing, and correct me if I'm wrong.
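
For reference, assuming loss_fn is the normalized-MSE helper from the BYOL code (equivalent to 2 - 2 * cosine similarity), the whole RAFT loss could indeed be written with it directly. A rough sketch, not the code in the diff:

```python
import torch.nn.functional as F

def loss_fn(x, y):
    # the existing BYOL helper: normalized MSE, i.e. 2 - 2 * cosine similarity
    x = F.normalize(x, dim=-1, p=2)
    y = F.normalize(y, dim=-1, p=2)
    return 2 - 2 * (x * y).sum(dim=-1)

def raft_loss(online_proj_one, online_proj_two,
              target_proj_one, target_proj_two):
    # alignment term: loss_fn across the two online views
    align = loss_fn(online_proj_one, online_proj_two).mean()
    # cross-model term: loss_fn across online/target pairs, averaged over views
    cross = 0.5 * (loss_fn(online_proj_one, target_proj_one).mean()
                   + loss_fn(online_proj_two, target_proj_two).mean())
    return align - cross
```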
