
Integrate bolts + torch hub #442

Open
edenlightning opened this issue Dec 10, 2020 · 19 comments

@edenlightning
Contributor

No description provided.

@Borda
Member

Borda commented Dec 23, 2020

Well, we can register the Bolts models for torch.hub, but for producing pre-trained weights we still need some heavy GPU machines...
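
For reference, registering models for torch.hub is done through a hubconf.py at the repository root. A rough, hypothetical sketch of what that could look like for Bolts (the entry-point name, the VAE choice, and the weights URL are made up for illustration; Bolts does not ship such a file today):

# hubconf.py -- hypothetical sketch, not an existing file in lightning-bolts
import torch

dependencies = ["torch", "pytorch_lightning", "pl_bolts"]

def vae_cifar10(pretrained=False, **kwargs):
    """Hypothetical torch.hub entry point exposing a Bolts model."""
    from pl_bolts.models.autoencoders import VAE

    model = VAE(input_height=32, **kwargs)
    if pretrained:
        # Placeholder URL; real weights would need the heavy GPU training mentioned above.
        state_dict = torch.hub.load_state_dict_from_url("https://example.com/vae_cifar10.pt")
        model.load_state_dict(state_dict)
    return model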

@Borda Borda added the enhancement New feature or request label Dec 23, 2020
@oke-aditya
Contributor

oke-aditya commented Jan 7, 2021

Can we do the vice versa too?

  1. Load a model from torch.hub.
  2. Train / fine-tune with PyTorch Lightning.

I would be highly interested in implementing such a feature.
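
A rough sketch of that direction, assuming a torchvision ResNet from torch.hub and a made-up LightningModule wrapper (HubFinetuner and the hyper-parameters are illustrative, not an existing Bolts API):

import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl

class HubFinetuner(pl.LightningModule):
    """Hypothetical wrapper: load a torch.hub model, then fine-tune it with Lightning."""

    def __init__(self, repo="pytorch/vision:v0.10.0", name="resnet18", num_classes=10, lr=1e-3):
        super().__init__()
        # 1. Load a pre-trained model from torch.hub.
        self.model = torch.hub.load(repo, name, pretrained=True)
        # Replace the classification head for the new task.
        self.model.fc = nn.Linear(self.model.fc.in_features, num_classes)
        self.lr = lr

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.model(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

# 2. Fine-tune with the Lightning Trainer (dataloaders omitted):
# trainer = pl.Trainer(max_epochs=5)
# trainer.fit(HubFinetuner(), train_dataloaders=...)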

@stale stale bot added the won't fix This will not be worked on label Mar 8, 2021
@Borda Borda added this to the v0.4 milestone Mar 8, 2021
@Lightning-Universe Lightning-Universe deleted a comment from stale bot Mar 8, 2021
@Borda Borda removed the won't fix This will not be worked on label Mar 8, 2021
@Programmer-RD-AI
Contributor

hi,
I would like to help with this issue.

With best regards,
Ranuga

@Borda
Member

Borda commented Oct 24, 2021

hi, I would like to help with this issue.

Great! Let's sync up also with the Bolts refactoring =)

@oke-aditya
Contributor

oke-aditya commented Oct 24, 2021

Just for information, there is currently a refactor of torchvision.models going on, available in the prototype folder.

So the API with the hub might change.

Edit:
Also a small note, torchvision detection models do not work with Hub.

Let me know if I can help.

P.S. A book on PyTorch Lightning will be out at the end of this year!

@Programmer-RD-AI
Contributor

I will start working on this.

:)

With best regards,
Ranuga

@Programmer-RD-AI
Contributor

Hi,
I want to know: what is the issue with using a torch.hub model in PyTorch Lightning and fine-tuning it?

With best regards,
Ranuga

@oke-aditya
Contributor

Torch hub allows you to load the model, but you need to do model surgery to specify the number of classes, etc.

I have an example for DETR:

https://github.com/oke-aditya/quickvision/blob/master/quickvision/models/detection/detr/model_factory.py

We can load the DETR backbone, but we need to adjust the head classifier for our own number of classes.
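
A hedged sketch of the kind of surgery meant here, loosely following the quickvision factory linked above (the class_embed attribute name comes from the facebookresearch/detr reference implementation; num_classes is assumed for illustration):

import torch
import torch.nn as nn

# Load DETR with a ResNet-50 backbone from torch.hub.
model = torch.hub.load("facebookresearch/detr", "detr_resnet50", pretrained=True)

num_classes = 5  # your own number of classes
hidden_dim = model.class_embed.in_features
# Replace the classification head; DETR keeps one extra slot for the "no object" class.
model.class_embed = nn.Linear(hidden_dim, num_classes + 1)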

Similarly for CNNs, one needs to load the backbone and modify the head classifier for a custom num_classes. You also need to freeze / unfreeze layers during transfer learning and fine-tuning.

We can think about this a little bit more; this is something Flash does well, I think.

cc @Borda @kaushikb11 @Programmer-RD-AI @akihironitta

@Programmer-RD-AI
Contributor

ok, thank you @oke-aditya
I will try to fix the issue.

@oke-aditya
Contributor

Since a single PR will not be a solution,
I would suggest proposing a brief prototype (probably a branch here or a new repo) and letting the maintainers have a look.
I would also suggest checking over Slack / with Borda whether this is part of the PL plans for moving ahead with Bolts.

@Programmer-RD-AI
Contributor

ok thank you @oke-aditya

@Programmer-RD-AI
Contributor

Programmer-RD-AI commented Oct 26, 2021

Hi,
I am currently building a demo of this, and my question is whether I can do:

from torch.nn import Linear
from torchvision.models import googlenet

model = googlenet().to(device)  # device and classes are assumed to be defined elsewhere
print(model)  # prints the model architecture
model.fc = Linear(model.fc.in_features, len(classes))

and then use the model as usual.
I am just a bit confused, that's why.
Thank you.

@oke-aditya
Contributor

Yes, you can, and this is the correct way. But note that the fc attribute applies to GoogLeNet and ResNet; for models like MobileNet it is called classifier or something else (please check). For CNNs it is simple to just modify the last layer to support a different number of classes.
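
For example (a small sketch; num_classes is assumed, and the exact attribute name is worth double-checking per architecture):

import torch.nn as nn
from torchvision.models import googlenet, mobilenet_v2

num_classes = 10  # assumed for illustration

# GoogLeNet / ResNet expose the classification head as `fc`:
model = googlenet(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, num_classes)

# MobileNetV2 exposes it as `classifier` (Dropout + Linear), so replace its last element:
model = mobilenet_v2(pretrained=True)
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, num_classes)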

@Programmer-RD-AI
Contributor

Programmer-RD-AI commented Oct 26, 2021

Hi,
I usually add, in __init__:

self.output = Linear(1000, len(classes))

and in forward:

preds = self.tl_model(X)
preds = self.output(preds)

I don't know if this is the best way, but when I am testing TL models this is what I use.

@oke-aditya
Contributor

Hi!
I think you are adding an additional Linear layer on top of the existing fully connected layer. This is not the best way to do transfer learning; it works fine in practice, but you end up with an extra fully connected layer, which means roughly 1000 * num_classes additional parameters (your previous fc layer is Linear(x, 1000) and you are stacking Linear(1000, num_classes) on top of it).

The better way is to edit the existing layer, replacing it with Linear(in_features, num_classes). This does not increase the number of parameters drastically.
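
A quick sketch of the head-size difference, assuming a ResNet-50-style backbone with 2048 features and num_classes = 10 for illustration:

import torch.nn as nn

num_classes = 10  # assumed for illustration; 2048 is the ResNet-50 fc input size

# Stacking keeps the original 1000-way fc and adds another layer on top of it:
stacked = nn.Sequential(nn.Linear(2048, 1000), nn.Linear(1000, num_classes))

# Replacing swaps the original fc for a single small layer:
replaced = nn.Linear(2048, num_classes)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(stacked), count(replaced))  # ~2.06M vs ~20.5K parameters in the head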

Thanks for asking

@Programmer-RD-AI
Contributor

Programmer-RD-AI commented Oct 26, 2021

Hi,
Sorry for asking so many questions, but I am confused, that's why.

For freezing layers:

model = googlenet()
model.some_fc.requires_grad_(False)  # freezes that layer's parameters

and for fine-tuning:

model = googlenet()
# from: model.some_fc = Linear(512, 985)
# to:   model.some_fc = Sequential(Linear(512, 1024), Linear(1024, 985))

So what are the features I need to create?

I am sorry for asking so many questions.

Thank you.

@oke-aditya
Contributor

oke-aditya commented Oct 26, 2021

OK, so let me elaborate a bit more.

Let me explain the transfer learning scenarios. These examples are written for CNNs, but they more or less generalize to other models too. Note that when we are doing transfer learning, it means we are using the pre-trained weights, hence pretrained=True in all cases.

The first two scenarios are well described in the Transfer Learning Tutorial (a great one by @chsasank, one of the best in this field!).

  1. Simply re-training the model with pretrained=True.

This is the simplest approach; we aren't freezing the backbone. Refer here in the tutorial.

import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(pretrained=True)
in_features = model.fc.in_features
model.fc = nn.Linear(in_features, num_classes)

Simply train the model. We train each and every parameter, with the only difference being that the head outputs num_classes instead of 1000.
It is a naive approach; it works fine and can give you decent results. It will take a lot of time though (you are training a whole model anyway, and have a lot of parameters to train).

  2. Training only the head (using the backbone as a fixed feature extractor).

Refer here in the tutorial

This is what you tried above. Here we are interested in only training the classification head of the network. We freeze the backbone of the model.

import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(pretrained=True)

# Freeze all the parameters.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze the head by replacing it; the new layer's parameters
# have requires_grad=True by default and output num_classes.
in_features = model.fc.in_features
model.fc = nn.Linear(in_features, num_classes)

# You may prefer to add an extra fully connected layer, but that isn't needed in most cases.
# Left to you; many don't prefer it, as it can cause a large increase in parameters.
# It works well if you have BERT / millions of params in the backbone, where adding a few
# hundred params in the head of the model won't make a big difference.
# Basically: number of params with pre-trained weights >>> number of fully connected params.

# Adding an extra fc layer to the head:
in_features = model.fc.in_features
model.fc = nn.Sequential(
    nn.Linear(in_features, hidden_params),
    # Many prefer dropout in between to avoid over-fitting:
    # nn.Dropout(0.2),
    nn.Linear(hidden_params, num_classes),
)
  3. Unfreezing layers / blocks one by one.

This is where fine-tuning comes into play; we really want to make the most of every block of the network.

You can first freeze the backbone and train the head as in Strategy 2.

This can be trained for a few epochs with a decent learning rate of 1e-3.

Here is the second training routine.

Now you want to unfreeze specific blocks, say the last few conv layers (or the last residual block) in ResNet.
You would unfreeze only them.
Continue training with a slightly lower lr of 1e-4, and for more epochs.

You may unfreeze more blocks, or probably stop here; it is very much left to you.
Note that after unfreezing a block you keep training progressively (you don't re-freeze the Linear layers when you unfreeze the conv blocks).
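
A minimal sketch of this progressive schedule, assuming a torchvision ResNet-50 where layer4 is the last residual block (the optimizer choice, num_classes, and epoch counts are illustrative):

import torch
import torch.nn as nn
from torchvision.models import resnet50

num_classes = 10  # assumed for illustration

model = resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Stage 1: freeze everything except the new head, train with lr=1e-3 for a few epochs.
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)
# ... train for a few epochs ...

# Stage 2: also unfreeze the last residual block and continue with a lower lr for more epochs.
for param in model.layer4.parameters():
    param.requires_grad = True
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
# ... keep training; you may unfreeze layer3, layer2, ... later in the same way ...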

I don't know if there is any other way of doing transfer learning (I haven't seen any other approach); these work well in practice.

P.S.

First of all, my appreciation to you! You are a very young developer (I guess 14), and I'm super excited that you know so much at such a tender age! At your age I was probably more interested in knowing how to install an anti-virus and knew nothing about coding (forget a GitHub account, I didn't even know the word GitHub).
Have a great and bright future ahead! I wish you success.

@Programmer-RD-AI
Contributor

Programmer-RD-AI commented Oct 27, 2021

OK, thank you, I can understand the issue now.
I will start working on it.

Thank you very much @oke-aditya

@Programmer-RD-AI
Contributor

hi,

Again, I am really sorry for asking so many questions, but I am not understanding this correctly.

So what I need to implement is:

  1. Simply re-training the model with pretrained=True.
  2. Training only the head (backbone as a fixed feature extractor).
  3. Unfreezing layers/blocks one by one.

I need to implement the above features in Lightning Bolts in an easy-to-use way.

Is my understanding correct?
I am so sorry for asking so many questions.

If not, what are the specific things I need to work on or implement?

With best regards,
Ranuga

@Borda Borda modified the milestones: v0.4, v0.5 Nov 26, 2021
@Borda Borda pinned this issue Mar 9, 2022