
input image=112*112 #4

Open
eeric opened this issue Aug 18, 2021 · 2 comments

Comments

@eeric

eeric commented Aug 18, 2021

@volgachen
Could you provide the model structure for an input size of 112×112?
Thanks!

@volgachen
Collaborator

There are several ways to modify the original model structure. For example,

  • Change the argument patch_size from 4 to 2 to modify the patch embedding module at the very beginning:

from functools import partial
import torch.nn as nn

model = PyramidVisionTransformer(
    patch_size=2, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], qkv_bias=True,
    norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1], **kwargs)
  • Remove one stage (I would prefer the second stage) together with its corresponding DePatch module.
  • Simply run the original model without any modification (setting img_size=112).
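To see why lowering patch_size to 2 is a natural fix, it helps to work out the per-stage feature-map sizes. The sketch below assumes the usual PVT-style configuration (the first patch embedding downsamples by patch_size and each subsequent stage downsamples by 2); these factors are an assumption based on the standard architecture, not taken from this repository's code.

```python
# Sketch of per-stage feature-map side lengths in a PVT-style model.
# Assumption: the first patch embedding downsamples by `patch_size`,
# and each of the remaining stages downsamples by 2.

def stage_resolutions(img_size, patch_size, num_stages=4):
    """Return the spatial side length of the feature map after each stage."""
    sizes = []
    side = img_size // patch_size   # first patch embedding
    sizes.append(side)
    for _ in range(num_stages - 1):
        side //= 2                  # each later stage halves the resolution
        sizes.append(side)
    return sizes

# Original model: 224x224 input with patch_size=4
print(stage_resolutions(224, 4))   # [56, 28, 14, 7]
# 112x112 input with unchanged patch_size=4: last stage shrinks to 3x3
print(stage_resolutions(112, 4))   # [28, 14, 7, 3]
# 112x112 input with patch_size=2 restores the original stage sizes
print(stage_resolutions(112, 2))   # [56, 28, 14, 7]
```

With patch_size=2 the four stages see the same resolutions as the original 224×224 model, which is why the first option changes only the initial patch embedding and leaves the rest of the network untouched.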

However, I cannot tell which would be best without experiments. Feel free to try them; I would be glad to see your results.

@eeric
Author

eeric commented Aug 19, 2021

Thanks! I tried all three:

1. PyramidVisionTransformer with patch_size=2: failed.
2. Removing a stage: complicated, and also failed.
3. Setting img_size=112: failed.
