
input image=112*112 #4

Open
eeric opened this issue Aug 18, 2021 · 2 comments

Comments

@eeric

eeric commented Aug 18, 2021

@volgachen
Could you provide the model structure for an input size of 112×112?
Thanks!

@volgachen
Collaborator

There are several ways to modify the original model structure. For example,

  • Change the argument patch_size from 4 to 2 to modify the patch embedding module at the very beginning:

from functools import partial
import torch.nn as nn

model = PyramidVisionTransformer(
    patch_size=2, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], qkv_bias=True,
    norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1], **kwargs)
  • Remove one stage (I would prefer the second stage) together with its corresponding DePatch module.
  • Simply run the original model without any modification (setting img_size=112).
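To see why lowering patch_size to 2 is a natural fix, it helps to work out the per-stage feature-map sizes. The sketch below assumes the usual PVT-style configuration (the first patch embedding downsamples by patch_size and each subsequent stage downsamples by 2); these factors are an assumption based on the standard architecture, not taken from this repository's code.

```python
# Sketch of per-stage feature-map side lengths in a PVT-style model.
# Assumption: the first patch embedding downsamples by `patch_size`,
# and each of the remaining stages downsamples by 2.

def stage_resolutions(img_size, patch_size, num_stages=4):
    """Return the spatial side length of the feature map after each stage."""
    sizes = []
    side = img_size // patch_size   # first patch embedding
    sizes.append(side)
    for _ in range(num_stages - 1):
        side //= 2                  # each later stage halves the resolution
        sizes.append(side)
    return sizes

# Original model: 224x224 input with patch_size=4
print(stage_resolutions(224, 4))   # [56, 28, 14, 7]
# 112x112 input with unchanged patch_size=4: last stage shrinks to 3x3
print(stage_resolutions(112, 4))   # [28, 14, 7, 3]
# 112x112 input with patch_size=2 restores the original stage sizes
print(stage_resolutions(112, 2))   # [56, 28, 14, 7]
```

With patch_size=2 the four stages see the same resolutions as the original 224×224 model, which is why the first option changes only the initial patch embedding and leaves the rest of the network untouched.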

However, I cannot tell which would be best without experiments. Feel free to try them; I would be glad to see your results.

@eeric
Author

eeric commented Aug 19, 2021

Thanks! I tried all three:

1. PyramidVisionTransformer with patch_size=2: failed.
2. Removing a stage: complicated, and also failed.
3. Setting img_size=112: failed.
