Doubt about the training prototxt in this repo. Can anyone repeat the training? #130

Open
KleinYuan opened this issue May 26, 2017 · 3 comments


KleinYuan commented May 26, 2017

@bittnt
I tried to repeat the training process on PASCAL VOC 2012 (20 classes plus background).
Here is my documentation: https://github.com/KleinYuan/train-crfasrnn

And here's the prototxt I used for training, which is exactly the same as what you posted here.

I trained with the FCN-8s model and this Caffe version for around 200k iterations. The result was pretty bad (much worse than plain FCN-8s), as attached.

Then I thought I might have missed something, so I used this tool to extract information about the layers of your pre-trained caffemodel. The result rather surprised me: I could not find the MultiStageMeanfield type or the multi_stage_meanfield_param on the 57th layer. Were you training with a different architecture?

I am quite confused now. It would be great if you could give me some hints and potentially share the actual training prototxt.

Note:
At the beginning I realized that I was using a newer Caffe than this repo and therefore needed to add crop_param here, here, and here. After adding those, the demo script produced the expected images with your pre-trained model, so I think the training pipeline and test scripts are OK.

Attached below is the 57th layer from the Caffe -> JSON dump of your pre-trained model:

{
      "blobs_lr": [
        10000,
        10000,
        1000
      ],
      "blobs": [
        {
          "channels": 1,
          "width": 21,
          "num": 1,
          "data": [
            2.684922456741333,
            -0.006915332283824682,
            0.006461392156779766,
            0.014774330891668797,
            0.017609767615795135,
            "(436 elements more)"
          ],
          "height": 21
        },
        {
          "channels": 1,
          "width": 21,
          "num": 1,
          "data": [
            4.664008140563965,
            -0.007745872251689434,
            0.011939325369894505,
            0.013802499510347843,
            0.017575804144144058,
            "(436 elements more)"
          ],
          "height": 21
        },
        {
          "channels": 1,
          "width": 21,
          "num": 1,
          "data": [
            -0.7012288570404053,
            0.008244414813816547,
            -0.008532184176146984,
            -0.013231083750724792,
            -0.01634358800947666,
            "(436 elements more)"
          ],
          "height": 21
        }
      ],
      "top": [
        "upscore"
      ],
      "name": "inference1",
      "bottom": [
        "unary",
        "Q0",
        "data_data_rgb_0_split_2"
      ]
    }

Whereas, here is my 57th layer:

{
      "blobs": [
        {
          "shape": {
            "dim": [
              1,
              1,
              21,
              21
            ]
          },
          "data": [
            1.8576951026916504,
            0.08358041942119598,
            0.0802476778626442,
            0.06616068631410599,
            0.07730358093976974,
            "(436 elements more)"
          ]
        },
        {
          "shape": {
            "dim": [
              1,
              1,
              21,
              21
            ]
          },
          "data": [
            4.411709785461426,
            0.05498894676566124,
            0.10218091309070587,
            0.0351337306201458,
            0.04846448451280594,
            "(436 elements more)"
          ]
        },
        {
          "shape": {
            "dim": [
              1,
              1,
              21,
              21
            ]
          },
          "data": [
            -0.1661415994167328,
            -0.5631195306777954,
            -0.6274277567863464,
            -0.5293889045715332,
            -0.5286368131637573,
            "(436 elements more)"
          ]
        }
      ],
      "bottom": [
        "unary",
        "Q0",
        "data_data_rgb_0_split_2"
      ],
      "top": [
        "pred"
      ],
      "multi_stage_meanfield_param": {
        "num_iterations": 5,
        "compatibility_mode": 0,
        "theta_alpha": 59,
        "bilateral_filter_weights_str": "5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5",
        "threshold": 2,
        "theta_gamma": 3,
        "theta_beta": 3,
        "spatial_filter_weights_str": "3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3"
      },
      "param": [
        {
          "lr_mult": 10000
        },
        {
          "lr_mult": 10000
        },
        {
          "lr_mult": 1000
        }
      ],
      "phase": 0,
      "type": "MultiStageMeanfield",
      "name": "inference1"
    }
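
For reference, a dump like the two 57th-layer blocks above can be produced with a short protobuf script along these lines. This is a minimal sketch, not the exact tool referenced in the thread; it assumes the caffe_pb2 module generated from caffe.proto is importable, and the model filename and the layer index 57 are taken from this discussion and may differ for other snapshots.

```python
# Minimal sketch: dump one layer's name, type, and blob contents from a .caffemodel.
from caffe.proto import caffe_pb2

net = caffe_pb2.NetParameter()
with open('TVG_CRFRNN_COCO_VOC.caffemodel', 'rb') as f:  # path is an assumption
    net.ParseFromString(f.read())

layers = net.layer if len(net.layer) else net.layers      # new-style vs. legacy field
layer = layers[57]                                        # index taken from this thread
print('{} {}'.format(layer.name, layer.type))
for blob in layer.blobs:
    if blob.HasField('shape'):
        dims = list(blob.shape.dim)                        # new-style BlobShape
    else:
        dims = [blob.num, blob.channels, blob.height, blob.width]  # legacy fields
    print('{} {}'.format(dims, list(blob.data)[:5]))
```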
KleinYuan (Author) commented:

Some results:

  1. FCN-8s
    output_fcn_8s

  2. After 182k iterations
    output_182000

  3. After 212k iterations
    output_212000

  4. CRF-as-RNN pre-trained model
    output_official

bittnt (Collaborator) commented May 27, 2017

Hi, thanks for your contribution! Regarding the results:

  1. It looks like Caffe2 does not have the MultiStageMeanfield layer. You might be getting this layer through Python somehow, but given the results, I am not sure the weights were transferred successfully.

  2. Also, your learning rate might be too high; it looks like your loss increases during training. It would be good practice to check IoU / per-pixel accuracy every 2k-4k iterations (see the first sketch after this list).

  3. You might also consider first extracting the FCN-8s component from the CRF-as-RNN pretrained model and training it on its own to see whether the loss decreases at all (see the second sketch after this list).

  4. Our pretrained model was obtained by first fine-tuning the plain FCN-32s network (without the CRF-RNN part) on COCO data, then building an FCN-8s network with the learnt weights, and finally training the CRF-RNN network end-to-end using VOC 2012 training data only.

  5. The CRF-as-RNN CPU/GPU MultiStageMeanfield layer has been merged into a newer Caffe at https://github.com/torrvision/caffe/tree/crfrnn
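
On point 2, here is a minimal sketch (not code from this repo) of a per-pixel accuracy and mean-IoU check over predicted and ground-truth label maps, assuming NumPy integer label arrays with 255 as the ignore label:

```python
import numpy as np

def eval_segmentation(pred, gt, num_classes=21, ignore_label=255):
    """Per-pixel accuracy and mean IoU from integer label maps of equal shape."""
    mask = gt != ignore_label
    # Confusion matrix: rows = ground truth, columns = prediction.
    hist = np.bincount(num_classes * gt[mask].astype(int) + pred[mask].astype(int),
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    pixel_acc = np.diag(hist).sum() / float(hist.sum())
    iou = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist) + 1e-10)
    return pixel_acc, np.nanmean(iou)
```

And on point 3, one way to extract the FCN-8s part is to load the pretrained weights against a prototxt that stops before the MultiStageMeanfield layer; Caffe matches layers by name and ignores the rest. The file names below are assumptions:

```python
import caffe

# Prototxt containing only the FCN-8s layers (no MultiStageMeanfield layer).
net = caffe.Net('fcn8s_only.prototxt', 'TVG_CRFRNN_COCO_VOC.caffemodel', caffe.TEST)
net.save('fcn8s_extracted.caffemodel')
```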

damiVongola commented:

@bittnt Hey, I was wondering if you could let us know how many iterations it took you to get your results. I have read the paper a couple of times and I cannot see any mention of the number of iterations you used (if it is mentioned somewhere and I missed it, I'm very sorry). As a side note, I am currently training CRF-RNN on 6-channel satellite images, and it is quite the hassle. Is it recommended to first train an FCN-8s on this 6-channel data before plugging in the mean-field iteration layer? I copied the weights for the first 3 channels from the FCN-8s caffemodel, then randomized the weights for the last three channels. This new model is what I am using for training/fine-tuning. Are there any concerns, tips, or issues you can point out? Thanks!!
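
For illustration, here is a minimal pycaffe net-surgery sketch of the channel expansion described above (reuse the RGB filters, randomize the extra input channels). The prototxt/caffemodel file names and the conv1_1 layer name are assumptions, not taken from this repo:

```python
import numpy as np
import caffe

# Source: a standard 3-channel FCN-8s; target: the same architecture with a
# 6-channel data layer. All file names here are assumptions.
src = caffe.Net('fcn8s_3ch.prototxt', 'fcn-8s.caffemodel', caffe.TEST)
dst = caffe.Net('fcn8s_6ch.prototxt', caffe.TEST)

# Copy every parameter blob whose layer name and shape already match.
for name, blobs in src.params.items():
    if name not in dst.params:
        continue
    for i, blob in enumerate(blobs):
        if blob.data.shape == dst.params[name][i].data.shape:
            dst.params[name][i].data[...] = blob.data

# First conv layer: reuse the RGB filters, randomly initialize the extra channels.
w3 = src.params['conv1_1'][0].data              # e.g. (64, 3, 3, 3)
w6 = dst.params['conv1_1'][0].data              # e.g. (64, 6, 3, 3)
w6[:, :3] = w3
w6[:, 3:] = np.random.normal(0.0, w3.std(), w6[:, 3:].shape)
dst.params['conv1_1'][1].data[...] = src.params['conv1_1'][1].data  # bias

dst.save('fcn8s_6ch_init.caffemodel')
```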
