First Step accuracy? #1

qidiso · 2018-05-04T22:59:05Z

I don't have a P40 GPU, but I have two 1080 GPUs. You can set the batch size to 512, but I can only set it to 128.
In your README you train for 40,000 iterations in the first step, so how many iterations should I train for in my first step? And what accuracy do you get on LFW or AgeDB-30 after the first step?

moli232777144 · 2018-05-05T00:05:54Z

lr_steps can probably be set to 80000. Acc: 95.6; I just uploaded the data. I still need three days to verify the project.
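
One possible way to arrive at a figure like 80000, sketched under assumptions: if the reference schedule is 40,000 steps at a total batch size of 512, and `--per-batch-size` is applied per GPU (so two GPUs at 128 give a total of 256), then keeping the number of training samples seen roughly constant scales the step count by 512/256. The helper name `scaled_steps` is hypothetical, and this rule of thumb is not stated anywhere in the thread.

```python
def scaled_steps(ref_steps: int, ref_batch: int, per_gpu_batch: int, num_gpus: int) -> int:
    """Scale a step count so the total number of samples seen stays the same.

    Assumption: the effective batch size is per_gpu_batch * num_gpus.
    """
    total_batch = per_gpu_batch * num_gpus
    return round(ref_steps * ref_batch / total_batch)


# Reference: 40,000 steps at batch size 512 (from the README as quoted above).
print(scaled_steps(40000, 512, per_gpu_batch=128, num_gpus=2))  # 80000
print(scaled_steps(40000, 512, per_gpu_batch=128, num_gpus=1))  # 160000
```

This only equalizes the amount of data seen; a smaller batch may also call for a different learning rate, which the sketch does not address.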

qidiso · 2018-05-05T01:06:43Z

Thanks @moli232777144, I'll try it.

moli232777144 · 2018-05-07T01:54:52Z

Unfortunately, the results were bad. How did your experiment go?

qidiso · 2018-05-07T02:04:14Z

samples/sec acc=0.366797
INFO:root:Epoch[33] Batch [11280] Speed: 861.70 samples/sec acc=0.361328
INFO:root:Epoch[33] Batch [11300] Speed: 864.48 samples/sec acc=0.368750
INFO:root:Epoch[33] Batch [11320] Speed: 871.56 samples/sec acc=0.372070
INFO:root:Epoch[33] Batch [11340] Speed: 866.92 samples/sec acc=0.361914
INFO:root:Epoch[33] Batch [11360] Speed: 877.25 samples/sec acc=0.362500
INFO:root:Epoch[33] Batch [11380] Speed: 865.76 samples/sec acc=0.372266
INFO:root:Epoch[33] Batch [11400] Speed: 878.27 samples/sec acc=0.365625
INFO:root:Epoch[33] Batch [11420] Speed: 867.70 samples/sec acc=0.353906
INFO:root:Epoch[33] Batch [11440] Speed: 870.65 samples/sec acc=0.367578
INFO:root:Epoch[33] Batch [11460] Speed: 866.44 samples/sec acc=0.369531
INFO:root:Epoch[33] Batch [11480] Speed: 875.99 samples/sec acc=0.352344
INFO:root:Epoch[33] Batch [11500] Speed: 873.52 samples/sec acc=0.371875
INFO:root:Epoch[33] Batch [11520] Speed: 867.85 samples/sec acc=0.352539
INFO:root:Epoch[33] Batch [11540] Speed: 865.43 samples/sec acc=0.351758
lr-batch-epoch: 1e-05 11553 33
testing verification..
(12000, 128)
infer time 11.886539
[lfw][502000]XNorm: 11.133125
[lfw][502000]Accuracy-Flip: 0.98900+-0.00484
testing verification..
(14000, 128)
infer time 14.814039
[cfp_fp][502000]XNorm: 9.110006
[cfp_fp][502000]Accuracy-Flip: 0.84514+-0.01910
testing verification..
(12000, 128)
infer time 12.223364
[agedb_30][502000]XNorm: 10.877486
[agedb_30][502000]Accuracy-Flip: 0.93533+-0.01299
saving 251
INFO:root

qidiso · 2018-05-07T02:05:12Z

[lfw][530000]Accuracy-Flip: 0.99000+-0.00459
testing verification..
(14000, 128)
infer time 14.187055
[cfp_fp][530000]XNorm: 9.111494
[cfp_fp][530000]Accuracy-Flip: 0.84843+-0.01903
testing verification..
(12000, 128)
infer time 12.007815
[agedb_30][530000]XNorm: 10.877945
[agedb_30][530000]Accuracy-Flip: 0.93417+-0.01218
saving 265
INFO:root:Saved checkpoint to "../models/MobileFaceNet/model-y1-arcface-0265.params"
[530000]Accuracy-Highest: 0.93683

qidiso · 2018-05-07T02:10:36Z

I think my result is even worse. This is the command I used:
CUDA_VISIBLE_DEVICES='0,1' python -u train_softmax.py --network y1 --ckpt 2 --loss-type 4 --lr-steps 160000,240000,280000,320000 --emb-size 128 --per-batch-size 128 --data-dir ../data/faces_ms1m_112x112 --pretrained ../models/MobileFaceNet/model-y1-softmax,20 --prefix ../models/MobileFaceNet/model-y1-arcface

muzi2045 · 2018-05-07T13:05:45Z

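There is a problem with the training parameters for the first step. The accuracy I get when training on my own machine does not reach the expected level.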

qidiso · 2018-05-07T13:44:58Z

@moli232777144 Me too! I'm now training again using just this command:
CUDA_VISIBLE_DEVICES='0,1' python -u train_softmax.py --network y1 --loss-type 4 --margin-m 0.5 --data-dir ../data/faces_ms1m_112x112 --pretrained ../models/MobileFaceNet/model-y1-softmax,28 --prefix ../models/MobileFaceNet/model-y1-arcface --emb-size 128 --per-batch-size 150
Maybe auto dropout is better.

moli232777144 · 2018-05-07T14:38:15Z

Uploaded! Weight decay should be set to 0.00004.

qidiso · 2018-05-07T15:04:46Z

Can you share your first-step softmax model with me?

qidiso · 2018-05-08T06:56:51Z

Any progress?

moli232777144 · 2018-05-08T07:33:25Z

[2018-05-08 15:30:36] INFO:root:Epoch[9] Batch [1060] Speed: 588.76 samples/sec acc=0.279980
[2018-05-08 15:30:54] INFO:root:Epoch[9] Batch [1080] Speed: 588.33 samples/sec acc=0.281934
[2018-05-08 15:31:11] INFO:root:Epoch[9] Batch [1100] Speed: 588.85 samples/sec acc=0.276074
[2018-05-08 15:31:28] lr-batch-epoch: 0.1 1120 9
[2018-05-08 15:31:28] testing verification..
[2018-05-08 15:31:28] INFO:root:Epoch[9] Batch [1120] Speed: 590.26 samples/sec acc=0.280859
[2018-05-08 15:31:41] (12000, 128)
[2018-05-08 15:31:41] infer time 12.936783
[2018-05-08 15:31:45] [lfw][68000]XNorm: 11.173922
[2018-05-08 15:31:45] [lfw][68000]Accuracy-Flip: 0.99283+-0.00472
[2018-05-08 15:31:45] testing verification..
[2018-05-08 15:32:01] (14000, 128)
[2018-05-08 15:32:01] infer time 15.572022
[2018-05-08 15:32:05] [cfp_fp][68000]XNorm: 9.046101
[2018-05-08 15:32:05] [cfp_fp][68000]Accuracy-Flip: 0.86486+-0.01647
[2018-05-08 15:32:05] testing verification..
[2018-05-08 15:32:18] (12000, 128)
[2018-05-08 15:32:18] infer time 12.247801
[2018-05-08 15:32:22] [agedb_30][68000]XNorm: 11.032911
[2018-05-08 15:32:22] [agedb_30][68000]Accuracy-Flip: 0.94050+-0.01049
[2018-05-08 15:32:22] saving 34
[2018-05-08 15:32:22] [68000]Accuracy-Highest: 0.94167
[2018-05-08 15:32:22] INFO:root:Saved checkpoint to "/data/output/model-y1-arcface-0034.params"

qidiso · 2018-05-08T07:41:41Z

I find I can get 99.37% on LFW at 40,000 steps, but when I train to 70,000 steps I only get 99.1% on LFW. Maybe we should set lr = 0.01 at 40,000 steps.

moli232777144 · 2018-05-08T07:51:54Z

You can try it. I still need a day to run this experiment.

qidiso · 2018-05-08T08:02:30Z

@moli232777144 I'll try it. If I get good results, I'll post the log.

qidiso · 2018-05-08T22:14:04Z

The result is not good. I get 99.45 on LFW and 94.50 on AgeDB, and it won't go any higher.

moli232777144 · 2018-05-09T08:08:05Z

Updated. Maybe we should increase the number of iterations until the accuracy is stable.

muzi2045 · 2018-05-10T01:08:17Z

In the second step, I got 99.2 on LFW and 95.1 on AgeDB; maybe I need to continue training.
But the LFW accuracy looks stuck at 99.2.

qidiso · 2018-05-10T01:12:33Z

In the first stage, I get this:
lr-batch-epoch: 0.001 3087 16
testing verification..
(12000, 128)
infer time 12.316967
[lfw][206000]XNorm: 11.074202
[lfw][206000]Accuracy-Flip: 0.99383+-0.00289
testing verification..
(14000, 128)
infer time 14.801521
[cfp_fp][206000]XNorm: 9.228846
[cfp_fp][206000]Accuracy-Flip: 0.88029+-0.01851
testing verification..
(12000, 128)
infer time 12.482475
[agedb_30][206000]XNorm: 11.014230
[agedb_30][206000]Accuracy-Flip: 0.95417+-0.00892
saving 103
INFO:root:Saved checkpoint to "../models/MobileFaceNet/model-y1-arcface-0103.params"
[206000]Accuracy-Highest: 0.95417

or this:

lr-batch-epoch: 1e-05 5723 18
testing verification..
(12000, 128)
infer time 12.226353
[lfw][234000]XNorm: 11.085642
[lfw][234000]Accuracy-Flip: 0.99450+-0.00259
testing verification..
(14000, 128)
infer time 14.634496
[cfp_fp][234000]XNorm: 9.239763
[cfp_fp][234000]Accuracy-Flip: 0.87871+-0.01877
testing verification..
(12000, 128)
infer time 12.104834
[agedb_30][234000]XNorm: 11.024038
[agedb_30][234000]Accuracy-Flip: 0.95100+-0.00723
saving 117
INFO:root:Saved checkpoint to "../models/MobileFaceNet/model-y1-arcface-0117.params"
[234000]Accuracy-Highest: 0.95417

or this:
lr-batch-epoch: 1e-05 1677 21
testing verification..
(12000, 128)
infer time 11.989666
[lfw][268000]XNorm: 11.078677
[lfw][268000]Accuracy-Flip: 0.99417+-0.00271
testing verification..
(14000, 128)
infer time 13.049772
[cfp_fp][268000]XNorm: 9.235129
[cfp_fp][268000]Accuracy-Flip: 0.87629+-0.01867
testing verification..
(12000, 128)
infer time 12.188487
[agedb_30][268000]XNorm: 11.015260
[agedb_30][268000]Accuracy-Flip: 0.95267+-0.00989
saving 134
INFO:root:Saved checkpoint to "../models/MobileFaceNet/model-y1-arcface-0134.params"
[268000]Accuracy-Highest: 0.95417
I'm now training the last stage, but I don't know which checkpoint to choose to train from. For now I've chosen model-y1-arcface-0117.params to try.
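
One way to make the choice less arbitrary is to pull the `Accuracy-Flip` numbers out of the log and rank the steps by whichever benchmark matters most. The sketch below is an illustration only: the regular expression is written against the log lines quoted in this thread, and the function names `parse_verification` and `best_step` are hypothetical, not part of the training script.

```python
import re
from collections import defaultdict

# Matches lines such as: [lfw][206000]Accuracy-Flip: 0.99383+-0.00289
LINE_RE = re.compile(r"\[(\w+)\]\[(\d+)\]Accuracy-Flip: ([0-9.]+)\+-([0-9.]+)")


def parse_verification(log_text):
    """Return {step: {dataset: accuracy}} for every Accuracy-Flip line."""
    results = defaultdict(dict)
    for match in LINE_RE.finditer(log_text):
        dataset, step, acc = match.group(1), int(match.group(2)), float(match.group(3))
        results[step][dataset] = acc
    return dict(results)


def best_step(results, key="mean"):
    """Pick the step with the highest accuracy on `key`, or the highest mean."""
    def score(step):
        accs = results[step]
        return sum(accs.values()) / len(accs) if key == "mean" else accs[key]
    return max(results, key=score)


# Values copied from the three log excerpts above.
log = """
[lfw][206000]Accuracy-Flip: 0.99383+-0.00289
[cfp_fp][206000]Accuracy-Flip: 0.88029+-0.01851
[agedb_30][206000]Accuracy-Flip: 0.95417+-0.00892
[lfw][234000]Accuracy-Flip: 0.99450+-0.00259
[cfp_fp][234000]Accuracy-Flip: 0.87871+-0.01877
[agedb_30][234000]Accuracy-Flip: 0.95100+-0.00723
[lfw][268000]Accuracy-Flip: 0.99417+-0.00271
[cfp_fp][268000]Accuracy-Flip: 0.87629+-0.01867
[agedb_30][268000]Accuracy-Flip: 0.95267+-0.00989
"""

results = parse_verification(log)
print(best_step(results, "agedb_30"))  # 206000
print(best_step(results, "lfw"))       # 234000
print(best_step(results, "mean"))      # 206000
```

The differences between these three checkpoints are smaller than the reported standard deviations, so the ranking is a tie-breaker rather than strong evidence that one checkpoint is better.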

qidiso · 2018-05-10T22:49:07Z

@moli232777144 Have you gotten good results?

moli232777144 · 2018-05-11T01:26:21Z

With lr 0.1 for 40,000 steps, then lr 0.01 for another 20,000 steps, I get AgeDB 95.59 and LFW 99.51. I will keep extending the steps.
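
The schedule described here (lr 0.1 for the first 40,000 steps, then lr 0.01 for a further 20,000) is a step decay. A minimal sketch, assuming the rate is multiplied by 0.1 at each boundary; the function name `lr_at` and its parameters are illustrative and not taken from the training script.

```python
def lr_at(step, base_lr=0.1, lr_steps=(40000,), factor=0.1):
    """Learning rate at a given step under a step-decay schedule.

    The rate starts at base_lr and is multiplied by `factor` once for
    every boundary in lr_steps that has been reached.
    """
    drops = sum(1 for boundary in lr_steps if step >= boundary)
    return base_lr * factor ** drops


for step in (0, 39999, 40000, 59999):
    print(step, lr_at(step))
```

The same helper covers longer schedules, for example `lr_at(step, lr_steps=(160000, 240000, 280000, 320000))` for the boundaries passed to `--lr-steps` earlier in the thread.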

muzi2045 · 2018-05-11T06:29:04Z

Thanks, I'll try it next time.

qidiso · 2018-05-13T06:24:29Z

I trained again and again, and I now get a better result:
lr-batch-epoch: 0.001 7999 0
testing verification..
(12000, 128)
infer time 12.323731
[lfw][8000]XNorm: 11.118196
[lfw][8000]Accuracy-Flip: 0.99583+-0.00375
testing verification..
(14000, 128)
infer time 14.580451
[cfp_fp][8000]XNorm: 9.335661
[cfp_fp][8000]Accuracy-Flip: 0.88786+-0.01615
testing verification..
(12000, 128)
infer time 12.362448
[agedb_30][8000]XNorm: 11.044563
[agedb_30][8000]Accuracy-Flip: 0.96083+-0.00827
saving 4
INFO:root:Saved checkpoint to "../models/MobileFaceNet/model-y1-arcface-0004.params"
[8000]Accuracy-Highest: 0.96133

moli232777144 · 2018-05-14T01:44:29Z

Good job! Did you modify the parameters? What was the fine-tuning process?

muzi2045 · 2018-05-14T02:56:39Z

Did you train on a single card?
I trained on a single Titan X, but in the end I only got lfw: 99.47%, agedb_30: 99.53% at step 2.
Maybe I need to change the batch size from 256 to 512; unfortunately, there is not enough CUDA memory.

qidiso · 2018-05-14T10:22:08Z

@moli232777144 Fine-tuning. But I first trained step 3 (s=128), and then trained step 2 (s=64).

xxllp · 2018-05-22T06:44:13Z

How big are everyone's GPUs? Mine can't even fit a batch size of 128.

yc-huang · 2018-05-30T09:43:23Z

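@xxllp 8 GB of GPU memory should be able to support a batch size of 180. Also, putting the data on an SSD significantly improves training speed; a single 1070 can reach 450 samples/s.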
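
For planning purposes, a throughput figure like this can be turned into a rough wall-clock estimate. The sketch below is simple arithmetic; the function name `estimate_hours` is hypothetical, the 40,000-step count is only an illustrative pairing with the batch size and speed quoted here, and the estimate ignores the time spent on verification and checkpointing.

```python
def estimate_hours(steps, batch_size, samples_per_sec):
    """Rough training time in hours: total samples divided by throughput."""
    total_samples = steps * batch_size
    return total_samples / samples_per_sec / 3600


# Illustrative only: 40,000 steps at batch size 180 and 450 samples/s.
print(round(estimate_hours(40000, 180, 450), 2))  # 4.44
```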

zhangxiaopang88 · 2019-01-21T08:35:02Z

Hello, what kind of accuracy is the acc printed during training? @moli232777144

moli232777144 · 2019-01-21T08:38:37Z

It is the classification accuracy on the training data itself. @zhangxiaopang88
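
To make the distinction concrete: the `acc=` value in the training log is a per-batch classification accuracy over identity labels, which is a different metric from the `Accuracy-Flip` verification numbers reported for lfw, cfp_fp and agedb_30. A minimal sketch of top-1 batch accuracy, assuming each row of `scores` holds one sample's class scores; the function name `batch_accuracy` is illustrative and this is not the training script's own metric code.

```python
def batch_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        correct += predicted == label
    return correct / len(labels)


# Four samples, two classes; the third sample is misclassified.
scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels = [1, 0, 0, 0]
print(batch_accuracy(scores, labels))  # 0.75
```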

zhangxiaopang88 · 2019-01-21T08:40:40Z

Oh, thank you. I trained on the asia-celebrity dataset, and the accuracy stays at around 0.44. Do you have any training tips or advice you could share? @moli232777144

shiyuanyin mentioned this issue Dec 14, 2018

Why is the INFO acc always equal to 0.000000 when I train? #10

Open
