@Author : lance
@Email : [email protected]
@Time : June 10, 2019
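===================Model selection=========================
1. Start by fine-tuning a simple model.
2. Fine-tuning steps, in order:
   the last layer
   the last convolutional block
   the whole network
3. In transfer learning, when the target domain differs substantially from the source domain, networks without FC layers perform worse than networks with FC layers.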
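The three fine-tuning stages listed above can be sketched as a small helper that toggles the `trainable` flag of each layer. This is only an illustration, not the repository's code: the function name `set_finetune_stage` is hypothetical, and the default prefix `"block5"` assumes Keras VGG16-style layer names. A toy stand-in object is used so the sketch runs without Keras; a real Keras model exposes the same `layers`, `name`, and `trainable` attributes.

```python
from types import SimpleNamespace


def set_finetune_stage(model, stage, last_block_prefix="block5"):
    """Mark layers trainable for one fine-tuning stage (hypothetical helper).

    stage 1: only the last layer
    stage 2: the last convolutional block and everything after it
    stage 3: the whole network
    """
    layers = list(model.layers)
    if stage == 1:
        first_trainable = len(layers) - 1
    elif stage == 2:
        # Index of the first layer that belongs to the last conv block.
        first_trainable = next(
            (i for i, layer in enumerate(layers)
             if layer.name.startswith(last_block_prefix)),
            None,
        )
        if first_trainable is None:
            raise ValueError("no layer name starts with %r" % last_block_prefix)
    elif stage == 3:
        first_trainable = 0
    else:
        raise ValueError("stage must be 1, 2 or 3")

    for i, layer in enumerate(layers):
        layer.trainable = i >= first_trainable
    return [layer.name for layer in layers if layer.trainable]


# Toy stand-in for a model; the layer names are illustrative only.
names = ["block4_conv1", "block5_conv1", "block5_pool", "flatten", "predictions"]
toy = SimpleNamespace(
    layers=[SimpleNamespace(name=n, trainable=True) for n in names]
)

stage1 = set_finetune_stage(toy, 1)
stage2 = set_finetune_stage(toy, 2)
stage3 = set_finetune_stage(toy, 3)
```

With a real Keras model, recompile after changing the `trainable` flags so that the change takes effect before the next training run.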
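===================Data loading=========================
model.load_data.py
1. path, batch_size (e.g. 10, 16)
2. input_shape follows the model's default input size. If GPU memory allows, consider a larger input size and reduce batch_size accordingly.
3. For augmentation, adjust the relevant parameters of ImageDataGenerator; see https://www.cnblogs.com/hutao722/p/10075150.html
4. classes=["","",""]: define your own class names
***Note*** shuffle=True is the default in .flow_from_directory. Leave it unchanged for training and validation, but always set it to False for testing.
5. rescale=1/255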
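The loading options above could be collected roughly as follows. This is a sketch under assumptions, not the contents of model.load_data.py: the helper names `flow_kwargs` and `make_generator`, the directory "data/test", and the class names "A", "B", "C" are all placeholders. The keyword names are those accepted by `ImageDataGenerator.flow_from_directory`.

```python
def flow_kwargs(path, input_shape, batch_size, classes, testing=False):
    """Keyword arguments for ImageDataGenerator.flow_from_directory."""
    return {
        "directory": path,
        "target_size": tuple(input_shape[:2]),  # (height, width)
        "batch_size": batch_size,
        "classes": list(classes),
        "class_mode": "categorical",
        # Keep the default shuffling for train/validation, disable it for test.
        "shuffle": not testing,
    }


def make_generator(path, input_shape, batch_size, classes, testing=False):
    """Build a generator; Keras is imported lazily so flow_kwargs works alone."""
    from keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rescale=1 / 255)
    return datagen.flow_from_directory(
        **flow_kwargs(path, input_shape, batch_size, classes, testing)
    )


# Placeholder path and class names, for illustration only.
test_kwargs = flow_kwargs("data/test", (224, 224, 3), 16, ["A", "B", "C"], testing=True)
train_kwargs = flow_kwargs("data/train", (224, 224, 3), 16, ["A", "B", "C"])
```

Keeping the shuffle decision in one place makes it harder to forget shuffle=False when the same loader is reused for testing.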
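===================Model training=========================
model.train.py
1. Parameters: classes = 3
   epochs = 200
   steps_per_epoch >= train_nums//batch
   validation_steps >= valid_nums//batch
2. The input_shape argument required by the model function depends on the model. Note: it must be a tuple.
3. weights: stores the model file with the highest accuracy on the validation set
4. logs: record of the training process
5. To use TensorBoard, add it to the callbacks (it is already defined).
6. The learning rate schedule monitors the validation loss: if the loss has not changed for 10 epochs, the learning rate is halved.
7. When training, always remember to enable multiple workers: workers=16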
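The step counts and callback settings described above can be written down as a short sketch. Rounding up with `math.ceil` is one way to satisfy steps_per_epoch >= train_nums//batch while still covering every image once per epoch. The names `steps_for`, `REDUCE_LR`, `CHECKPOINT`, and `build_callbacks` are hypothetical, and the metric name "val_acc" is an assumption that holds for older Keras releases (newer ones report "val_accuracy").

```python
import math


def steps_for(num_samples, batch_size):
    """Smallest number of batches that covers every sample once."""
    return math.ceil(num_samples / batch_size)


# Halve the learning rate after 10 epochs without improvement in val_loss.
REDUCE_LR = {"monitor": "val_loss", "factor": 0.5, "patience": 10}
# Keep only the weights with the best validation accuracy.
CHECKPOINT = {"monitor": "val_acc", "save_best_only": True}


def build_callbacks(weights_path):
    """Create the callbacks; Keras is imported lazily so the rest runs alone."""
    from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

    return [
        ModelCheckpoint(weights_path, **CHECKPOINT),
        ReduceLROnPlateau(**REDUCE_LR),
    ]


# 148 training images and 30 validation images, as in the GPU timing notes.
train_steps = steps_for(148, 30)
valid_steps = steps_for(30, 30)
```

For 148 training and 30 validation images at batch 30 this gives 5 and 1 steps, which matches the vgg16(20,5,1) setting used in the timing runs.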
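===================Model testing=========================
test.py
1. path: the model file
2. steps=test_nums/batch
3. input_shape: the default input size of the corresponding model
4. Note: update the location of the test files, set classes=["","",""] to your own class names, and set shuffle=False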
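The reason shuffle=False matters at test time is that predictions are then returned in the same order as the generator's `filenames` list, so each row can be paired with its file. A minimal sketch of that pairing, in plain Python so it runs without Keras; the helper name `label_predictions`, the file names, and the class names are illustrative assumptions.

```python
def label_predictions(filenames, probabilities, class_names):
    """Pair each file with its highest-scoring class.

    Only valid when the predictions were produced in file order,
    i.e. with shuffle=False.
    """
    results = []
    for name, probs in zip(filenames, probabilities):
        best = max(range(len(probs)), key=lambda i: probs[i])
        results.append((name, class_names[best]))
    return results


# Illustrative values only.
files = ["A/1.jpg", "B/2.jpg"]
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labelled = label_predictions(files, probs, ["A", "B", "C"])
```

With shuffling left on, the rows would come back in a different order from `filenames` and the pairing above would silently mislabel files.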
===================Prediction=========================
predict.py
predict("weights/bcnn.h5","data/test/B",[224,224])
===================Continue training=========================
model.continue_train.py
continue_train(path,epochs,steps_per_epoch,validation_steps,input_shape)
===================GPU memory issues (GTX1060)=========================
1. resnext: 56,56
2. senet: 64,64
3. octconv: not enough GPU memory
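===================Bug notes=========================
1. Problem: AttributeError: 'bytes' object has no attribute 'encode'
   Fix: edit line 321 of C:\Anaconda3\envs\wyl\lib\site-packages\keras\engine\saving.py: n.encode('utf8') for n in
   If it says encode, change it to decode, and vice versa.
2. Differences between image-loading methods
   skimage and the Keras utilities give different results.
   Note: skimage's transform.resize already rescales pixel values to [0, 1] (equivalent to dividing by 255), so do not divide by 255 again.
   The Keras utilities give the same results as each other.
   The Keras utilities include: from keras.preprocessing import image
   ImageDataGenerator
   e.g. on 120 c4 images with the same model, the Keras utilities reach 92.5% accuracy and skimage reaches 93.3%.
3. BN layer settings for training and testing: https://blog.datumbox.com/the-batch-normalization-layer-of-keras-is-broken/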
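For bug 1 above, the manual encode/decode flip depends on whether the names arrive as `bytes` or `str`. A type-tolerant conversion avoids having to guess. The sketch below is not the Keras source and the helper names `as_text` and `as_bytes` are hypothetical; it only illustrates the idea behind the fix.

```python
def as_text(name, encoding="utf8"):
    """Return a str whether the input is bytes or already str."""
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name


def as_bytes(name, encoding="utf8"):
    """Return bytes whether the input is str or already bytes."""
    if isinstance(name, str):
        return name.encode(encoding)
    return name


# Layer names as they might appear in either form.
mixed = [b"conv1", "conv2"]
text_names = [as_text(n) for n in mixed]
byte_names = [as_bytes(n) for n in mixed]
```

Calling `.encode` on `bytes` raises exactly the AttributeError quoted above, which is why the check on the type comes first.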
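===================GPU acceleration=========================
Conclusions:
First: Keras can run the existing models with single-GPU or multi-GPU acceleration (GTX1060, RTX2080ti). The environment needs NVIDIA driver 410 or later and CUDA 9.0 or later.
Second: Keras data preprocessing is slow, and this is the fundamental bottleneck for training.
Third: GPU aside, the hardware of the ubuntu18.04 desktop is slower for training than that of the win10 desktop.
Fourth: training classification models with Keras:
   Settings: workers=32 (larger is better), use_multiprocessing=False (True is somewhat faster), max_queue_size=20 (larger is better); going much higher is pointless once the CPU is saturated.
   The fastest classification run so far averages about 9 s per epoch (148 images for train, 30 for val).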
data:
Found 148 images belonging to 3 classes.
Found 30 images belonging to 3 classes.
keras.fit_generator settings
workers=32, use_multiprocessing=False, max_queue_size=10
win10 desktop, CPU with 16 threads, 32 GB RAM: CPU usage 75~79%, memory usage 6 GB
RTX2080ti: 11 GB max, 8.99 GB available
vgg16(20,5,1)
batch = 30; adjust Train_steps and validation_steps to match the batch size
20 epochs of batch 30 takes total time 197.78
vgg16(20,10,2)
batch = 15
20 epochs of batch 15 takes total time 185.68
vgg16(20,15,3)
batch = 10
20 epochs of batch 10 takes total time 182.57
vgg16(20,25,5)
batch = 6
20 epochs of batch 6 takes total time 178.18
win10 desktop, CPU with 16 threads, 32 GB RAM: CPU usage 76~83%, memory usage 5.3~5.6 GB
GTX1060: 6 GB max, 4.97 GB available
batch = 30
20 epochs of batch 30 takes total time 201.05
warning:Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.26GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
batch = 15
20 epochs of batch 15 takes total time 201.77
batch = 10
20 epochs of batch 10 takes total time 194.74
batch = 6
20 epochs of batch 6 takes total time 188.67
ubuntu18.04 desktop: CPU with 20 threads, 64 GB RAM
Peak: 15-16 threads running, 70%~80% CPU usage
RTX2080ti: 10.72 GB max, 10.33 GB available
Peak: 56%
batch = 30
20 epochs of batch 30 takes total time 404.95
batch = 15
20 epochs of batch 15 takes total time 377.54
batch = 10
20 epochs of batch 10 takes total time 385.09
batch = 6
20 epochs of batch 6 takes total time 371.07
keras.fit_generator settings
workers=32, use_multiprocessing=False, max_queue_size=20
ubuntu18.04 desktop: CPU with 20 threads, 64 GB RAM
Peak: 20 threads running, 90+% CPU usage
RTX2080ti: 10.72 GB max, 10.33 GB available
Peak: 93%
vgg16(20,5,1)
batch = 30
20 epochs of batch 30 takes total time 399.53
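Slightly faster than with max_queue_size=10.
ubuntu18.04 desktop: CPU with 20 threads, 64 GB RAM
Peak: 20 threads running, 90+% CPU usage
Dual RTX2080ti: 10.72 GB max each; one has 10.33 GB available, the other 10.53 GB
First epoch: one at 80+%, the other at 90+% (the values are only approximate)
Normal epochs: peaks of 52% on one and 51% on the other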
batch = 30
20 epochs of batch 30 takes total time 399.55
batch = 15
20 epochs of batch 15 takes total time 391.86
batch = 10
20 epochs of batch 10 takes total time 354.00
batch = 6
20 epochs of batch 6 takes total time 344.01
keras.fit_generator settings
workers=32, use_multiprocessing=True, max_queue_size=20
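ubuntu18.04 desktop: CPU with 20 threads, 64 GB RAM
Peak: 20 threads running, 100% CPU usage (every thread at 100%)
Dual RTX2080ti: 10.72 GB max each; one has 10.33 GB available, the other 10.53 GB
First epoch: one at 80+%, the other at 90+%
Normal epochs: peaks of 54% on one and 40% on the other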
batch = 30
20 epochs of batch 30 takes total time 389.52
batch = 15
20 epochs of batch 15 takes total time 378.32
batch = 10
20 epochs of batch 10 takes total time 346.65
batch = 6
20 epochs of batch 6 takes total time 338.29
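The per-epoch figure quoted in the conclusions can be checked against the totals above with a few lines of arithmetic. This sketch assumes the reported totals are in seconds (the notes do not state the unit, but it is consistent with "about 9 s per epoch"); the dictionary keys are shorthand labels chosen here, not names from the repository.

```python
EPOCHS = 20  # every run above is "20 epochs"

# Total time per run, keyed by machine/settings and then by batch size.
TOTALS = {
    "win10 RTX2080ti, queue 10": {30: 197.78, 15: 185.68, 10: 182.57, 6: 178.18},
    "win10 GTX1060, queue 10": {30: 201.05, 15: 201.77, 10: 194.74, 6: 188.67},
    "ubuntu RTX2080ti, queue 10": {30: 404.95, 15: 377.54, 10: 385.09, 6: 371.07},
    "ubuntu RTX2080ti, queue 20": {30: 399.53},
    "ubuntu dual RTX2080ti, queue 20": {30: 399.55, 15: 391.86, 10: 354.00, 6: 344.01},
    "ubuntu dual RTX2080ti, queue 20, multiprocessing": {
        30: 389.52, 15: 378.32, 10: 346.65, 6: 338.29,
    },
}


def seconds_per_epoch(total, epochs=EPOCHS):
    """Average time of one epoch for a run of the given total time."""
    return total / epochs


def fastest(totals):
    """Return (config, batch, total) of the run with the smallest total."""
    return min(
        ((config, batch, total)
         for config, runs in totals.items()
         for batch, total in runs.items()),
        key=lambda item: item[2],
    )


best_config, best_batch, best_total = fastest(TOTALS)
best_epoch = seconds_per_epoch(best_total)
```

The fastest run is the win10 RTX2080ti machine at batch 6, at a little under 9 per epoch in the same unit as the totals.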