
Question about backpropagation #3

Open
W-void opened this issue Dec 4, 2017 · 6 comments


@W-void

W-void commented Dec 4, 2017

When backpropagation updates Theta2_grad, shouldn't delta3 be multiplied by sigmoidGradient(z3) before performing the gradient update?

@lawlite19
Owner

In error backpropagation, the last layer's error is simply the prediction (the forward-propagated value) minus the true label; that error is then propagated backward to compute each layer's error.
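
A minimal sketch of that recipe on toy data (the names sigmoid, sigmoidGradient, Theta1, Theta2, class_y mirror the repo's conventions, but this is not the repo's actual code):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoidGradient(z):
    g = sigmoid(z)
    return g * (1.0 - g)

# Toy shapes: 5 samples, 4 inputs, 3 hidden units, 2 output classes.
m = 5
X = np.random.randn(m, 4)
class_y = np.eye(2)[np.random.randint(0, 2, m)]  # one-hot labels
Theta1 = np.random.randn(3, 5)  # hidden-layer weights, bias column included
Theta2 = np.random.randn(2, 4)  # output-layer weights, bias column included

# Forward pass
a1 = np.hstack([np.ones((m, 1)), X])
z2 = a1.dot(Theta1.T)
a2 = np.hstack([np.ones((m, 1)), sigmoid(z2)])
z3 = a2.dot(Theta2.T)
h = sigmoid(z3)

# Last layer: error is prediction minus true label (cross-entropy loss)
delta3 = h - class_y

# Hidden layers do get the sigmoid gradient factor when the error is
# propagated backward (the bias column of Theta2 is skipped)
delta2 = delta3.dot(Theta2[:, 1:]) * sigmoidGradient(z2)

Theta2_grad = delta3.T.dot(a2) / m
Theta1_grad = delta2.T.dot(a1) / m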

@W-void
Author

W-void commented Dec 4, 2017

Updating the second layer's weights theta2 requires the derivative of costFunction with respect to theta2, which factors into the product of the derivative of z3 with respect to theta2 and the derivative of costFunction with respect to z3. The former is a2, but the latter shouldn't be just h - class_y, should it? I think it should be (h - class_y) multiplied by the derivative of h with respect to z3, i.e. sigmoidGradient(z3). I'm a beginner and self-taught, with no formal training, so I may well be making some naive mistakes; I hope you'll bear with me and be generous with your advice.

@lawlite19
Owner

It's been so long since I looked at this that I'd forgotten the details. Your understanding is correct in general, but the loss used here is the cross-entropy loss function. Try deriving it yourself; the code should be right.
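
The cancellation is easy to verify numerically for a single sigmoid unit (a standalone sketch, not code from the repo): the h*(1-h) factor from dJ/dh cancels against dh/dz, so dJ/dz reduces to exactly h - y, with no extra sigmoidGradient factor.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(z, y):
    # J(z) = -y*log(h) - (1-y)*log(1-h), with h = sigmoid(z)
    h = sigmoid(z)
    return -y * np.log(h) - (1.0 - y) * np.log(1.0 - h)

z, y, eps = 0.7, 1.0, 1e-6

# Numerical dJ/dz by central difference
numeric = (cross_entropy(z + eps, y) - cross_entropy(z - eps, y)) / (2 * eps)

# Analytic dJ/dz: ((h-y)/(h*(1-h))) * h*(1-h) = h - y
analytic = sigmoid(z) - y

print(numeric, analytic)  # both ≈ -0.3318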

@lawlite19
Owner

I just ran an experiment. The code includes a function that checks the gradient against its definition (a numerical gradient check). If the cost function is defined as mean squared error (no regularization added):

temp1 = h.reshape(-1, 1) - class_y.reshape(-1, 1)  # residuals, flattened to a column
temp2 = (temp1 ** 2).sum()                         # sum of squared residuals
J = 1.0 / (2 * m) * temp2                          # 1.0 avoids integer division on Python 2

and the output-layer delta is then modified to

# With the MSE cost, the output delta does need the sigmoid gradient factor:
delta3[i,:] = (h[i,:]-class_y[i,:])*sigmoidGradient(z3[i,:])

then the printed gradients come out right (the two columns agree):

[[  5.16628040e-03   5.16628041e-03]
 [ -8.31394759e-06  -8.31394784e-06]
 [  5.41669659e-05   5.41669655e-05]
 [  6.68470207e-05   6.68470206e-05]
 [  1.56912539e-03   1.56912539e-03]
 [ -1.21626675e-05  -1.21626677e-05]
 [  2.54606233e-05   2.54606235e-05]
 [  3.96755345e-05   3.96755349e-05]
 [ -3.47046008e-03  -3.47046009e-03]
 [ -4.82780121e-06  -4.82780112e-06]
 [ -2.66496109e-05  -2.66496103e-05]
 [ -2.39698913e-05  -2.39698907e-05]
 [ -5.31940615e-03  -5.31940616e-03]
 [  6.94345692e-06   6.94345726e-06]
 [ -5.42630105e-05  -5.42630103e-05]
 [ -6.55803162e-05  -6.55803165e-05]
 [ -2.27769095e-03  -2.27769095e-03]
 [  1.23327012e-05   1.23327010e-05]
 [ -3.19844790e-05  -3.19844792e-05]
 [ -4.68952766e-05  -4.68952768e-05]
 [  1.21981949e-01   1.21981949e-01]
 [  6.10153665e-02   6.10153665e-02]
 [  6.10411275e-02   6.10411275e-02]
 [  6.09010195e-02   6.09010195e-02]
 [  6.10584199e-02   6.10584199e-02]
 [  6.09927601e-02   6.09927601e-02]
 [  7.28582367e-02   7.28582368e-02]
 [  3.64005559e-02   3.64005559e-02]
 [  3.63703836e-02   3.63703836e-02]
 [  3.65344619e-02   3.65344619e-02]
 [  3.63501369e-02   3.63501369e-02]
 [  3.64270251e-02   3.64270252e-02]
 [  2.38758286e-02   2.38758286e-02]
 [  1.18930662e-02   1.18930662e-02]
 [  1.20555685e-02   1.20555685e-02]
 [  1.18289556e-02   1.18289556e-02]
 [  1.19627032e-02   1.19627032e-02]
 [  1.20144705e-02   1.20144705e-02]]

(assuming my gradient-checking function itself isn't buggy =-=)
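
For reference, a minimal version of such a definition-based check (a sketch; the repo's actual checking function may differ) perturbs one parameter at a time and uses a central difference:

import numpy as np

def numerical_gradient(cost_fn, theta, eps=1e-4):
    """dJ/dtheta by the definition of the derivative (central differences)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step.flat[i] = eps
        grad.flat[i] = (cost_fn(theta + step) - cost_fn(theta - step)) / (2 * eps)
    return grad

# Sanity check on J(theta) = 0.5*||theta||^2, whose exact gradient is theta:
theta = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(lambda t: 0.5 * (t ** 2).sum(), theta))
# -> approximately [ 1. -2.  3.]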

@W-void
Author

W-void commented Dec 4, 2017

That was force of habit on my part; I assumed the loss here was the Euclidean (squared) distance. You're right.
Thank you so much for writing these machine-learning-from-scratch articles; implementing everything with just numpy and no high-level libraries has deepened my understanding considerably.
One small tip: when computing element-wise products such as the corresponding terms in the regularization penalty, arrays support the '*' operator directly (the equivalent of MATLAB's element-wise .*), followed by np.sum at the end. I find that easier to read; hope it's useful (a sketch follows below).
Thanks again.
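
A small sketch of that tip (the shapes and the name lambda_ are made up for illustration):

import numpy as np

m, lambda_ = 100, 1.0
Theta1 = np.random.randn(3, 5)
Theta2 = np.random.randn(2, 4)

# '*' on numpy arrays is element-wise (MATLAB's .*), so the penalty is just
# square-and-sum, skipping the bias columns; no explicit loops needed.
reg = lambda_ / (2.0 * m) * (np.sum(Theta1[:, 1:] * Theta1[:, 1:]) +
                             np.sum(Theta2[:, 1:] * Theta2[:, 1:]))
print(reg)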

@lawlite19
Owner

I wrote it not long after I'd first learned Python, so plenty of places are probably not the most streamlined. It gets the idea across, though.

@lawlite19 lawlite19 reopened this Sep 18, 2018