This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

dfget hangs / dfget takes too long to restart #311

Closed
lggeor opened this issue Jan 7, 2019 · 9 comments · Fixed by #582
Labels
areas/df-get kind/bug This is bug report for project

Comments

@lggeor

lggeor commented Jan 7, 2019


name: dfget hang
about: What causes dfget to hang? Is it a bug, or does it only occur under specific conditions?


Question

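I am currently testing Dragonfly in a Kubernetes environment. Two supernodes are deployed as static pods, and the client is deployed as a DaemonSet, with the dfdaemon --registry parameter pointing to a Harbor registry.
The client version comes from the following official package:
http://dragonfly-os.oss-cn-beijing.aliyuncs.com/df-client_0.2.0_linux_amd64.tar.gz
When running docker pull on the client, some images download normally, while others hang; in the hung case it took nearly 20 minutes before the download timed out.
According to dfclient.log, dfget kept trying to download for nearly 20 minutes until it reported download timeout(1149s):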

[2019-01-07 14:06:32,580] ERROR sign:345064-1546868841.351 lineno:332 : piece range:20971520-25165823 error,realMd5:37a1a01672bdcf6a861615e0b73bfc03,expectedMd5:81acd03c81a67dfd115fa8900313230f,dstIp:10.16.33.2,total:521
[2019-01-07 14:06:32,588] ERROR sign:345064-1546868841.351 lineno:332 : piece range:16777216-20971519 error,realMd5:37a1a01672bdcf6a861615e0b73bfc03,expectedMd5:46989902ecda3b694a45b14f32de9ee9,dstIp:10.16.33.2,total:521
[2019-01-07 14:06:32,596] ERROR sign:345064-1546868841.351 lineno:332 : piece range:8388608-12582911 error,realMd5:37a1a01672bdcf6a861615e0b73bfc03,expectedMd5:8fb7dcfad74537126dbe1da24c3eaf48,dstIp:10.16.33.2,total:521
[2019-01-07 14:06:32,598] WARNING sign:345064-1546868841.351 lineno:167 : has not available pieceTask,maybe resource lack
[2019-01-07 14:06:32,602] ERROR sign:345064-1546868841.351 lineno:332 : piece range:12582912-16777215 error,realMd5:37a1a01672bdcf6a861615e0b73bfc03,expectedMd5:83526ce1278678e5f6688c69a78690cc,dstIp:10.16.33.2,total:521
[2019-01-07 14:06:32,604] WARNING sign:345064-1546868841.351 lineno:167 : has not available pieceTask,maybe resource lack
[2019-01-07 14:06:32,608] WARNING sign:345064-1546868841.351 lineno:167 : has not available pieceTask,maybe resource lack
[2019-01-07 14:06:32,617] ERROR sign:345064-1546868841.351 lineno:332 : piece range:0-4194303 error,realMd5:37a1a01672bdcf6a861615e0b73bfc03,expectedMd5:1d0fa006a74c02934295284d5421b09d,dstIp:10.16.33.2,total:521
[2019-01-07 14:06:32,624] INFO sign:345064-1546868841.351 lineno:110 : pull piece task result:{u'msg': u'piece resource lack', u'code': 602} and sleep 1.566 ...
[2019-01-07 14:06:32,777] ERROR sign:345064-1546868841.351 lineno:79 : download timeout(1149s)
[2019-01-07 14:06:32,779] INFO sign:345064-1546868841.351 lineno:60 : local http result:success for path:/client/ and cost:0.002
[2019-01-07 14:06:32,779] INFO sign:345064-1546868841.351 lineno:94 : |c676b9e09f098a37d06e5fb55a2d42b8abf735f00a1cb5408c7072d36455720c|https://10.250.250.16/v2/devops/jry-java/blobs/sha256:0ffa5ac9f3c5c761212037c66effa9c800973aca2f0583dcd24e5eed6e7d2848|74693368|0|10.16.33.2|com_ops_dragonfly|1151.427|
[2019-01-07 14:06:32,780] INFO sign:345064-1546868841.351 lineno:111 : download FAIL cost:1151.429s length:74693368 reason:0
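
One detail in the excerpt above is that every failed piece reports the same realMd5 even though the ranges differ. That may indicate that each piece request received the same body rather than distinct piece data, but this is only a reading of the log, not a confirmed diagnosis. The following is a minimal sketch for grouping the error lines by realMd5; the regular expression is derived solely from the line format shown above, and `summarize_piece_errors` is a hypothetical helper, not part of dfget.

```python
import re
from collections import defaultdict

# Pattern for the "piece range ... error" lines as they appear in the excerpt.
# It is an assumption that other dfget versions log the same format.
PIECE_ERROR = re.compile(
    r"piece range:(?P<start>\d+)-(?P<end>\d+) error,"
    r"realMd5:(?P<real>[0-9a-f]{32}),"
    r"expectedMd5:(?P<expected>[0-9a-f]{32}),"
    r"dstIp:(?P<dst>[\d.]+)"
)


def summarize_piece_errors(lines):
    """Group failed piece ranges by the realMd5 the client computed."""
    by_real_md5 = defaultdict(list)
    for line in lines:
        match = PIECE_ERROR.search(line)
        if match:
            by_real_md5[match["real"]].append(
                (int(match["start"]), int(match["end"]), match["dst"])
            )
    return dict(by_real_md5)


if __name__ == "__main__":
    # Two lines copied from the excerpt above.
    sample = [
        "[2019-01-07 14:06:32,580] ERROR sign:345064-1546868841.351 lineno:332 : "
        "piece range:20971520-25165823 error,realMd5:37a1a01672bdcf6a861615e0b73bfc03,"
        "expectedMd5:81acd03c81a67dfd115fa8900313230f,dstIp:10.16.33.2,total:521",
        "[2019-01-07 14:06:32,617] ERROR sign:345064-1546868841.351 lineno:332 : "
        "piece range:0-4194303 error,realMd5:37a1a01672bdcf6a861615e0b73bfc03,"
        "expectedMd5:1d0fa006a74c02934295284d5421b09d,dstIp:10.16.33.2,total:521",
    ]
    for real_md5, pieces in summarize_piece_errors(sample).items():
        print(real_md5, "seen for", len(pieces), "different piece ranges")
```

If a single realMd5 covers many different ranges, checking what the peer at dstIp actually returns for one piece request would be a reasonable next step.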

The supernode log at /home/admin/supernode/logs/app.log shows nothing obviously wrong; the supernode appears to have downloaded the file successfully:

2019-01-07 13:47:23.759  INFO 9 --- [http-nio-8080-exec-1] c.a.d.s.repository.TaskRepository        : get file length:74693368 from http client about taskId:c676b9e09f098a37d06e5fb55a2d42b8abf735f00a1cb5408c7072d36455720c
2019-01-07 13:47:23.759  INFO 9 --- [http-nio-8080-exec-1] c.a.d.s.service.impl.CdnManagerImpl      : do trigger cdn start for taskId:c676b9e09f098a37d06e5fb55a2d42b8abf735f00a1cb5408c7072d36455720c,httpLen:74693368
2019-01-07 13:47:23.760  INFO 9 --- [Thread-7] c.a.d.s.service.impl.CdnManagerImpl      : do trigger cdn success for taskId:c676b9e09f098a37d06e5fb55a2d42b8abf735f00a1cb5408c7072d36455720c
2019-01-07 13:47:23.763  INFO 9 --- [pool-1-thread-2] c.a.d.supernode.service.cdn.Downloader   : taskId:c676b9e09f098a37d06e5fb55a2d42b8abf735f00a1cb5408c7072d36455720c fileUrl:https://10.250.250.16/v2/devops/jry-java/blobs/sha256:0ffa5ac9f3c5c761212037c66effa9c800973aca2f0583dcd24e5eed6e7d2848 on downloader
2019-01-07 13:47:48.426  INFO 9 --- [Thread-8] c.a.d.s.service.impl.CdnReporterImpl     : taskId:c676b9e09f098a37d06e5fb55a2d42b8abf735f00a1cb5408c7072d36455720c fileLength:74693458 status:SUCCESS from:local
2019-01-07 13:47:48.426  INFO 9 --- [Thread-8] c.a.d.supernode.service.cdn.SuperWriter  : taskId:c676b9e09f098a37d06e5fb55a2d42b8abf735f00a1cb5408c7072d36455720c readCost:24004,totalCost:24663,fileLength:74693368,realMd5:84ff00a39874c6f4d68bc23c74575b0f

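I would like to know:
1. What causes the hang?
2. If the cause of the hang can be detected, could dfget retry immediately instead of waiting this long?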

@pouchrobot pouchrobot added the status/more-info-needed This means that this issue need to input more information to address the issue clearly label Jan 7, 2019
@allencloud allencloud changed the title dfget hangs [bug]dfget hangs Jan 7, 2019
@pouchrobot pouchrobot added the kind/bug This is bug report for project label Jan 7, 2019
@allencloud allencloud added areas/df-get and removed status/more-info-needed This means that this issue need to input more information to address the issue clearly labels Jan 7, 2019
@allencloud
Contributor

Please follow the guide to add the version information of dfget @lggeor
In addition, please make sure you have read the issue dragonflyoss/dragonfly#291 to enter the group.

@lggeor lggeor changed the title [bug]dfget hangs dfget hangs / dfget takes too long to restart Jan 7, 2019
@lggeor
Author

lggeor commented Jan 7, 2019

@dragonflyoss dragonflyoss deleted a comment from pouchrobot Jan 7, 2019
@lowzj
Member

lowzj commented Jan 16, 2019

[2019-01-07 14:06:32,580] ERROR sign:345064-1546868841.351 lineno:332 : piece range:20971520-25165823 error,realMd5:37a1a01672bdcf6a861615e0b73bfc03,expectedMd5:81acd03c81a67dfd115fa8900313230f,dstIp:10.16.33.2,total:521

Judging from dfclient.log, the downloaded piece data does not match the actual data. Try clearing the locally cached data on the supernode side:

/home/admin/supernode/repo/download/c67/c676b9e09f098a37d06e5fb55a2d42b8abf735f00a1cb5408c7072d36455720c
/home/admin/supernode/repo/download/c67/c676b9e09f098a37d06e5fb55a2d42b8abf735f00a1cb5408c7072d36455720c.meta
/home/admin/supernode/repo/download/c67/c676b9e09f098a37d06e5fb55a2d42b8abf735f00a1cb5408c7072d36455720c.md5
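
The three paths above suggest a layout of `<repo>/download/<first 3 characters of taskId>/<taskId>` plus `.meta` and `.md5` siblings. Assuming that layout holds for other tasks (it is inferred only from these three paths), a small sketch for locating and removing the cache files of one task could look like this. `cache_paths` and `clear_task_cache` are hypothetical helpers, not supernode tooling, and the default is a dry run that deletes nothing.

```python
from pathlib import Path

# Default root taken from the paths listed above; adjust for other deployments.
DEFAULT_REPO_ROOT = "/home/admin/supernode/repo/download"


def cache_paths(task_id, repo_root=DEFAULT_REPO_ROOT):
    """Return the data, .meta and .md5 paths for a task under the assumed layout."""
    base = Path(repo_root) / task_id[:3] / task_id
    return [base, Path(str(base) + ".meta"), Path(str(base) + ".md5")]


def clear_task_cache(task_id, repo_root=DEFAULT_REPO_ROOT, dry_run=True):
    """List the existing cache files for a task; delete them only if dry_run is False."""
    found = []
    for path in cache_paths(task_id, repo_root):
        if path.is_file():
            if not dry_run:
                path.unlink()
            found.append(path)
    return found


if __name__ == "__main__":
    task_id = "c676b9e09f098a37d06e5fb55a2d42b8abf735f00a1cb5408c7072d36455720c"
    for path in clear_task_cache(task_id, dry_run=True):
        print("would remove", path)
```

The taskId is the first field of the summary line in dfclient.log and also appears in the supernode app.log entries for the same download.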

@huanbaogihub

huanbaogihub commented Jan 29, 2019

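I am seeing this problem too. Is there a solution?
The MD5 of files downloaded through the supernode is incorrect. After commenting out dfget's MD5 check, the downloaded file is incomplete and cannot be opened.
Downloading directly from the source works fine.
Both my supernode and dfget are v0.2.0.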

@huanbaogihub

OK, in my case it turned out to be an nginx configuration problem.

@huerlei

huerlei commented Apr 23, 2019

What is the solution to this problem?

@lijianfeng1993

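I ran into this as well. Some images can be pulled, but for others one layer keeps retrying and finally fails with a 502. After I restarted the supernode container, the problem went away.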

@starnop
Contributor

starnop commented May 9, 2019

@lowzj Do you have any new ideas about this problem?

@huerlei

huerlei commented May 20, 2019

[2019-05-20 15:40:39,445] ERROR sign:17032-1558337991.812 lineno:332 : piece range:62914560-78643199 error,realMd5:ee0152e53c46658500bfc0c45feb51e5,expectedMd5:3fe30dd4761a8fe998932b9a1d8c5636,dstIp:10.110.91.164,total:168
Is there a fix yet for this kind of MD5 error?

8 participants