
Direct download in browser is unstable. Could you offer a method by which we can download the data from the command line, e.g. wget? #35

Open
johnson-magic opened this issue Nov 8, 2021 · 2 comments

Comments

@johnson-magic

Hi,

Thank you for the datasets. Downloading directly in the browser is unstable. Could you provide a way to download the data from the command line, e.g. with wget? @liminghao1630 @wolfshow @ranpox

@yweweler

wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.001
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.002
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.003
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.004
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.005
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.006
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.007
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.008
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.009
wget https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.010

Since I also had issues with corrupt files, I computed the MD5 sums for comparison.

MD5 File
c702b68b84642b289b6de7b87bf004eb DocBank_500K_ori_img.zip.001
a2328a17e582db16611483f218f7fac2 DocBank_500K_ori_img.zip.002
f534da5cc25004c79b055f894eeab2d3 DocBank_500K_ori_img.zip.003
a1aeb655366d0b124aee39c50d72592d DocBank_500K_ori_img.zip.004
acfbdc634765985f6916427c0475372d DocBank_500K_ori_img.zip.005
fb99074b46c8046ade9bff4be29718d5 DocBank_500K_ori_img.zip.006
fee26227cb26d684d94c312c35230aea DocBank_500K_ori_img.zip.007
4992adec456221890266980911ff3ebd DocBank_500K_ori_img.zip.008
3eac5561bb51bfef5cf6c9cff293b673 DocBank_500K_ori_img.zip.009
698aabef3ee3c713f2595e1f22ff857b DocBank_500K_ori_img.zip.010
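
The comparison against the list above can be automated. The following is only a sketch, assuming GNU coreutils' md5sum is installed and that the list has been saved (without the "MD5 File" header line) to a checksum file with two spaces between hash and file name. The helper name verify_parts and the checksum file name in the usage note are hypothetical, not part of DocBank.

```shell
#!/usr/bin/env bash
# Sketch: check downloaded parts against a saved checksum list.
# verify_parts is a hypothetical helper name chosen for this example.
verify_parts() {
    local sums_file="$1"   # lines of the form "<md5>  <filename>"
    local dir="${2:-.}"    # directory that holds the downloaded parts
    # The checksum list is fed on stdin so its path is resolved before
    # changing directory. md5sum -c exits non-zero if any listed file
    # is missing or does not match its hash.
    ( cd "$dir" && md5sum -c - ) < "$sums_file"
}
```

For example, with the ten lines saved as DocBank_500K_ori_img.md5 (an assumed file name) next to the downloads, `verify_parts DocBank_500K_ori_img.md5` prints one OK or FAILED line per part, so only the failed parts need to be fetched again.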

@zzhanghub

wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.001?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.002?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.003?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.004?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.005?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.006?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.007?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.008?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.009?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D" && 
wget "https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip.010?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D"

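The URL needs to be placed inside quotation marks. Otherwise the shell treats each "&" in the query string as a control operator and cuts the URL short before wget sees it.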
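
The ten commands differ only in the part number, so the same download can be written as a loop that keeps the URL quoted. This is a minimal sketch, not an official script: part_urls and download_parts are hypothetical helper names, the base URL is the one quoted in this thread, and the query string (if the server requires one) has to be supplied by the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; BASE_URL is copied from the commands in this thread.
BASE_URL="https://layoutlm.blob.core.windows.net/docbank/dataset/DocBank_500K_ori_img.zip"

# Print the ten part URLs, one per line. An optional query string
# (starting with "?") is appended to each URL unchanged.
part_urls() {
    local query="${1:-}"
    local i=1
    while [ "$i" -le 10 ]; do
        printf '%s.%03d%s\n' "$BASE_URL" "$i" "$query"
        i=$((i + 1))
    done
}

# Download every part, stopping at the first failure.
# "$url" is double-quoted so any "&" in the query string reaches wget intact;
# -c asks wget to resume a partially downloaded file instead of restarting.
download_parts() {
    local url
    part_urls "${1:-}" | while IFS= read -r url; do
        wget -c "$url" || exit 1
    done
}
```

Calling `download_parts` with no argument requests the plain URLs from the first reply; passing the query string in single quotes, as in `download_parts '?sv=...'`, appends it to every part.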
