download the dataset #12
Maybe access to Web Archive URLs is restricted on your side. If the problem persists, drop me an email.
Thanks for your reply! Sometimes it's hard for us to access some websites.
I still can't get the dataset. Could you send me the raw data or the processed data via Google Drive or Dropbox?
Thanks for your hard work!
I have the same problem. The server is not stable. I downloaded about 2000 articles the first time, but when I reran the scripts, nothing more would download.
Hi @shashiongithub, I am having similar issues downloading the data with the script. I am currently working on a paper and would love to use the XSum dataset for my experiments. I was hoping you could share it with me through another channel. I tried contacting you by email, but the message could not be delivered to your mailbox. My email is [email protected] Thanks a lot!
Here is the dataset: http://kinloch.inf.ed.ac.uk/public/XSUM-EMNLP18-Summary-Data-Original.tar.gz Please use the train, development and test ids from GitHub to split it into subsets.
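The reply above asks readers to split the archive using the ids published on GitHub. Below is a minimal Python sketch of that step. It assumes the split file is the JSON in the XSum repository that maps split names (for example "train", "validation", "test") to lists of BBC ids, and that the archive unpacks to a bbc-summary-data/ directory of <bbcid>.summary files. The key names, paths and function names are assumptions for illustration; check them against the actual files.

```python
import json
from pathlib import Path


def assign_splits(splits, available_ids):
    """Keep only the ids of each split that exist locally.

    splits: dict mapping split name -> list of BBC ids (assumed JSON layout).
    available_ids: set of ids found on disk.
    Returns (kept, missing), two dicts keyed by split name.
    """
    kept, missing = {}, {}
    for name, ids in splits.items():
        kept[name] = [i for i in ids if i in available_ids]
        missing[name] = [i for i in ids if i not in available_ids]
    return kept, missing


def load_splits(split_json_path, summary_dir="bbc-summary-data"):
    # Hypothetical paths: the split JSON from GitHub and the unpacked summaries.
    with open(split_json_path, encoding="utf-8") as f:
        splits = json.load(f)
    # "35232142.summary" -> "35232142"
    available = {p.stem for p in Path(summary_dir).glob("*.summary")}
    return assign_splits(splits, available)
```

Reporting the missing ids separately makes it easy to see whether the archive is incomplete rather than silently training on a smaller split.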
Hi,
Hey, I'm not able to open the link either. Can you please help?
Thanks!
Hey! The link is broken :/ Could you share an updated one so I can download the dataset?
That URL creates a directory called
A few things to keep in mind:
Please use the training/dev/test ids provided here to find which one to use.
bbc-summary-data/bbcid.summary --> xsum-extracts-from-downloads/bbcid.data
In each summary file:
With these changes the preprocessing scripts should work.
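The mapping in that comment (bbc-summary-data/bbcid.summary --> xsum-extracts-from-downloads/bbcid.data) amounts to copying each summary file under a new name. A hypothetical helper could look like the sketch below. It only handles the renaming; the per-file changes listed under "In each summary file:" are not reproduced in this thread, so they are not covered here.

```python
import shutil
from pathlib import Path


def target_path(summary_path, out_dir):
    """Map .../<bbcid>.summary to <out_dir>/<bbcid>.data."""
    return Path(out_dir) / (Path(summary_path).stem + ".data")


def copy_summaries(src_dir="bbc-summary-data", out_dir="xsum-extracts-from-downloads"):
    """Copy every *.summary file to the name the preprocessing scripts expect.

    Copies rather than renames, so the unpacked archive stays untouched.
    Returns the number of files copied.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(Path(src_dir).glob("*.summary")):
        shutil.copyfile(src, target_path(src, out))
        count += 1
    return count
```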
For context, I'm trying to replicate the results in the BART paper. Thanks!
Hello, I am also unable to access any of the links posted in this thread. Could you please post a working URL? Update: I found that the posted URL works if I run "wget http://bollin.inf.ed.ac.uk/public/direct/XSUM-EMNLP18-Summary-Data-Original.tar.gz" from a terminal. It does not work from my browser.
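For anyone who can fetch the file from a terminal but not from a browser, here is a rough Python equivalent of that wget command. It is only a sketch: it streams the response to disk and, when the server sends a Content-Length header (an assumption; it may not), compares the byte count so a cut-off transfer is reported instead of silently leaving a partial .tar.gz. The function names are hypothetical.

```python
import urllib.request

# URL from the comment above; the server may move or drop it.
XSUM_URL = "http://bollin.inf.ed.ac.uk/public/direct/XSUM-EMNLP18-Summary-Data-Original.tar.gz"


def copy_stream(src, dst, expected=None, chunk_size=1 << 20):
    """Copy src to dst in chunks; raise if the byte count differs from `expected`."""
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read: the stream is exhausted (or was cut off)
            break
        dst.write(chunk)
        written += len(chunk)
    if expected is not None and written != expected:
        raise IOError(f"incomplete download: got {written} of {expected} bytes")
    return written


def download(url=XSUM_URL, dest="XSUM-EMNLP18-Summary-Data-Original.tar.gz"):
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as f:
        # Content-Length is optional in HTTP; without it no size check is possible.
        length = resp.headers.get("Content-Length")
        return copy_stream(resp, f, int(length) if length else None)
```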
The same problem here. Could you please post a new URL? |
Using the downloaded dataset generates an error; details follow. I put the above-mentioned URL (http://bollin.inf.ed.ac.uk/public/direct/XSUM-EMNLP18-Summary-Data-Original.tar.gz) into xsum.py:
_URL_DATA = "http://bollin.inf.ed.ac.uk/public/direct/XSUM-EMNLP18-Summary-Data-Original.tar.gz"
and then ran the code as follows.
It generates the following error:
ReadError: unexpected end of data
The above exception was the direct cause of the following exception:
DatasetGenerationError: An error occurred while generating the dataset
The dataset source may have a problem.
Note: with the original code, however, it runs successfully.
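"ReadError: unexpected end of data" is what Python's tarfile module raises when an archive ends before a member's data is complete, so a truncated download is one plausible cause (not confirmed in this thread). A small check, sketched below with a hypothetical function name, reads every member of the local .tar.gz to the end before pointing xsum.py at it.

```python
import tarfile


def archive_is_complete(path, chunk_size=1 << 20):
    """Return True if every regular file in the .tar.gz can be read to the end."""
    try:
        with tarfile.open(path, "r:gz") as tar:
            for member in tar:  # iterating also parses every header
                if member.isfile():
                    with tar.extractfile(member) as f:
                        while f.read(chunk_size):
                            pass
        return True
    except (tarfile.TarError, EOFError, OSError):
        # TarError covers ReadError("unexpected end of data"); EOFError comes from
        # a truncated gzip stream; OSError also covers a missing or non-gzip file.
        return False
```

If this returns False, re-downloading the archive is a cheaper first step than debugging the dataset script.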
Thanks for your excellent work!
When I run download-bbc-articles.py, it showed that
I want to know why. Thanks for your help~