taylorfrancis.json no longer scrapes the PDF: readcube pop-up related? #54

Open
rossmounce opened this issue Dec 2, 2016 · 0 comments

Comments

@rossmounce
Member

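I suspect this is down to the recent ReadCube-ization of T&F content. The PDF download button now opens a pop-up where you choose between the real PDF and the ReadCube version, which would explain why the fulltext_pdf selector (//a[text()='PDF']) returns no results in the log below. I tried to fix this myself but failed.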

$ quickscrape --url http://dx.doi.org/10.1017/s1477201903001093  --scraper journal-scrapers/scrapers/taylorfrancis.json --output tandf -l verbose
info: quickscrape 0.4.7 launched with...
info: - URL: http://dx.doi.org/10.1017/s1477201903001093
info: - Scraper: /home/ross/Downloads/pica/journal-scrapers/scrapers/taylorfrancis.json
info: - Rate limit: 3 per minute
info: - Log level: verbose
info: urls to scrape: 1
info: processing URL: http://dx.doi.org/10.1017/s1477201903001093
debug: info [scraper]. URL rendered. http://www.tandfonline.com/doi/abs/10.1017/S1477201903001093.
debug: data [scraper]. element captured. publisher.  Taylor & Francis Group .
debug: debug [scraper]. element results. publisher.  Taylor & Francis Group .
debug: data [scraper]. element captured. journal_name. Journal of Systematic Palaeontology.
debug: debug [scraper]. element results. journal_name. Journal of Systematic Palaeontology.
debug: data [scraper]. element capture failed. volume.
debug: debug [scraper]. selector had no results. //*[@id='unit2']/div[1]/div/div/table/tbody/tr/td[1]/h3/a[1]. volume.
debug: debug [scraper]. element results. volume. .
debug: data [scraper]. element capture failed. issue.
debug: debug [scraper]. selector had no results. //*[@id='unit2']/div[1]/div/div/table/tbody/tr/td[1]/h3/a[2]. issue.
debug: debug [scraper]. element results. issue. .
debug: data [scraper]. element captured. title. Osteology and systematic position of the eocene primobucconidae (aves, coraciiformes sensu stricto), with first records from Europe.
debug: debug [scraper]. element results. title. Osteology and systematic position of the eocene primobucconidae (aves, coraciiformes sensu stricto), with first records from Europe.
debug: data [scraper]. element captured. keywords.
debug: debug [scraper]. element results. keywords. .
debug: data [scraper]. element captured. author_name.  Gerald   Mayr .
debug: data [scraper]. element captured. author_name.  Cecile   Mourer‐Chauviré .
debug: data [scraper]. element captured. author_name.  Ilka   Weidig .
debug: debug [scraper]. element results. author_name.  Gerald   Mayr , Cecile   Mourer‐Chauviré , Ilka   Weidig .
debug: data [scraper]. element captured. date_published.
debug: debug [scraper]. element results. date_published. .
debug: data [scraper]. element captured. doi. 9512127.
debug: data [scraper]. element captured. doi. 10.1017/S1477201903001093.
debug: data [scraper]. element captured. doi. Journal of Systematic Palaeontology, Vol. 2, No. 1, 2004, pp. 1-12.
debug: debug [scraper]. element results. doi. 9512127,10.1017/S1477201903001093,Journal of Systematic Palaeontology, Vol. 2, No. 1, 2004, pp. 1-12.
debug: data [scraper]. element capture failed. csv1.
debug: debug [scraper]. selector had no results. //a[@id='CSVdownloadButton'][1]. csv1.
debug: debug [scraper]. element results. csv1. .
debug: data [scraper]. element capture failed. csv2.
debug: debug [scraper]. selector had no results. //a[@id='CSVdownloadButton'][2]. csv2.
debug: debug [scraper]. element results. csv2. .
debug: data [scraper]. element capture failed. csv3.
debug: debug [scraper]. selector had no results. //a[@id='CSVdownloadButton'][3]. csv3.
debug: debug [scraper]. element results. csv3. .
debug: data [scraper]. element capture failed. csv4.
debug: debug [scraper]. selector had no results. //a[@id='CSVdownloadButton'][4]. csv4.
debug: debug [scraper]. element results. csv4. .
debug: data [scraper]. element capture failed. csv5.
debug: debug [scraper]. selector had no results. //a[@id='CSVdownloadButton'][5]. csv5.
debug: debug [scraper]. element results. csv5. .
debug: data [scraper]. element capture failed. csv6.
debug: debug [scraper]. selector had no results. //a[@id='CSVdownloadButton'][6]. csv6.
debug: debug [scraper]. element results. csv6. .
debug: data [scraper]. element captured. fulltext_html. http://dx.doi.org/10.1017/S1477201903001093.
debug: debug [scraper]. element results. fulltext_html. http://dx.doi.org/10.1017/S1477201903001093.
debug: data [scraper]. element capture failed. fulltext_pdf.
debug: debug [scraper]. selector had no results. //a[text()='PDF']. fulltext_pdf.
debug: debug [scraper]. element results. fulltext_pdf. .
debug: info [scraper]. download started. fulltext.html.
info: URL processed: captured 8/17 elements (9 captures failed)
debug: writing results to file: results.json
debug: changing back to top-level directory
info: all tasks completed
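
One possible reason an exact-text selector like //a[text()='PDF'] stops matching is that the link label is now wrapped in a child element or padded with whitespace. The sketch below illustrates that failure mode with the Python standard library only. The markup, the show-pdf class name, and the 10.1000/example DOI are all hypothetical and are not copied from tandfonline.com, so treat this as an illustration of the idea rather than a diagnosis of the actual page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup (assumption, not real tandfonline.com HTML):
# OLD_PAGE has a plain "PDF" anchor, NEW_PAGE wraps a padded label in a span.
OLD_PAGE = '<div><a href="/doi/pdf/10.1000/example">PDF</a></div>'
NEW_PAGE = (
    '<div><a class="show-pdf" href="/doi/pdf/10.1000/example">'
    '<span> PDF </span></a></div>'
)


def direct_text_nodes(anchor):
    """Return the non-empty text nodes that are direct children of the anchor."""
    nodes = [anchor.text] + [child.tail for child in anchor]
    return [t for t in nodes if t]


def exact_text_links(markup):
    """Emulate //a[text()='PDF']: some direct text node must equal 'PDF' exactly."""
    root = ET.fromstring(markup)
    return [
        a.get("href")
        for a in root.iter("a")
        if any(t == "PDF" for t in direct_text_nodes(a))
    ]


def tolerant_links(markup):
    """Match on the whitespace-normalised text of the anchor and its descendants."""
    root = ET.fromstring(markup)
    hits = []
    for a in root.iter("a"):
        label = " ".join("".join(a.itertext()).split())
        if label == "PDF":
            hits.append(a.get("href"))
    return hits


if __name__ == "__main__":
    print("exact, old page:   ", exact_text_links(OLD_PAGE))
    print("exact, new page:   ", exact_text_links(NEW_PAGE))
    print("tolerant, new page:", tolerant_links(NEW_PAGE))
```

In XPath 1.0 terms the tolerant match corresponds to //a[normalize-space(.)='PDF']. Whether that would fix taylorfrancis.json depends on what the pop-up markup actually looks like and on the XPath support of the scraper's engine, neither of which has been checked here.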
