The number of crawled paper is very small #88
Comments
We have the same issue. I also tried scraping Google News with different code before and got at most 100 results per query. It seems that we need pagination, but I am not sure how to implement it here. One option would be to work with the start and end dates, stepping through very small time windows to collect more results for consecutive days.
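A minimal sketch of the small-window idea, assuming each query is capped at roughly 100 results. The `fetch` callable is hypothetical: it stands in for whatever client you use (for example, a wrapper that sets the start and end dates on GNews and runs the search) and is assumed to return a list of dicts with a `url` key. Nothing here is the library's own pagination.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterator, List, Tuple

Article = Dict[str, str]
# Hypothetical signature: fetch(keyword, window_start, window_end) -> articles
Fetch = Callable[[str, date, date], List[Article]]


def date_windows(start: date, end: date, days: int = 1) -> Iterator[Tuple[date, date]]:
    """Yield consecutive (window_start, window_end) pairs covering [start, end)."""
    step = timedelta(days=days)
    current = start
    while current < end:
        # Clamp the last window so it never runs past the requested end date.
        yield current, min(current + step, end)
        current += step


def collect(fetch: Fetch, keyword: str, start: date, end: date, days: int = 1) -> List[Article]:
    """Query each window separately and drop articles whose URL was already seen."""
    seen = set()
    results: List[Article] = []
    for window_start, window_end in date_windows(start, end, days):
        for article in fetch(keyword, window_start, window_end):
            url = article.get("url")  # assumed key; adjust to your client's output
            if url and url not in seen:
                seen.add(url)
                results.append(article)
    return results
```

Deduplicating by URL matters because overlapping or same-day windows may return the same articles again, so the raw count per window can overstate how much new material each query adds.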
This is a related issue suggesting some workarounds: #31
I think the max I can get is 100 a day. Can anyone do better?
No, we tried crawling by the hour, but we did not get any additional results. All posts published on the same day seem to share the same timestamp, so you get the same 100 results wherever you start. In the end, we adjusted the research question a bit to work with 100 results per day, but crawled through several months of data.
I think I managed to get a few more by iterating every hour, but this is okay for now. Thanks for the response. I do think the warning 'must be 1 day apart else no results will return' should be changed, because you can get results when querying from hour to hour. That said, iterating by hour doesn't work 100% of the time in my experience.
Hi, could someone please help me with this?
I used GNews to crawl news about the "Red Sea Crisis" from 2023.10.1 to 2024.3.10, but only got about 80 articles.
When I search for the same keyword in Factiva over the same period, I get about 3000 articles. I am doing NLP analysis, so the volume of articles is essential.
Is the number of articles being limited by GNews, or does Google News simply not have that many articles?