zero results (again) #96

Open
teracow opened this issue Feb 9, 2020 · 8 comments

@teracow
Owner

teracow commented Feb 9, 2020

Yes, Google have updated their page-code again, so some new regexes are needed to scrape the links.

Working on it now ...
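For readers unfamiliar with the approach, regex-based link scraping looks roughly like the sketch below. The pattern and the function name `extract_image_urls` are hypothetical and purely illustrative; the actual Google page markup and the regexes used by this project are not shown in this thread.

```python
import re

# Hypothetical pattern: http(s) URLs that end in a common image extension.
# It is NOT the pattern used by this project, only an illustration.
IMAGE_URL_RE = re.compile(
    r'https?://[^\s"\'<>\\]+?\.(?:jpe?g|png|gif|webp)',
    re.IGNORECASE,
)


def extract_image_urls(page_source: str) -> list[str]:
    """Return image URLs found in page_source, in order, without duplicates."""
    seen = set()
    urls = []
    for match in IMAGE_URL_RE.finditer(page_source):
        url = match.group(0)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

A pattern like this breaks whenever the page-code changes how URLs are embedded, which is why the regexes need revisiting each time.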

@teracow teracow self-assigned this Feb 9, 2020
@teracow teracow added the bug label Feb 9, 2020
@teracow
Owner Author

teracow commented Feb 9, 2020

Threw a quick scraper together that seems to work (haven't pushed it up to here yet).

But it's only finding a maximum of 104 unique images across 10 pages. Hmm ... have to keep looking. Unfortunately, I'm out of time now, so I'll keep looking tomorrow.

Google have certainly advanced their page-code. It gets harder each time to extract the original image URLs. 😆

@teracow
Owner Author

teracow commented Feb 10, 2020

OK we're out of action for now. I'll need to decode the endless-page scripting in order to request more than a single page of image results.

I'm not in a coding-cycle at the moment, and I'm unable to say when I'll be able to get around to this. Hopefully, it'll be the next time I have a few days free. 😞

@teracow
Owner Author

teracow commented Feb 11, 2020

If anyone would like to have a shot at fixing this, you're more than welcome. 😁

The current issue is: I can scrape the new results page, but can't trigger the endless-page scrolling. So, if I separately request 10 pages of results, I actually get the first page 10 times (with the same 100-or-so results listed on that first page).
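One way to confirm this symptom is to count how many previously unseen URLs each requested page contributes. The sketch below assumes each page has already been scraped into a list of URL strings; the function name `count_new_urls_per_page` is hypothetical and not part of this project.

```python
def count_new_urls_per_page(pages):
    """For each page (a list of URLs), count the URLs not seen on earlier pages."""
    seen = set()
    new_counts = []
    for urls in pages:
        fresh = set(urls) - seen  # URLs this page adds to the running total
        new_counts.append(len(fresh))
        seen |= fresh
    return new_counts
```

If every count after the first is zero, the later requests are simply returning the first page again.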

@teracow
Owner Author

teracow commented Feb 11, 2020

I've pushed the new scraper to GitHub, so at least results from the first page can be found.

Now need to work out how to request the rest of the results pages (again).

@LeaTaka

LeaTaka commented Feb 19, 2020

Your scraper is the fastest I've found, thanks!
Compared to iCrawler and google-images-download, which are also struggling with the Google code change, you have at least made it work for one page (approx. 40 images)!

What I suggest as a temporary workaround is to add the parameters below to your parameter list, like this:

--adjusted-period-min [PRESET] 
--adjusted-period-max [PRESET] 

The idea is that this should allow downloading images for multiple specified periods, and thus requesting multiple pages for each class. If I do this 10 times for each class, I will have 400 images per class, which is currently enough for me. Do it 20 times and you'll have your 800 again.

Unfortunately, I don't have the skills to implement the above suggestion; otherwise I would have contributed more instead of only suggesting what to do :). I hope the idea helps to solve the issue soon, though.

@teracow
Owner Author

teracow commented Feb 19, 2020

> The idea is that this should allow downloading images for multiple specified periods, and thus requesting multiple pages for each class. If I do this 10 times for each class, I will have 400 images per class, which is currently enough for me. Do it 20 times and you'll have your 800 again.

That's an interesting idea. 🤓

But I'm not sure what you mean by specified periods. Do you mean the Google search parameter called 'time'?

@LeaTaka

LeaTaka commented Feb 20, 2020

No, I didn't mean the time parameter; you already offer this, I guess. I was hoping there was a custom-period function similar to the one in text search, but unfortunately there isn't.

However, the workaround can be quite simple. Just add a year (e.g. 2011) to the search phrase and, with a bit of luck, the page only returns images matching your search phrase from that year. This needs a bit more testing, but first checks seem promising.
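A minimal sketch of that year-suffix idea is shown below. The helper names `queries_by_year` and `max_images` are hypothetical, and the figure of 40 images per query is only the approximate per-page count mentioned earlier in this thread.

```python
def queries_by_year(phrase, first_year, last_year):
    """Build one search phrase per year by appending the year to the phrase."""
    return [f"{phrase} {year}" for year in range(first_year, last_year + 1)]


def max_images(queries, per_query=40):
    """Upper bound on images collected, assuming no overlap between queries."""
    return len(queries) * per_query
```

For example, `queries_by_year("red apple", 2011, 2020)` yields 10 phrases, so at roughly 40 images each the best case is about 400 images. Results for different years may overlap, so the real total could be lower.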

Another interesting Google setting is the Search settings option under Settings. There you can specify the number of search results per page. Maybe this setting can help to get more than 40 images per run.

Cheers!

@teracow
Owner Author

teracow commented Feb 20, 2020

Okiedoke, some good thoughts there.

I'll see if I can spend some time on it this weekend.
