Hi, great tool! I just set urlwatch up and it works on one test page, but when I explored further and added one product each from Home Depot and Walmart, both failed.
Then I tried to test the filters for the Home Depot job with `urlwatch --test-filter`, and both give me an empty result.
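When `--test-filter` prints nothing, it usually means the XPath expression matched nothing in the downloaded body. One way to sanity-check an expression offline is to run it against a saved copy of the page; here is a minimal standard-library sketch (the HTML snippet and the expression are made-up placeholders, not the real Home Depot markup, and urlwatch itself uses lxml, which accepts full XPath 1.0 rather than ElementTree's limited subset):

```python
# Offline sanity check for an XPath-style expression.
# The HTML and the expression below are hypothetical placeholders;
# save the real page in a browser and test against that instead.
from xml.etree import ElementTree as ET

page = """\
<html><body>
  <div class="price-wrapper">
    <span itemprop="price">129.00</span>
  </div>
</body></html>"""

tree = ET.fromstring(page)
# ElementTree supports only a limited XPath subset (paths plus
# simple attribute predicates), which is enough for a quick check.
matches = [e.text for e in tree.findall(".//span[@itemprop='price']")]
print(matches)  # an empty list means the filter would also come back empty
```

If this returns an empty list against the saved page, the filter expression itself needs fixing; if it matches but urlwatch still shows nothing, the server is likely serving different content to urlwatch than to your browser.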
I then commented out the `xpath` filter in the settings above and ran with `--verbose`:
For this Home Depot job, it hangs here! Following #575, I added a `headers:` entry with `User-Agent: <redacted>`; it then runs, but still produces no result.
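For reference, a job entry with a `headers:` block might look like this in `urls.yaml` (the URL, User-Agent value, and XPath are placeholders, not the actual settings from this report):

```yaml
# Hypothetical urls.yaml entry; URL, User-Agent, and XPath are placeholders.
name: "Home Depot product"
url: "https://www.homedepot.ca/product/example"
headers:
  User-Agent: "Mozilla/5.0 (X11; Linux x86_64)"
filter:
  - xpath: '//span[@itemprop="price"]'
```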
Then I tried to test the filter for the Walmart job:
`$ urlwatch --test-filter 2 --verbose  # xpath filter commented out`
Skip to main ...
JavaScript is Disabled
Sorry, this webpage requires JavaScript to function correctly.
Please enable JavaScript in your browser and reload the page.
I tried the suggestion from #465 and attempted to download the Walmart page directly: `curl` replies with a blocked message, while `wget` downloads a page with the text:
Are you human?
Seems like a silly question, we know. But, we want to keep robots off of Walmart.ca!
It seems Walmart blocks this kind of automated watching?
Any help is appreciated.
Thanks!
Yes, if a page applies a CAPTCHA to keep automated tools from grabbing its contents, there's not much we can do. Have you checked whether Walmart or Home Depot provides an API for retrieving pricing information?
Maybe it's possible to use an API for that purpose?
Thank you for the quick response!
I believe the human detection only applies to the walmart.com case; HomeDepot.ca gives a connection error (443), and I'm not sure whether it's caused by the same thing.
I previously built a similar small project using requests-html and BS4 with a headless browser, and I did not encounter the human/robot detection on most commercial sites, so maybe I'll go back and give walmart.ca a try.
Anyway, thank you for the suggestion about the Walmart API.
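One possible reason such requests-html/BS4 scripts pass where plain `curl` and `wget` fail is that they may send a browser-like User-Agent by default. A minimal sketch of attaching one by hand (the header value is a made-up example, and nothing is actually sent here):

```python
# Hypothetical sketch: build a request with a browser-like User-Agent.
# The User-Agent string is a made-up example; no network request is made.
from urllib.request import Request

req = Request(
    "https://www.walmart.ca/",
    headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"},
)
# urllib normalizes header names to capitalized form internally.
print(req.get_header("User-agent"))
```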
Have you tried using https://urlwatch.readthedocs.io/en/latest/jobs.html#browser (just change `url` to `navigate`), which uses a headless variant of the Chrome browser to load the page? Maybe this is "good enough" not to trigger the CAPTCHA.
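Based on that documentation page, the browser variant of the job would swap the `url:` key for `navigate:` (the product URL here is a placeholder):

```yaml
# Hypothetical browser job; the product URL is a placeholder.
name: "Walmart product via headless Chrome"
navigate: "https://www.walmart.ca/en/ip/example"
```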