How do I use Apify proxies with Crawlee? #575
My code:

```python
async with Actor:
    proxy_configuration = await Actor.create_proxy_configuration()
    crawler = BeautifulSoupCrawler(
        request_handler=router,
        proxy_configuration=proxy_configuration,
        configuration=Configuration(log_level="DEBUG" if debug else "INFO"),
    )
    await crawler.run(links)
```

It doesn't seem to work: there's no feedback in the Actor run log, and I have no idea what happens underneath. I couldn't figure out how to debug this; I cannot even print. I went through several guides in the docs, but there is no single example that shows both `Actor` and a Crawlee crawler together. And I'm getting type errors, too, even when running the crawler with the code above.

I tried to figure out more about what I can pass to the crawler, but the docs are very brief. What are all those things? How do I use them? What are they good for? Compare with e.g. Click's documentation. I guess the kwargs are propagated to `BasicCrawler`, but there the docs only say "Initialize the BasicCrawler". There's no information on what the parameters do, and I'm lost in a rabbit hole of reading code and missing docs.

For some reason my requests get blocked with HTTP 403. The same requests work fine from my laptop. I suppose it's anti-scraping protection, also because the target site is Reddit. But I don't know whether it's failing despite running through the proxies, or whether I set up the proxies wrong. I tried both basic proxies and residential ones.
Figured out that

```python
await Actor.create_proxy_configuration()
```

throws if it doesn't get the correct info from environment variables. Also, there is an "Inspecting current proxy in crawlers" section in the docs after all. And all the parameters are described, but in code rather than in the docs, which renders the API docs useless; it's just much better to read the code directly. Not sure why the type checking doesn't work, but at least I'm not stuck anymore.