Is there currently a way to manually refresh the IP pool? #515

Open
cresstoo opened this issue Dec 7, 2024 · 1 comment

Comments

@cresstoo

cresstoo commented Dec 7, 2024

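I'm using kuaidaili private proxies with a conservative IP pool size of 2. After crawling 90 xhs (Xiaohongshu) notes, I started getting this error:
MediaCrawler ERROR (core.py:309) - [XiaoHongShuCrawler.get_note_detail_async_task] Get note detail error
Is there a way to manually replace expired IPs?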

@NanmiCoder
Owner

You could try checking the IP's expiration time in the entry function, before the request is sent:

# Module-level imports this method relies on. IPBlockError and DataFetchError
# are exception classes defined elsewhere in the project.
from typing import Any, Dict, Union

import httpx

async def request(self, method, url, **kwargs) -> Union[str, Any]:
    """
    Shared httpx request wrapper that post-processes the response.
    Args:
        method: HTTP method
        url: request URL
        **kwargs: other request parameters, e.g. headers, body
    Returns:
        The raw response text if return_response is set, otherwise the
        "data" field of the parsed JSON body.
    """
    # When return_response is truthy, the raw response.text is returned
    return_response = kwargs.pop("return_response", False)
    async with httpx.AsyncClient(proxies=self.proxies) as client:
        response = await client.request(method, url, timeout=self.timeout, **kwargs)
    if response.status_code == 471 or response.status_code == 461:
        # someday someone maybe will bypass captcha
        verify_type = response.headers["Verifytype"]
        verify_uuid = response.headers["Verifyuuid"]
        raise Exception(
            f"Captcha triggered, request failed. Verifytype: {verify_type}, Verifyuuid: {verify_uuid}, Response: {response}"
        )
    if return_response:
        return response.text
    data: Dict = response.json()
    if data["success"]:
        return data.get("data", data.get("success", {}))
    elif data["code"] == self.IP_ERROR_CODE:
        raise IPBlockError(self.IP_ERROR_STR)
    else:
        raise DataFetchError(data.get("msg", None))
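
The expiry check suggested above could look roughly like the sketch below. This is not code from MediaCrawler: `ProxyInfo`, its `expired_at` field, `ProxyRefresher`, and the `fetch_proxy` callback are all hypothetical names, and the sketch assumes the proxy provider reports when each IP expires. The idea is only to show one way of swapping in a fresh IP shortly before the current one runs out, instead of waiting for requests to fail.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class ProxyInfo:
    """Hypothetical proxy record; the field names are assumptions."""
    ip: str
    port: int
    expired_at: float  # Unix timestamp (seconds) at which the provider retires the IP

    def is_expired(self, buffer_seconds: float = 30.0, now: Optional[float] = None) -> bool:
        # Treat the proxy as expired slightly early so that a request
        # already in flight does not outlive the IP.
        current = time.time() if now is None else now
        return current >= self.expired_at - buffer_seconds

    def as_url(self) -> str:
        return f"http://{self.ip}:{self.port}"


class ProxyRefresher:
    """Hands out the current proxy and replaces it once it is (nearly) expired."""

    def __init__(self, fetch_proxy: Callable[[], Awaitable[ProxyInfo]]):
        # fetch_proxy is a hypothetical coroutine that asks the provider for a new IP.
        self._fetch_proxy = fetch_proxy
        self._current: Optional[ProxyInfo] = None
        self._lock: Optional[asyncio.Lock] = None

    async def get(self) -> ProxyInfo:
        # Create the lock lazily so it belongs to the running event loop.
        if self._lock is None:
            self._lock = asyncio.Lock()
        # The lock keeps concurrent tasks from each fetching a replacement.
        async with self._lock:
            if self._current is None or self._current.is_expired():
                self._current = await self._fetch_proxy()
            return self._current
```

Under these assumptions, `request` would call something like `proxy = await refresher.get()` as its first step and build the client's proxy settings from `proxy.as_url()` rather than from a value fixed at startup. Catching `IPBlockError` and forcing a refresh would be a natural complement, since a blocked IP may still be within its expiry window.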
