Since 2022-04-06, article content no longer seems to be captured (see screenshot) #64
Comments
This is how I worked around it, in deal_data.py:
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
# Selenium 4 takes the chromedriver path through Service, not as a positional argument
driver = webdriver.Chrome(service=Service('/usr/local/bin/chromedriver'))
driver.get("http://mp.weixin.qq.com/s?__biz=MzI4MTQxMjExMw==&mid=2247484946&idx=1&sn=2ba55c5c2e82457ea9c23ad600ddc1ea&chksm=eba8d16cdcdf587afbe233be563c2e521143be1139c1975e4bfe014c7c6248c386885bac4403&scene=27#wechat_redirect")
time.sleep(1)
# element = driver.find_element(By.CLASS_NAME, 'rich_media_content')
# Wait up to 10 seconds for the article body to appear in the DOM
element = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "rich_media_content")))
# Read the rendered text in the browser and also send it to a local listener on port 9001
result = driver.execute_script('''
var result = document.getElementsByClassName("rich_media_content")[0].innerText;
let xhr = new XMLHttpRequest()
let url = "http://127.0.0.1:9001/?a=" + encodeURIComponent(result);
xhr.open("get", url, false);
xhr.send(null);
return result;
''')
content = driver.page_source
soup = BeautifulSoup(content, "lxml")
# quit() ends the whole browser session, not just the current window
driver.quit()
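The script above sends the article text to http://127.0.0.1:9001/?a=..., but the thread does not show what listens on that port. A minimal sketch of such a receiver, using only the Python standard library, might look like the following. The names `ArticleHandler`, `received`, and `start_receiver` are hypothetical, and the `Access-Control-Allow-Origin` header is an assumption: the request is cross-origin, so the browser will probably reject the synchronous XHR without it.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

received = []  # article texts collected so far (hypothetical in-memory storage)


class ArticleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # parse_qs reverses the encodeURIComponent() applied in the browser
        query = parse_qs(urlparse(self.path).query)
        received.append(query.get("a", [""])[0])
        self.send_response(200)
        # Assumption: needed so the cross-origin XHR from the article page succeeds
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_receiver(port=9001):
    """Start the listener in a background thread and return the server object."""
    server = HTTPServer(("127.0.0.1", port), ArticleHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_receiver()` before `driver.get(...)` would let the texts accumulate in `received`. Passing a whole article in a GET query string is fragile: `http.server` rejects very long request lines, so long articles may need a POST body instead.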
The title selector now has to be: selector.xpath('//h1[@class="rich_media_title "]/text()')
The latest content selector: content = '//div[@id="js_content"]'
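The two selectors quoted above target an `h1` whose class is `rich_media_title` and a `div` whose id is `js_content`. As a rough illustration of what they pick out, here is a sketch that uses the standard library's `html.parser` in place of XPath, so it runs without lxml. `ArticleParser` and `extract_article` are hypothetical names. The sketch matches the class as a token rather than the exact string with its trailing space, and it assumes well-nested markup.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ArticleParser(HTMLParser):
    """Collects text inside <h1 class="rich_media_title"> and <div id="js_content">."""

    def __init__(self):
        super().__init__()
        self.parts = {"title": [], "content": []}
        self._target = None  # "title" or "content" while inside a matched element
        self._depth = 0      # nesting depth inside the matched element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._target:
            self._depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "h1" and "rich_media_title" in classes:
            self._target, self._depth = "title", 1
        elif tag == "div" and attrs.get("id") == "js_content":
            self._target, self._depth = "content", 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._target:
            return
        self._depth -= 1
        if self._depth == 0:
            self._target = None  # left the matched element

    def handle_data(self, data):
        if self._target:
            self.parts[self._target].append(data)


def extract_article(html):
    """Return (title, content) as plain text; empty strings if not found."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return ("".join(parser.parts["title"]).strip(),
            "".join(parser.parts["content"]).strip())
```

In the workaround earlier in the thread, the input would be `driver.page_source`; with lxml available, the XPath expressions from the comments express the same thing more directly.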
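This is very strange. Why is selenium needed at all? In principle every request should pass through mitmproxy, so how can the article content bypass mitmproxy and still be displayed on the phone?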
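Debugging locally, I found that mitmproxy can no longer capture the article content either: none of the response.text values contain anything related to the article body.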
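One way to check this observation on captured traffic is to scan each response body for the container markers mentioned earlier in the thread. The sketch below is only a diagnostic aid under stated assumptions: `MARKERS`, `find_markers`, and `report` are hypothetical names, and the commented-out mitmproxy wiring is an untested suggestion for where such a check could be called.

```python
# Strings taken from the selectors quoted in this thread
MARKERS = ('id="js_content"', "rich_media_content")


def find_markers(body):
    """Return the markers that occur in a response body."""
    return [marker for marker in MARKERS if marker in body]


def report(responses):
    """responses: iterable of (url, body) pairs; returns {url: [markers found]}."""
    return {url: find_markers(body) for url, body in responses}


# Possible wiring in a mitmproxy script (assumption, not run here):
# def response(flow):
#     if "mp.weixin.qq.com" in flow.request.pretty_host:
#         print(flow.request.pretty_url, find_markers(flow.response.text or ""))
```

A marker being present only shows that the container element was served; the container could still be empty, so an empty list is the more informative result.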