Cannot crawl a user's answers #1

Open
funny-cat-happy opened this issue Apr 24, 2022 · 2 comments

Comments

@funny-cat-happy

Problem

Hi, following your code I can't fetch a user's answers, only the user's basic info. Like you, I ran only user_crawler in the run file:
2022-04-24 10:10:53.824 | WARNING | zhihu_crawler.extractors:extract_data:526 - method: extract_user return: {'user_id': '4a69baf4e0a552d2047fabcc4501a0bb', 'user_name': '我们的太空', 'user_url_token': 'wo-men-de-tai-kong', 'user_head_img': 'https://pic2.zhimg.com/v2-af532a0c65340c09a4549e1e8194e050_l.jpg?source=32738c0c', 'user_is_org': True, 'user_headline': '太空不再高冷 知乎走近你我', 'user_type': 'people', 'user_is_active': True, 'user_description': '既然选择了太空 便只顾风雨兼程', 'user_is_advertiser': False, 'user_is_vip': False, 'user_badges': ['已认证账号', '优秀回答者'], 'user_follower_count': 1907822, 'user_following_count': 170, 'user_answer_count': 247, 'user_question_count': 82, 'user_articles_count': 2798, 'user_columns_count': 4, 'user_zvideo_count': 1585, 'user_pins_count': 1368, 'user_favorite_count': 1, 'user_favorited_count': 63434, 'user_reactions_count': 79890, 'user_shared_count': 0, 'user_voteup_count': 342047, 'user_thanked_count': 60174, 'user_following_columns_count': 1, 'user_following_topic_count': 14, 'user_following_question_count': 271, 'user_following_favlists_count': 0, 'user_participated_live_count': 1, 'user_included_answers_count': 36, 'user_included_articles_count': 33, 'user_recognized_count': 22, 'user_cover_url': 'https://pica.zhimg.com/v2-bbb942fe238dd540204fff9ce849cd2a_r.jpg?source=32738c0c', 'user_org_name': '我们的太空\n123847892739487123', 'user_org_industry': '党群政府-党群政府', 'user_org_url': '', 'user_org_lic_code': '123847892739487123'}

About the parameters

I wrote my own program based on your code, but Zhihu keeps returning a parameter error. Could you take a look? The encrypt file is unchanged.

```python
import hashlib
import os
import requests
import execjs
from encrypt import encrypt
import re

payload = {
    'include': 'data%5B*%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action'
               '%2Cannotation_detail%2Ccollapse_reason%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment'
               '%2Ccontent%2Ceditable_content%2Cattachment%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission'
               '%2Cmark_infos%2Ccreated_time%2Cupdated_time%2Creview_info%2Cexcerpt%2Cis_labeled%2Clabel_info'
               '%2Crelationship.is_authorized%2Cvoting%2Cis_author%2Cis_thanked%2Cis_nothelp%2Cis_recognized%3Bdata'
               '%5B*%5D.vessay_info%3Bdata%5B*%5D.author.badge%5B%3F%28type%3Dbest_answerer%29%5D.topics%3Bdata%5B'
               '*%5D.author.vip_info%3Bdata%5B*%5D.question.has_publishing_draft%2Crelationship',
    'offset': 0,
    'limit': 20,
    'sort_by': 'created'
}
# Route traffic through a local debugging proxy
proxies = {'http': 'http://localhost:8888', 'https': 'http://localhost:8888'}


def get_headers(url):
    # No trailing comma here: `"101_3_2.0",` would make X_ZSE_93 a tuple
    X_ZSE_93 = "101_3_2.0"
    # Sign the part of the URL after the host
    sign, cookies = encrypt(X_ZSE_93, re.sub(r'.*zhihu\.com', '', url))
    headers = {
        'cookie': f'd_c0={cookies}',
        'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36",
        'x-zse-93': X_ZSE_93,
        'x-zse-96': sign,
    }
    return headers


response = requests.get(url='https://www.zhihu.com/api/v4/members/xiao-jie-jie-3-19/answers', params=payload,
                        headers=get_headers('https://www.zhihu.com/api/v4/members/xiao-jie-jie-3-19/answers'),
                        proxies=proxies, verify=False)
```
@niuniuJQKKK
Owner

niuniuJQKKK commented Apr 24, 2022

To fetch answers, you need to pass answer_count, for example:

```python
for info in user_crawler('wo-men-de-tai-kong', answer_count=50):
    # info['answers'] holds the list of answers
    answers = info['answers']
```
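The `offset`, `limit` and `sort_by` fields in the payload of the question suggest the answers endpoint is paged. A hypothetical sketch of how a requested `answer_count` could map onto successive page requests, assuming pages of at most 20 items. `page_params` is an illustrative helper, not part of the repo, and the crawler's actual paging logic may differ:

```python
from typing import Iterator


def page_params(base: dict, total: int, limit: int = 20) -> Iterator[dict]:
    """Yield one query-parameter dict per page until `total` items are covered.

    Hypothetical helper: assumes the endpoint accepts offset/limit paging.
    """
    for offset in range(0, total, limit):
        # The last page only asks for the items still missing.
        yield {**base, "offset": offset, "limit": min(limit, total - offset)}


pages = list(page_params({"sort_by": "created"}, total=50))
for p in pages:
    print(p)
```

With `total=50` this yields three pages at offsets 0, 20 and 40, the last one asking for 10 items. Asking for a partial last page (rather than a full page and truncating) avoids fetching data that would be thrown away.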

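About the parameter error: Zhihu's signature has to be computed over the full request, including all of its query parameters. See the commonly used request URLs in constant.py for reference.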
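A minimal sketch of that idea, assuming the value to sign is the URL path plus the exact query string sent on the wire (an assumption, not checked against the repo's encrypt module). `build_url` and `signing_target` are hypothetical helpers: build the full URL once, then derive the signing input from that same string instead of from the bare endpoint:

```python
from urllib.parse import urlencode, urlsplit


def build_url(base: str, params: dict) -> str:
    """Append `params` to `base` as a query string.

    '%' and '*' are kept unescaped so a value that is already
    percent-encoded (like Zhihu's `include` string) is not encoded twice.
    """
    return f"{base}?{urlencode(params, safe='%*')}"


def signing_target(full_url: str) -> str:
    """Path plus query string with scheme and host dropped (assumed signing input)."""
    parts = urlsplit(full_url)
    return f"{parts.path}?{parts.query}" if parts.query else parts.path


url = build_url(
    "https://www.zhihu.com/api/v4/members/xiao-jie-jie-3-19/answers",
    {"include": "data%5B*%5D.content", "offset": 0, "limit": 20, "sort_by": "created"},
)
print(signing_target(url))
# /api/v4/members/xiao-jie-jie-3-19/answers?include=data%5B*%5D.content&offset=0&limit=20&sort_by=created
```

The same `url` string would then be passed to `requests.get(url, headers=...)` without `params=`. requests percent-encodes `params` values itself, so an already-encoded `include` value would have `%5B` turned into `%255B` and no longer match what was signed.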

@funny-cat-happy
Author

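Thanks a lot, my request URL was indeed the problem. One more question: why is the request I captured
https://www.zhihu.com/api/v4/members/{user_id}/answers
while yours is
https://api.zhihu.com/members/{user_id}/answers
I have never come across this request. How did you find it?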
