w_webid related discovery #1107
Comments
Fetching the HTML directly often triggers a CAPTCHA when requests are frequent, which really gets in the way of automation. |
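For automation, it may help to treat a missing `__RENDER_DATA__` tag as a sign that the response was not the normal space page. The sketch below is only an illustration: `extract_access_id` is a hypothetical helper, the regex mirrors the tag quoted in this thread, and the idea that a CAPTCHA page lacks the tag is an assumption rather than something confirmed here.

```python
import json
import re
import urllib.parse
from typing import Optional

# Pattern for the tag quoted in this thread; the exact attribute order is an
# assumption and may differ on the live page.
_RENDER_DATA_RE = re.compile(
    r'<script id="__RENDER_DATA__" type="application/json">(.*?)</script>',
    re.S,
)


def extract_access_id(html: str) -> Optional[str]:
    """Return access_id from a space page, or None if it cannot be found."""
    match = _RENDER_DATA_RE.search(html)
    if match is None:
        # Possibly a CAPTCHA or error page (assumption): let the caller decide.
        return None
    try:
        data = json.loads(urllib.parse.unquote(match.group(1)))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    return data.get("access_id")


# Offline demo with a made-up page; the token value is a placeholder.
sample = (
    '<script id="__RENDER_DATA__" type="application/json">'
    "%7B%22access_id%22%3A%22abc.def.ghi%22%7D</script>"
)
print(extract_access_id(sample))                  # abc.def.ghi
print(extract_access_id("<html>captcha</html>"))  # None
```

Returning `None` instead of raising lets a caller back off and retry later rather than crash on the first blocked request.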
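@QHCT The JWT's TTL is one day (in fact, the page also has a setTimeout that reloads the page automatically shortly before expiry), so there should be no need to fetch it repeatedly? |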
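If the token really lives for a day, a client could cache it and only refetch the HTML near expiry. A minimal sketch, assuming the payload carries `created_at` and `ttl` fields as the sample token quoted in this thread does; `decode_jwt_payload`, `is_fresh`, and `_fake_token` are hypothetical helper names, and no signature is verified:

```python
import base64
import json
import time
from typing import Optional


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_fresh(token: str, now: Optional[float] = None, margin: int = 300) -> bool:
    """Return True if the token is more than `margin` seconds from expiry."""
    payload = decode_jwt_payload(token)
    # Field names are assumed from the sample token in this thread.
    expires_at = payload["created_at"] + payload["ttl"]
    now = time.time() if now is None else now
    return now < expires_at - margin


def _fake_token(payload: dict) -> str:
    """Build an unsigned, purely illustrative token for the demo below."""
    raw = json.dumps(payload).encode()
    body = base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"header.{body}.signature"


demo = _fake_token({"created_at": 1729378050, "ttl": 86400})
print(is_fresh(demo, now=1729378050 + 3600))   # True: one hour after creation
print(is_fresh(demo, now=1729378050 + 86400))  # False: already at expiry
```

The `margin` plays the same role as the page's own early refresh: the cached value is replaced a little before it actually expires.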
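The endpoint api.bilibili.com/x/space/wbi/acc/info still fails after adding w_webid on top of the existing WBI signature; it returns -404. |
I tried it and got 403: {"code":-403,"message":"访问权限不足","ttl":1} (the message means "insufficient access permission"). Using a cookie extracted from the browser does not help either. |
Bilibili is truly beyond saving: instead of improving its recommendation algorithm, it spends all day fiddling with these messy parameters. |
For crying out loud, could Bilibili find something better to do with its idle time? JS & cheerio implementation: lovegaoshi/azusa-player-mobile@f411399#diff-83ff38add308188f73aa86d9fd6c89ada413eb4f34903c2c14c08653edc31c0a |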
Python & httpx implementation:
import httpx
import re
import urllib.parse
import json
import time
from functools import reduce
from hashlib import md5

UID = 3546729368520811
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3",
    "Referer": "https://www.bilibili.com/",
    "Cookie": "<cookie>",
}
dynamic_url = f"https://space.bilibili.com/{UID}/dynamic"
text = httpx.get(dynamic_url, headers=headers).text
# <script id="__RENDER_DATA__" type="application/json">xxx</script>
__RENDER_DATA__ = re.search(
    r"<script id=\"__RENDER_DATA__\" type=\"application/json\">(.*?)</script>",
    text,
    re.S,
).group(1)
# The tag content is a URL-encoded JSON string; access_id is the w_webid value
access_id = json.loads(urllib.parse.unquote(__RENDER_DATA__))["access_id"]
print(f'access_id: {access_id}')
# WBI signing
mixinKeyEncTab = [
    46, 47, 18, 2, 53, 8, 23, 32, 15, 50, 10, 31, 58, 3, 45, 35, 27, 43, 5, 49,
    33, 9, 42, 19, 29, 28, 14, 39, 12, 38, 41, 13, 37, 48, 7, 16, 24, 55, 40,
    61, 26, 17, 0, 1, 60, 51, 30, 4, 22, 25, 54, 21, 56, 59, 6, 63, 57, 62, 11,
    36, 20, 34, 44, 52
]
def getMixinKey(orig: str):
    "Shuffle the characters of imgKey + subKey according to mixinKeyEncTab"
    return reduce(lambda s, i: s + orig[i], mixinKeyEncTab, "")[:32]

def encWbi(params: dict, img_key: str, sub_key: str):
    "Sign the request parameters with WBI"
    mixin_key = getMixinKey(img_key + sub_key)
    curr_time = round(time.time())
    params["wts"] = curr_time  # add the wts field
    params = dict(sorted(params.items()))  # sort the parameters by key
    # strip the characters "!'()*" from the values
    params = {
        k: "".join(filter(lambda ch: ch not in "!'()*", str(v)))
        for k, v in params.items()
    }
    query = urllib.parse.urlencode(params)  # serialize the parameters
    wbi_sign = md5((query + mixin_key).encode()).hexdigest()  # compute w_rid
    params["w_rid"] = wbi_sign
    return params

def getWbiKeys() -> tuple[str, str]:
    "Fetch the latest img_key and sub_key"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3",
        "Referer": "https://www.bilibili.com/",
    }
    resp = httpx.get("https://api.bilibili.com/x/web-interface/nav", headers=headers)
    resp.raise_for_status()
    json_content = resp.json()
    img_url: str = json_content["data"]["wbi_img"]["img_url"]
    sub_url: str = json_content["data"]["wbi_img"]["sub_url"]
    # The keys are the file names of the two URLs, without the extension
    img_key = img_url.rsplit("/", 1)[1].split(".")[0]
    sub_key = sub_url.rsplit("/", 1)[1].split(".")[0]
    return img_key, sub_key

img_key, sub_key = getWbiKeys()
# mid=3546729368520811&web_location=333.999&w_webid=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJzcG1faWQiOiIwLjAiLCJidXZpZCI6IjQxNjdDMjIwLTY4RDAtOTIxMS05RkQyLUY2OTc2MTQ5QzU0NzYzODE5aW5mb2MiLCJ1c2VyX2FnZW50Ijoi7IHg2NCkgQXBwbGVXZWJLaXQvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lLzEzMC4wLjAuMCBTYWZhcmkvNTM3LjM2IEVkZy8xMzAuMC4wLjAiLCJidXZpZF9mcCI6Ijk3ZTYxZDdjMTQ2ZTJmM2M4MWIyNjRmMTgzYThmMDI3IiwiYmlsaV90aWNrZXQiOiI3YzdkNGY4M2Q5NWY3MDU3MWFhM2E0OGY3ZDhiOWU2MyIsImNyZWF0ZWRfYXQiOjE3MjkzNzgwNTAsInR0bCI6ODY0MDAsInVybCI6Ii8zNTQ2NzI5MzY4NTIwODExL2R5bmFtaWMiLCJyZXN1bHQiOiJub3JtYWwiLCJpc3MiOiJnYWlhIiwiaWF0IjoxNzI5Mzc4MDUwfQ.Bsq7sOO0U8kJqiOIRDQdAQTsFshUFaQTt1be8m0B8fRHCfPik00Qszt3ja8vjI-7huBp1we2HHHf4QyhydmMXHTsQTYT55Gy1Y1AK3JlEndnoG42Q3sxnz2n1lp7Rne49vPUzh0wmjCC1CLqNY9_Wj3ZGSYhrotRqyDKC_cFMH8MZoWSIftVrC7JrI9Kt31jym9N4F70R4HdzNzROndhHxismIA9dtBRQtzhF2BARiIyRDrfFRazfHFrCU9piD0Axf4612KNtBjK808Rym03RfA2mXELZNJGjW8TCfZOjdPsHCutH-gMOnbfSjFPbgrWUeUI3CNy9zbKbUODyry6tw&w_rid=608d77c216b277cd196651eb4c6be538
signed_params = encWbi(
    params={
        "mid": UID,
        "web_location": 333.999,
        "w_webid": access_id,
    },
    img_key=img_key,
    sub_key=sub_key,
)
query = urllib.parse.urlencode(signed_params)
relation_url = "https://api.bilibili.com/x/space/wbi/acc/relation"
resp = httpx.get(f"{relation_url}?{query}", headers=headers)
print(resp.json())
|
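This just shows they are overstaffed with nothing to do. What is the point of these risk controls? |
To keep out AI-training crawlers? I don't think Bilibili has high-value data, though... That said, I have seen quite a few scraping gigs; after all, even Zhihu now restricts a lot when you are not logged in. |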
The endpoint https://api.bilibili.com/x/space/wbi/acc/info fails with -403 "访问权限不足" (insufficient access permission):
import axios from 'axios';
import lodash from 'lodash';
async function getUserInfoByUid(uid) {
    const url = 'https://api.bilibili.com/x/space/wbi/acc/info'
const cookie = 'buvid3=073B762A-2483-F772-FFD5-978C07FDB70C31024infoc; b_nut=1729935831; b_lsid=1036B363E_192C8886C89; _uuid=753712DA-9F36-16C2-5576-54F1DD3A7DE632710infoc; buvid_fp=ac842ef66c103bfa08b4a4bc2dc142fe; buvid4=786BA3FD-AAFE-649F-27E6-9270CAC7880432017-024102609-XM%2FRk2K47ksyOJCAXfjskQ%3D%3D; bili_ticket=eyJhbGciOiJIUzI1NiIsImtpZCI6InMwMyIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3MzAxOTUwMzIsImlhdCI6MTcyOTkzNTc3MiwicGx0IjotMX0.nCEWWOa_m7--9sQaH-rM3K4XTk5KxxpaVFgia14S-hk; bili_ticket_expires=1730194972'
    const data = {
        mid: uid,
        token: '',
        platform: 'web',
        web_location: 1550101,
        dm_img_list: [],
        dm_img_str: 'V2ViR0wgMS',
        dm_cover_img_str: 'QU5HTEUgKEludGVsLCBJbnRlbChSKSBIRCBHcmFwaGljcyBEaXJlY3QzRDExIHZzXzVfMCBwc181XzApLCBvciBzaW1pbGFyR29vZ2xlIEluYy4gKEludGVsKQ'
    };
    const w_webid = await getWebId(uid); // how to obtain it: https://github.com/SocialSisterYi/bilibili-API-collect/discussions/1104
    const { w_rid, time_stamp } = await getWbiSign(data, cookie); // algorithm details: https://github.com/SocialSisterYi/bilibili-API-collect/blob/master/docs/misc/sign/wbi.md
    const params = {
        ...data,
        w_webid: w_webid,
        w_rid: w_rid,
        wts: time_stamp
    };
    const res = await axios.get(url, {
        params,
        timeout: 5000,
        headers: {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Encoding': 'gzip, deflate, br, zstd',
            'Accept-Language': 'zh-CN,en-US;q=0.5',
            'Connection': 'keep-alive',
            'Priority': 'u=0, i',
            'Sec-Fetch-Dest': 'document',
            'Sec-Fetch-Mode': 'navigate',
            'Sec-Fetch-Site': 'none',
            'Sec-Fetch-User': '?1',
            'Sec-GPC': '1',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:132.0) Gecko/20100101 Firefox/132.0',
            Cookie: `${cookie}`,
            Host: `api.bilibili.com`,
            Origin: 'https://space.bilibili.com',
            Referer: `https://space.bilibili.com/${uid}/dynamic`
        }
    });
    return res;
}
(async () => {
    const resp = await getUserInfoByUid(401742377);
    const data = resp.data
    console.log(`${JSON.stringify(data)}`);
})();
Running roughly the code above returns
If the w_webid value is omitted, it directly returns
Does anyone know whether this has been solved? |
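Also, as things stand now, |
@cxw620 Could you please, whenever you open this kind of issue, add a heading at the top with a tip reminding people to redact sensitive data?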
|
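Solved, thanks ❤ |
@snowtafir Hi, I am also running into {"code":-403,"message":"访问权限不足","ttl":1}. How did you solve it? If convenient, could you post your code? |
Currently only |
I suggest that anyone with a WBI problem check it themselves... don't ask here. There is a thread dedicated to WBI authentication, isn't there? |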
The Referer now has to be on space (space.bilibili.com); www results in an error.
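A small sketch of what that could look like on the Python side. `build_space_headers` is a hypothetical helper; the Referer and Origin values mirror the JS snippet earlier in this thread, and whether both are strictly required is an assumption:

```python
def build_space_headers(uid: int, cookie: str, user_agent: str) -> dict:
    """Build request headers whose Referer points at the user's space page."""
    return {
        "User-Agent": user_agent,
        # Per the comment above, a www.bilibili.com Referer is rejected.
        "Referer": f"https://space.bilibili.com/{uid}/dynamic",
        "Origin": "https://space.bilibili.com",
        "Cookie": cookie,
    }


headers = build_space_headers(
    uid=3546729368520811,
    cookie="<cookie>",  # placeholder; never paste a real cookie in a comment
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
)
print(headers["Referer"])  # https://space.bilibili.com/3546729368520811/dynamic
```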
[WARNING] DO NOT LEAK YOUR PRIVACY INFO!
[NOTE] Mind your private information and redact it when commenting, including but not limited to Cookie, access_key, etc.!
Notice new param w_webid for risk controlling usage. See #1104

Intro
w_webid is a string acting as a JSON Web Token (JWT), whose payload contains fingerprinting and tracking info, see the example:
Signing ALG is HS256.

Source
See js file:
https://s1.hdslb.com/bfs/static/jinkela/space/9.space.287534f1741242b75d48126d84e7bef2bb8877c8.js
w_webid plays a role similar to WBI signing.

How to get it
Access the HTML directly and get it from the HTML content:
Search for <script id="__RENDER_DATA__" type="application/json">***</script>, then URL-decode ***. The result is a JSON string, and its inner access_id is what we need.

Notice
A JWT cannot be faked since we do not know the private key. However, attaching this to the HTML itself instead of serving a REST API may cause a perf regression, and will such a regression be accepted? I don't know.
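Although the token cannot be forged, its header and payload are only base64url-encoded and can be inspected. A minimal sketch: `decode_jwt_segment` is a hypothetical helper, and the header segment is copied from the sample token in the Python comment quoted in this thread. That particular sample decodes to RS256, so the algorithm may differ from the one stated above.

```python
import base64
import json


def decode_jwt_segment(segment: str) -> dict:
    """Decode one base64url JWT segment (header or payload), no verification."""
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


# Header segment of the sample w_webid quoted in this thread.
header_segment = "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9"
header = decode_jwt_segment(header_segment)
print(header)  # {'alg': 'RS256', 'typ': 'JWT'}
```

Decoding like this only reads the claims; it says nothing about whether the signature is valid.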
Last updated: 24/09/21 23:11
Originally posted by @cxw620 in #1104 (comment)