A short-video crawler based on Scrapy that collects videos from the target sites by search query.
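Supported sites:

| Site | Name | Status |
|---|---|---|
| Kuaishou (快手) | kuaishou | ✔️ |
| Xigua Video (西瓜视频) | ixigua | 🚧 |
| Xinpianchang (新片场) | xinpianchang | ✔️ |
| Haokan Video (好看视频) | haokan | 🚧 |
| Duxiaoshi / Quanmin Xiaoshipin (度小视/全民小视频)* | quanmin | ✔️ |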

*The official 度小视/全民小视频 website has been taken offline, but this project can still crawl it (tested in June 2024).

Requirements:

- Python 3.10+
- Poetry

Install the dependencies and enter the virtual environment:

    git clone https://github.com/dxsooo/ShortVideoCrawl
    cd ShortVideoCrawl
    poetry install --only main
    poetry shell

For example:

    cd shortvideocrawl

    # main parameters:
    #   query: search keyword
    #   count: target number of videos

    # kuaishou
    scrapy crawl kuaishou -a query='蔡徐坤' -a count=50

    # ixigua: highest resolution, file size under 64 MB, duration under 5 min
    # scrapy crawl ixigua -a query='蔡徐坤' -a count=50

    # xinpianchang: highest resolution, file size under 64 MB, duration under 5 min;
    # only a fixed number of videos can be fetched
    scrapy crawl xinpianchang -a query='蔡徐坤'

    # haokan: highest resolution
    # scrapy crawl haokan -a query='蔡徐坤' -a count=50

    # quanmin
    scrapy crawl quanmin -a query='蔡徐坤' -a count=50

Videos are saved in `./videos`, each named with its video ID on the source platform.
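
For batch use, the commands above could be wrapped in a small POSIX shell helper. The following is only a sketch and is not part of the project: the function names, the `DRY_RUN` and `SITES` variables, and the assumption that each saved file is named `<video id>.<extension>` are all hypothetical. It covers only the two spiders shown above with both `query` and `count`.

```shell
#!/bin/sh
# Sketch only: crawl_one, crawl_all, list_video_ids, DRY_RUN and SITES are
# hypothetical names introduced here, not part of ShortVideoCrawl.

# Print the scrapy command when DRY_RUN=1 (the default); run it when DRY_RUN=0.
# Arguments: spider name, query word, target video count.
crawl_one() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf "scrapy crawl %s -a query='%s' -a count=%s\n" "$1" "$2" "$3"
    else
        scrapy crawl "$1" -a query="$2" -a count="$3"
    fi
}

# Run the same query and count against every spider listed in SITES
# (space-separated; defaults to the two spiders documented with a count).
crawl_all() {
    for site in ${SITES:-kuaishou quanmin}; do
        crawl_one "$site" "$1" "$2"
    done
}

# List video IDs in a directory by stripping each file's extension.
# Assumes files are named <video id>.<extension>.
list_video_ids() {
    dir=${1:-./videos}
    for path in "$dir"/*; do
        [ -f "$path" ] || continue   # also skips the literal glob if the directory is empty
        name=${path##*/}
        printf '%s\n' "${name%.*}"
    done
}
```

After sourcing the file from the `shortvideocrawl` directory, `crawl_all '蔡徐坤' 50` prints the commands it would run; setting `DRY_RUN=0` first executes them, and `list_video_ids` then prints one ID per saved file. The dry-run default is a deliberate choice so that the command list can be checked before any requests are sent.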