
Install dependencies

pip install -r requirements.txt
npm install
apt install imagemagick
apt install pngquant

`epub-crawler.js`

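Crawls web pages and their images and saves them as an EPUB. Unzipping the EPUB yields the images and HTML.

Requires ImageMagick and pngquant. The package dependencies are listed at the top of the file.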

node epub-crawler

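Configuration file `config.json`:

name: output file name
url: URL of the table-of-contents page
link: selector for the `<a>` links
base: prefix for the `<a>` links
title: selector for the title on article pages
content: selector for the content on article pages
remove: selector for elements to remove from article pages
credit: whether to show a link to the original article
processMath: whether to process TeX formulas
processDecl: whether to process Sphinx class definitions
hdrs: HTTP request headers
list: if this list is non-empty, the URLs in it are crawled and `url` is ignored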
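As a rough illustration of the keys listed above, a configuration might look like the object below. Only the key names come from this README; every value, the value types (strings for selectors, booleans for flags, an object for `hdrs`, an array for `list`), and the `missingKeys` helper are assumptions made for the sketch.

```javascript
// Hypothetical example values; only the key names are taken from the README.
const sampleConfig = {
  name: 'example-docs',                  // output file name
  url: 'https://example.com/docs/',      // table-of-contents page
  link: '.toc a',                        // selector for the <a> links
  base: 'https://example.com/docs/',     // prefix for the <a> links
  title: 'h1',                           // title selector on article pages
  content: 'article',                    // content selector on article pages
  remove: '.sidebar, .comments',         // elements to remove from article pages
  credit: true,                          // show a link to the original article
  processMath: false,                    // process TeX formulas
  processDecl: false,                    // process Sphinx class definitions
  hdrs: { 'User-Agent': 'Mozilla/5.0' }, // HTTP request headers
  list: [],                              // if non-empty, crawled instead of url
};

// Assumed helper (not part of the repository): list documented keys
// that are absent from a config object.
function missingKeys(config) {
  const expected = [
    'name', 'url', 'link', 'base', 'title', 'content',
    'remove', 'credit', 'processMath', 'processDecl', 'hdrs', 'list',
  ];
  return expected.filter((key) => !(key in config));
}

// Print the object in the form it would take inside config.json.
console.log(JSON.stringify(sampleConfig, null, 2));
```

Saving the printed JSON as `config.json` next to the script is one plausible way to use it, although the README does not state which keys are optional.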

`img-better.js`

Compresses images automatically. Requires ImageMagick and pngquant.

node img-better <dir>

`img.js`

Saves the images referenced in the HTML to an `img` directory alongside it and updates the links in the HTML.

node img <file|dir>
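The link-rewriting step described above can be sketched roughly as follows. This is not the code in `img.js`: the `localizeImages` helper, the MD5-based file naming, and the `.png` fallback extension are all assumptions, and the actual download is left out.

```javascript
const crypto = require('crypto');
const path = require('path');

// Hypothetical helper: point every remote <img src="..."> at a file under
// img/ and collect the URLs that would still need to be downloaded.
function localizeImages(html) {
  const downloads = [];
  const rewritten = html.replace(
    /(<img\b[^>]*?\ssrc=)(["'])(.*?)\2/gi,
    (match, prefix, quote, src) => {
      // Leave relative or already-local paths untouched.
      if (!/^https?:\/\//i.test(src)) return match;
      // Keep the original extension; '.png' is an assumed fallback.
      const ext = path.extname(new URL(src).pathname) || '.png';
      // Assumed naming scheme: MD5 of the URL, so repeated URLs map to one file.
      const file = crypto.createHash('md5').update(src).digest('hex') + ext;
      downloads.push({ url: src, file: `img/${file}` });
      return `${prefix}${quote}img/${file}${quote}`;
    }
  );
  return { html: rewritten, downloads };
}
```

A caller would fetch each entry in `downloads` into the `img` directory and then write `html` back to the original file.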

`trans.py`

Translates HTML paragraph by paragraph using Google Translate.

python trans.py <file|dir>

`tomd.js`

Converts HTML to Markdown.

node tomd <file|dir>

Rule definitions: `my-conventors.js`

RuleObj {
    filter: string|Array[string]|function(Element):boolean,
    replacement: function(string, Element):string
}

module.exports: Array[RuleObj]
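To make the `RuleObj` shape above concrete, here is a minimal sketch of what a rules file could look like. The three rules, the `matches` and `convert` helpers, and the assumption that string filters are compared against lowercase tag names are illustrative only; the README does not say how `tomd.js` applies the rules.

```javascript
// Sketch of a rules array in the RuleObj shape; the rules are examples.
const rules = [
  {
    // String filter: a single tag name.
    filter: 'del',
    replacement: (content) => `~~${content}~~`,
  },
  {
    // Array filter: any of several tag names; keep these tags as inline HTML.
    filter: ['sub', 'sup'],
    replacement: (content, node) => {
      const tag = node.nodeName.toLowerCase();
      return `<${tag}>${content}</${tag}>`;
    },
  },
  {
    // Function filter: arbitrary test on the element.
    filter: (node) => node.nodeName === 'SPAN' && node.className === 'math',
    replacement: (content) => '$' + content + '$',
  },
];

// Assumed helper: decide whether a filter of any of the three kinds matches.
function matches(filter, node) {
  const tag = node.nodeName.toLowerCase();
  if (typeof filter === 'string') return filter === tag;
  if (Array.isArray(filter)) return filter.includes(tag);
  return filter(node);
}

// Assumed helper: apply the first matching rule, else pass content through.
function convert(node, content) {
  const rule = rules.find((r) => matches(r.filter, node));
  return rule ? rule.replacement(content, node) : content;
}

module.exports = rules;
```

For instance, `convert({ nodeName: 'DEL' }, 'old')` would go through the first rule, while an element that no rule matches keeps its content unchanged.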

`sina-short.js`

Converts links in Markdown to Sina short URLs.

node sina-short <file>

`process-tex.js`

Converts TeX formulas in Markdown/HTML into images.

node process-tex <dir>
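The TeX-to-image conversion described above might be approximated as below for Markdown input. The `texToImages` helper, the `$...$` and `$$...$$` delimiters, and the `RENDER_URL` endpoint are assumptions; the endpoint is a placeholder, not a real rendering service, and the README does not say how the script actually produces its images.

```javascript
// Assumed placeholder endpoint; substitute a renderer that actually exists.
const RENDER_URL = 'https://example.com/render-tex?tex=';

// Hypothetical helper: replace TeX spans with Markdown image links.
function texToImages(text, renderUrl = RENDER_URL) {
  const image = (tex) => {
    // Percent-encode the formula, including parentheses, so the URL
    // cannot terminate the Markdown link early.
    const encoded = encodeURIComponent(tex.trim())
      .replace(/[()]/g, (c) => '%' + c.charCodeAt(0).toString(16));
    return `![](${renderUrl}${encoded})`;
  };
  return text
    .replace(/\$\$([\s\S]+?)\$\$/g, (_, tex) => image(tex)) // display math first
    .replace(/\$([^$\n]+?)\$/g, (_, tex) => image(tex));     // then inline math
}
```

Note that this simple pattern would also treat two dollar amounts on one line as a formula, so real input may need stricter delimiters.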



apachecn/doctool
