Commit 0b4e983

Add domain regular expressions
1 parent af95f50 commit 0b4e983

5 files changed (+142, -19 lines)

.gitignore

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
 .DS_Store
-auto-proxy*.txt
-*-rules-*.txt
+*.txt
+*.tmp
 proxy.pac

README.md

Lines changed: 97 additions & 12 deletions
@@ -12,7 +12,7 @@ For high-performance matching: Uses a hash table for domain rules, and an IP pre

 ## Usage

-1. **Domain Rule Configuration**
+### 1. **Domain Rule Configuration**

 The project contains some example configuration files:

@@ -25,24 +25,31 @@ For high-performance matching: Uses a hash table for domain rules, and an IP pre

 To use these files, remove the `.example` extension. Each file represents a different proxy behavior:

+#### a. Auto-Proxy
+
 - **Auto-Proxy Rules**: Add rules to `auto-proxy.txt` to control how websites are accessed.

 All files starting with `auto-proxy` and ending with `.txt` will be parsed as Auto-Proxy rules. If you have multiple Auto-Proxy rules, you can save them as multiple files, like `auto-proxy-1.txt`, `auto-proxy-2.txt`, etc.

 **Note**: Currently, the URL matching rules in Auto-Proxy are ignored, and only domain rules are handled.

+#### b. Domains and IP Addresses
+
 - **Blocked**: Domains added to `domain-rules-blocked.txt` will be blocked from access.
 - **Direct**:
   - Domains added to `domain-rules-direct.txt` will bypass the proxy and connect directly.
   - IPv4 networks (in CIDR format) added to `ipv4-rules-direct.txt` will bypass the proxy and connect directly.
   - IPv6 networks (in CIDR format) added to `ipv6-rules-direct.txt` will bypass the proxy and connect directly.
 - **Proxy**: Domains added to `domain-rules-proxy.txt` will use the default proxy.

-Add your domain names or IP network segments to the appropriate file, with one entry per line. Lines starting with `#` are treated as comments. For example:
+Add your domains or IP networks to the appropriate file, one per line. Subdomains inherit the proxy behavior of their parent domain, so you can add a country-code top-level domain (such as `cn`) to cover every domain under it. Lines starting with `#` are treated as comments.
+
+For example:

 Domains added to `domain-rules-direct.txt`, along with their subdomains, will bypass the proxy and connect directly:
 ```
 # Direct connect domains
+cn # All domains ending with .cn will connect directly by default
 google.com
 example.org
 ```
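The README now states that subdomains inherit their parent domain's behavior, which is why a bare `cn` line covers every `.cn` host. A minimal JavaScript sketch of that suffix walk, modeled loosely on the `proxyRules` loop in `proxy.js`; the `rules` table and the `lookupDomainRule` helper are hypothetical:

```javascript
// Hypothetical rule table: keys are domains (or bare TLDs), values are behaviors.
const rules = {
  "cn": "direct",
  "google.com": "direct",
  "example.org": "direct",
};

// Try the full hostname first, then drop one leading label at a time, so the
// most specific suffix present in the table wins.
function lookupDomainRule(host, table) {
  let segment = host;
  while (true) {
    // hasOwnProperty ignores inherited keys such as "constructor".
    if (Object.prototype.hasOwnProperty.call(table, segment)) {
      return table[segment];
    }
    const nextDot = segment.indexOf(".");
    if (nextDot === -1) {
      return undefined; // no suffix of this host has a rule
    }
    segment = segment.substring(nextDot + 1);
  }
}

console.log(lookupDomainRule("shop.example.cn", rules)); // direct (via "cn")
console.log(lookupDomainRule("mail.google.com", rules)); // direct (via "google.com")
console.log(lookupDomainRule("example.net", rules));     // undefined
```

In this sketch the walk stops at the first hit, so a more specific entry such as `example.cn` would take precedence over the bare `cn` line.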
@@ -56,7 +63,43 @@ For high-performance matching: Uses a hash table for domain rules, and an IP pre

 You can also create your own custom rule files, following the format `<domain|ipv4|ipv6>-rules-<rule_name>.txt`. For example, `domain-rules-companyProxy.txt` will make all domains in this file use the `companyProxy` setting defined in `proxy.pac`. `ipv4-rules-block.txt` will block all networks listed in the file.

-2. **Generate the `proxy.pac` File**
+#### c. Domain Regular Expressions
+
+- **Domain Regular Expressions**: `domain-regexp.txt` defines domain rules as regular expressions, which makes it easy to match many similar domains with a single pattern.
+
+File structure:
+```
+[direct]
+# host regex ...
+
+[blocked]
+# ...
+
+[proxy]
+# ...
+```
+
+Each section represents a different proxy behavior, which can be `direct`, `blocked`, `proxy`, or a custom behavior (e.g., `[companyProxy]`).
+
+- **[direct]**: Domains matched by regular expressions in this section will bypass the proxy and connect directly.
+- **[blocked]**: Domains matched by regular expressions in this section will be blocked.
+- **[proxy]**: Domains matched by regular expressions in this section will use the default proxy.
+- **Custom Behavior**: You can add your own section name, such as `[companyProxy]`, to indicate that domains matching those patterns will use a custom proxy configuration.
+
+Each pattern line is a regular expression tested against the full hostname, so it can match specific domains or whole families of subdomains; anchor it with `^` and `$` unless a partial match is intended. Lines starting with `#` are treated as comments. For example:
+```
+[direct]
+# Direct connection domains
+^img-[0-9][0-9].*\.example\.com$
+
+[blocked]
+# Blocked domains
+^ad-[a-z0-9]\.cdn[0-9]\.example\.com$
+```
+
+Make sure every pattern is a valid JavaScript regular expression: patterns are copied verbatim into `proxy.pac`, so an invalid one can break the entire PAC file.
+
+### 2. **Generate the `proxy.pac` File**

 Run the script to generate the `proxy.pac` file:

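To make the section format concrete, here is a minimal JavaScript sketch that parses `domain-regexp.txt` text and looks up a host. It mirrors, but is not, the shell loop in `build.sh`: `parseDomainRegexp` and `matchDomainRegexp` are hypothetical names, and letting the first matching pattern win follows the `domainRegexpRules.find(...)` call in `proxy.js`.

```javascript
// Sketch: parse the INI-like domain-regexp.txt format into [RegExp, behavior]
// pairs. Assumed conventions (from the README text and build.sh): "#" starts
// a comment, whitespace is dropped, "[name]" opens a section, and pattern
// lines that appear before any section header are ignored.
function parseDomainRegexp(text) {
  const rules = [];
  let behavior = null;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.replace(/#.*/, "").replace(/\s+/g, "");
    if (line === "") continue;
    const header = /^\[(.+)\]$/.exec(line);
    if (header) {
      behavior = header[1];
    } else if (behavior !== null) {
      rules.push([new RegExp(line), behavior]);
    }
  }
  return rules;
}

// Return the behavior of the first pattern that matches the host,
// or undefined when none does (the domain rules would be consulted next).
function matchDomainRegexp(host, rules) {
  const hit = rules.find(([regexp]) => regexp.test(host));
  return hit ? hit[1] : undefined;
}

// Same content as the README example (backslashes doubled for the JS string).
const sample = `
[direct]
# Direct connection domains
^img-[0-9][0-9].*\\.example\\.com$

[blocked]
# Blocked domains
^ad-[a-z0-9]\\.cdn[0-9]\\.example\\.com$
`;

const rules = parseDomainRegexp(sample);
console.log(matchDomainRegexp("img-01.static.example.com", rules)); // direct
console.log(matchDomainRegexp("ad-x.cdn1.example.com", rules));     // blocked
console.log(matchDomainRegexp("www.example.com", rules));           // undefined
```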
@@ -66,7 +109,7 @@ For high-performance matching: Uses a hash table for domain rules, and an IP pre

 The `proxy.pac` file will be automatically generated in the project root directory.

-3. **Default Rule Sources**
+### 3. **Default Rule Sources**

 The build script [`build.sh`](./build.sh) will, by default, download the following files without overwriting existing files of the same name:

@@ -76,7 +119,7 @@ For high-performance matching: Uses a hash table for domain rules, and an IP pre

 If you do not need the Auto-Proxy rules or IP networks rules, you can create empty files with the same name to skip the download.

-4. **Proxy Configuration**
+### 4. **Proxy Configuration**

 The generated `proxy.pac` file uses the following default proxy configurations (note that the default proxy server is `SOCKS5 127.0.0.1:1080`):

@@ -92,7 +135,7 @@ For high-performance matching: Uses a hash table for domain rules, and an IP pre

 You can modify these values after generating `proxy.pac`, or customize them directly in the original script `proxy.js` to use different default settings. Please adjust these settings according to your environment and requirements.

-5. **Testing**
+### 5. **Testing**

 If you have Node.js installed, you can run the following command to test and verify the configuration:

@@ -133,7 +176,7 @@ Run `./build.sh` to regenerate the `proxy.pac` file, which will block access to

 ## Usage

-1. **Domain Rule Configuration**
+### 1. **Domain Rule Configuration**

 The project contains some example configuration files:

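@@ -143,27 +186,35 @@ Run `./build.sh` to regenerate the `proxy.pac` file, which will block access to
 - `domain-rules-proxy.txt.example`
 - `ipv4-rules-direct.txt.example`
 - `ipv6-rules-direct.txt.example`
+- `domain-regexp.txt.example`

 To use these files, remove the `.example` extension. Each file represents a different proxy behavior:

+#### a. Auto-Proxy
+
 - **Auto-Proxy rules**: Add rules to `auto-proxy.txt`, and websites will be accessed according to those rules.

 All files starting with `auto-proxy` and ending with `.txt` are parsed as Auto-Proxy rules. If you have multiple sets of Auto-Proxy rules, you can save them as multiple files, for example `auto-proxy-1.txt`, `auto-proxy-2.txt`, and so on.

 **Note**: Currently, URL matching rules in Auto-Proxy are ignored; only domain rules are handled.

+#### b. Domains and IP Addresses
+
 - **Blocked**: Domains added to `domain-rules-blocked.txt` will be blocked.
 - **Direct**:
   - Domains added to `domain-rules-direct.txt` will bypass the proxy and connect directly.
   - IPv4 network segments (in CIDR format) added to `ipv4-rules-direct.txt` will bypass the proxy and connect directly.
   - IPv6 network segments (in CIDR format) added to `ipv6-rules-direct.txt` will bypass the proxy and connect directly.
 - **Proxy**: Domains added to `domain-rules-proxy.txt` will use the default proxy.

-Add your domains or IP network segments to the appropriate file, one domain per line. Lines starting with `#` are treated as comments. For example:
+Add your domains or IP network segments to the appropriate file, one entry per line. Subdomains inherit the proxy behavior of their parent domain, so you can add a country-code top-level domain directly to simplify the configuration. Lines starting with `#` are treated as comments.
+
+For example:

 Domains added to `domain-rules-direct.txt`, along with their subdomains, will bypass the proxy and connect directly:
 ```
 # Direct connect domains
+cn # By default, all domains ending with .cn connect directly
 google.com
 example.org
 ```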
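@@ -177,8 +228,42 @@ Run `./build.sh` to regenerate the `proxy.pac` file, which will block access to

 You can also create your own custom rule files; the file name should follow the format `<domain|ipv4|ipv6>-rules-<rule_name>.txt`. For example, `domain-rules-companyProxy.txt` makes all domains in that file use the `companyProxy` setting defined in `proxy.pac`, and `ipv4-rules-block.txt` makes all network segments in that file inaccessible.

+#### c. Domain Regular Expressions
+
+- **Domain regular expressions**: `domain-regexp.txt` is used to define regular-expression-based domain rules flexibly, mainly for matching large numbers of similar domains.
+
+The file structure is as follows:
+```
+[direct]
+# host regex ...
+
+[blocked]
+# ...
+
+[proxy]
+# ...
+```
+The name of each section represents a different proxy behavior, which can be `direct`, `blocked`, `proxy`, or a behavior you define yourself (e.g., `[companyProxy]`).
+
+- **[direct]**: Domains matched by the regular expressions in this section will bypass the proxy and connect directly.
+- **[blocked]**: Domains matched by the regular expressions in this section will be blocked.
+- **[proxy]**: Domains matched by the regular expressions in this section will connect through the default proxy.
+- **Custom behavior**: You can add your own section names, such as `[companyProxy]`, meaning the matching domains will use a custom proxy configuration.
+
+Each line is a regular expression for a domain; flexible patterns can match specific domains or their subdomains. Lines starting with `#` are treated as comments. For example:
+```
+[direct]
+# Directly connected domains
+^img-[0-9][0-9].*\.example\.com$
+
+[blocked]
+# Blocked domains
+^ad-[a-z0-9]\.cdn[0-9]\.example\.com$
+```
+
+Make sure the regular expressions are valid so that they do not disrupt normal network access.

-2. **Generate the `proxy.pac` File**
+### 2. **Generate the `proxy.pac` File**

 Run the script to generate the `proxy.pac` file:
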
@@ -188,7 +273,7 @@ Run `./build.sh` to regenerate the `proxy.pac` file, which will block access to

 The `proxy.pac` file will be generated automatically in the project root directory.

-3. **Default Rule Sources**
+### 3. **Default Rule Sources**
 The build script [`build.sh`](./build.sh) downloads the following files by default, but does not overwrite existing files with the same name:

 - `auto-proxy.txt`
@@ -197,7 +282,7 @@ Run `./build.sh` to regenerate the `proxy.pac` file, which will block access to

 If you don't need the Auto-Proxy rules or the IP network segment rules, create empty files with the same names to skip the download.

-4. **Proxy Configuration**
+### 4. **Proxy Configuration**

 The generated `proxy.pac` file uses the following default proxy configuration (note that the default proxy server is `SOCKS5 127.0.0.1:1080`):

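@@ -213,7 +298,7 @@ Run `./build.sh` to regenerate the `proxy.pac` file, which will block access to

 You can modify these values after generating `proxy.pac`, or customize them directly in the original script `proxy.js` to use different default settings. Please adjust these proxy settings to suit your actual environment and requirements.

-5. **Testing**
+### 5. **Testing**

 If Node.js is installed, you can run the tests with the following command to verify the configuration:
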
build.sh

Lines changed: 17 additions & 1 deletion
@@ -230,7 +230,23 @@ generate_pac() {
         [[ "$rule" = @(blocked|direct|proxy) ]] || rule="\"$rule\""
         printf " \"%s\": %s,\n" "$domain" "$rule"
     done | sort -n
-    sed -n '/ end of proxy rules$/,$p' "$jsfile"
+    sed -n '/ end of proxy rules$/,/ begin of regexp rules$/p' "$jsfile"
+    for file in domain-regexp*.txt; do
+        rule=""
+        while IFS= read -r line; do
+            line="${line%%#*}"
+            line="${line// }"
+            [[ -n "$line" ]] || continue
+            if [[ "$line" = \[*\] ]]; then
+                rule="${line#[}"
+                rule="${rule%]}"
+                [[ "$rule" = @(blocked|direct|proxy) ]] || rule="\"$rule\""
+            elif [[ -n "$rule" ]]; then
+                printf " [/%s/, %s],\n" "$line" "$rule"
+            fi
+        done < "$file"
+    done
+    sed -n '/ end of regexp rules$/,$p' "$jsfile"
 }

 is_up_to_date=true
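The new loop in `generate_pac()` writes one array entry per pattern, leaving the three built-in behaviors unquoted and quoting any custom section name. A JavaScript sketch of the same transformation; the `renderRegexpEntry` name is hypothetical, and the four-space indentation of the output is arbitrary:

```javascript
// Built-in behaviors are emitted as bare identifiers (build.sh leaves these
// unquoted, so proxy.js presumably defines matching names).
const BUILTIN_BEHAVIORS = new Set(["blocked", "direct", "proxy"]);

// Render one domainRegexpRules entry from a pattern and its section name.
function renderRegexpEntry(pattern, section) {
  // Custom section names (e.g. companyProxy) become quoted string keys.
  const value = BUILTIN_BEHAVIORS.has(section) ? section : JSON.stringify(section);
  // The pattern is pasted verbatim between slashes, so it must already be a
  // valid regex literal body; an unescaped "/" would end the literal early.
  return `    [/${pattern}/, ${value}],`;
}

console.log(renderRegexpEntry("^a[1-5]\\.mzstatic\\.com$", "direct"));
// prints:     [/^a[1-5]\.mzstatic\.com$/, direct],
console.log(renderRegexpEntry("^intranet\\.example\\.com$", "companyProxy"));
// prints:     [/^intranet\.example\.com$/, "companyProxy"],
```

`JSON.stringify` here plays the role of the shell's `rule="\"$rule\""`; both produce the same result for plain section names.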

domain-regexp.txt.example

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+[direct]
+^.+-mihayo\.akamaized\.net$
+^a[1-5]\.mzstatic\.com$
+^cdn(-cn)?[1-4]?\.apple-mapkit\.com$
+^cl[1-5]-cdn\.origin-apple\.com\.akadns\.net$
+
+[blocked]
+^speed\.(coe|open)\.ad\.[a-z]{2,6}\.prod\.hosts\.ooklaserver\.net$
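Patterns like these can be sanity-checked in Node.js before regenerating `proxy.pac`. The sketch below runs each example pattern against a made-up hostname chosen only to exercise it:

```javascript
// Patterns copied from domain-regexp.txt.example; sample hosts are illustrative.
const cases = [
  [/^.+-mihayo\.akamaized\.net$/, "cdn-mihayo.akamaized.net", true],
  [/^a[1-5]\.mzstatic\.com$/, "a3.mzstatic.com", true],
  [/^a[1-5]\.mzstatic\.com$/, "a6.mzstatic.com", false],
  [/^cdn(-cn)?[1-4]?\.apple-mapkit\.com$/, "cdn-cn2.apple-mapkit.com", true],
  [/^cdn(-cn)?[1-4]?\.apple-mapkit\.com$/, "cdn.apple-mapkit.com", true],
  [/^cl[1-5]-cdn\.origin-apple\.com\.akadns\.net$/, "cl4-cdn.origin-apple.com.akadns.net", true],
  [/^speed\.(coe|open)\.ad\.[a-z]{2,6}\.prod\.hosts\.ooklaserver\.net$/,
    "speed.open.ad.xyz.prod.hosts.ooklaserver.net", true],
];

// Print one line per case; "FAIL" marks a pattern that did not behave as expected.
for (const [regexp, host, expected] of cases) {
  const actual = regexp.test(host);
  console.log(`${actual === expected ? "ok  " : "FAIL"} ${host} -> ${actual}`);
}
```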

proxy.js

Lines changed: 18 additions & 4 deletions
@@ -89,6 +89,12 @@ const proxyRules = {
     // end of proxy rules
 };

+const domainRegexpRules = [
+    [ /^adservice\.google\.([a-z]{2}|com?)(\.[a-z]{2})?$/, blocked], // adservice.google.com.xx
+    // begin of regexp rules
+    // end of regexp rules
+]
+
 class IPv4TrieNode {
     constructor() {
         this.children = [null, null]; // 0 and 1
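The single built-in entry in `domainRegexpRules` targets Google ad-service hosts across many country domains. A quick Node.js check of which hostnames it does and does not cover (the sample hostnames are illustrative):

```javascript
// The pattern added to domainRegexpRules, checked against sample hostnames.
const adservice = /^adservice\.google\.([a-z]{2}|com?)(\.[a-z]{2})?$/;

const samples = [
  ["adservice.google.com", true],      // "com"
  ["adservice.google.de", true],       // any two-letter TLD
  ["adservice.google.co.uk", true],    // "co" plus a two-letter suffix
  ["adservice.google.com.xx", true],   // the host used in the new test
  ["adservice.google.net", false],     // three letters, but not "com"
  ["www.adservice.google.com", false], // ^ anchors the match at the start
];

// Print the host, the actual result, and the expected result side by side.
for (const [host, expected] of samples) {
  console.log(host, adservice.test(host), expected);
}
```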
@@ -272,7 +278,8 @@ function printMatchingNetwork(ip, networks4, networks6) {
         return null;
     }
 }
-function FindProxyForURL(url, host) {
+function FindProxyForURL(_url, _host) {
+    const host = _host;
     if (isIpAddress(host)) {
         const match = findMatchingNetwork(host);
         if(match) {
@@ -286,17 +293,23 @@ function FindProxyForURL(url, host) {
         }
     }

+    const match = domainRegexpRules.find(([regexp, value]) => regexp.test(host) );
+    if(match)
+        return proxyBehaviors[match[1]] || default_behavior;
+
+    var host_segment = host;
     while (true) {
-        var action = proxyRules[host];
+        var action = proxyRules[host_segment];
         if (action !== undefined) {
             return proxyBehaviors[action] || default_behavior;
         }
-        var nextDot = host.indexOf(".");
+        var nextDot = host_segment.indexOf(".");
         if (nextDot === -1) {
             break;
         }
-        host = host.substring(nextDot + 1);
+        host_segment = host_segment.substring(nextDot + 1);
     }
+
     var remote_ip = undefined;
     if(typeof dnsResolveEx == 'function') {
         remote_ip = dnsResolveEx(host);
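Renaming the loop variable to `host_segment` is more than cosmetic, as far as this hunk shows: the old loop shortened `host` itself, so when no domain rule matched, the later `dnsResolveEx(host)` call received only the last label (for example `com`) instead of the full hostname. A self-contained sketch of the two versions; `nameResolvedOld` and `nameResolvedNew` are hypothetical helpers that return the name that would reach DNS resolution, or `null` when a domain rule matched first:

```javascript
// Old behavior: the hostname variable itself is shortened during the walk.
function nameResolvedOld(host, proxyRules) {
  while (true) {
    if (proxyRules[host] !== undefined) return null; // a rule matched; no DNS lookup
    const nextDot = host.indexOf(".");
    if (nextDot === -1) break;
    host = host.substring(nextDot + 1); // overwrites the original hostname
  }
  return host; // by now only the last label is left
}

// New behavior: walk a copy and keep the full hostname for resolution.
function nameResolvedNew(host, proxyRules) {
  let hostSegment = host;
  while (true) {
    if (proxyRules[hostSegment] !== undefined) return null;
    const nextDot = hostSegment.indexOf(".");
    if (nextDot === -1) break;
    hostSegment = hostSegment.substring(nextDot + 1);
  }
  return host;
}

const noRules = {};
console.log(nameResolvedOld("www.example.com", noRules)); // com
console.log(nameResolvedNew("www.example.com", noRules)); // www.example.com
```

Declaring `const host = _host;` in the new `FindProxyForURL` also makes any future accidental reassignment a runtime error instead of a silent change of the name being resolved.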
@@ -355,6 +368,7 @@ if (typeof process !== 'undefined' && process.argv.includes('test')) {
         assertDirectHost("127.3.4.5");
         assertDirectHost("114.114.114.114");
         assertBlockedHost("www.whitehouse.com");
+        assertBlockedHost("adservice.google.com.xx")
     }

     runTests();
