
Commit 94c4356

prefix analyse
1 parent 83d771e commit 94c4356

File tree

16 files changed: +528 -55 lines changed

.vscode/launch.json

Lines changed: 17 additions & 0 deletions

@@ -0,0 +1,17 @@
+{
+    // Use IntelliSense to learn about possible attributes.
+    // Hover to view descriptions of existing attributes.
+    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
+    "version": "0.2.0",
+    "configurations": [
+        {
+            "name": "Launch Package",
+            "type": "go",
+            "request": "launch",
+            "mode": "auto",
+            "program": "${workspaceFolder}",
+            "args": ["-c", "prefix", "-n", "20", "-o", "tree.csv", "large.rdb"]
+        }
+
+    ]
+}

README.md

Lines changed: 42 additions & 6 deletions

@@ -50,9 +50,9 @@ use `rdb` command in terminal, you can see it's manual
 ```
 This is a tool to parse Redis' RDB files
 Options:
-  -c command, including: json/memory/aof/bigkey/flamegraph
-  -o output file path
-  -n number of result, using in
+  -c command, including: json/memory/aof/bigkey/prefix/flamegraph
+  -o output file path; if no `-o` option is given, output goes to stdout
+  -n number of results, used in commands: bigkey/prefix
   -port listen port for flame graph web service
  -sep separator for flamegraph, rdb will separate key by it, default value is ":".
       supporting multi separators: -sep sep1 -sep sep2
@@ -69,7 +69,9 @@ parameters between '[' and ']' is optional
     rdb -c aof -o dump.aof dump.rdb
 4. get largest keys
     rdb -c bigkey [-o dump.aof] [-n 10] dump.rdb
-5. draw flamegraph
+5. get key count and memory size by prefix
+    rdb -c prefix [-n 10] [-max-depth 3] [-o prefix-report.csv] dump.rdb
+6. draw flamegraph
     rdb -c flamegraph [-port 16379] [-sep :] dump.rdb
 ```

@@ -258,13 +260,13 @@ The examples for json result:

 RDB uses rdb encoded size to estimate redis memory usage.

-```
+```bash
 rdb -c memory -o <output_path> <source_path>
 ```

 Example:

-```
+```bash
 rdb -c memory -o mem.csv cases/memory.rdb
 ```

@@ -281,6 +283,40 @@ database,key,type,size,size_readable,element_count
 0,set,set,39,39B,2
 ```

+# Analyze By Prefix
+
+If modules can be distinguished by key prefix, for example user data under `User:<uid>`, posts under `Post:<postid>`, user statistics under `Stat:User:???`, and post statistics under `Stat:Post:???`, then prefix analysis reports the memory usage of each module:
+
+```csv
+database,prefix,size,size_readable,key_count
+0,Post:,1170456184,1.1G,701821
+0,Stat:,405483812,386.7M,3759832
+0,Stat:Post:,291081520,277.6M,2775043
+0,User:,241572272,230.4M,265810
+0,Topic:,171146778,163.2M,694498
+0,Topic:Post:,163635096,156.1M,693758
+0,Stat:Post:View,133201208,127M,1387516
+0,Stat:User:,114395916,109.1M,984724
+0,Stat:Post:Comment:,80178504,76.5M,693758
+0,Stat:Post:Like:,77701688,74.1M,693768
+```
+
+Format:
+
+```bash
+rdb -c prefix [-n <top-n>] [-max-depth <max-depth>] [-o <output_path>] <source_path>
+```
+
+- The prefix analysis results are sorted by memory usage in descending order. The `-n` option limits the number of rows; all rows are output by default.
+
+- `-max-depth` limits the maximum depth of the prefix tree. In the example above, the depth of `Stat:` is 1, and the depth of `Stat:User:` and `Stat:Post:` is 2.
+
+Example:
+
+```bash
+rdb -c prefix -n 10 -o prefix.csv cases/memory.rdb
+```
+
 # Flame Graph

 In many cases there is not a few very large key but lots of small keys that occupied most memory.

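The report above is produced by folding every key into a prefix tree and accumulating the rdb-encoded sizes along the path. The actual aggregation lives in the `helper` package and is not part of the hunks shown here; the sketch below only illustrates the idea, and the node type, the `:` splitting rule, and the omission of sorting and top-n trimming are simplifying assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixNode is a hypothetical aggregation node; the real helper may differ.
type prefixNode struct {
	size     int // accumulated rdb-encoded bytes under this prefix
	keyCount int
	children map[string]*prefixNode
}

func newNode() *prefixNode {
	return &prefixNode{children: map[string]*prefixNode{}}
}

// add walks the ":"-separated segments of key and accumulates size and count
// on every prefix node along the path, stopping after maxDepth levels (0 = unlimited).
func (n *prefixNode) add(key string, size, maxDepth int) {
	node := n
	for depth, seg := range strings.SplitAfter(key, ":") { // keep ":" so prefixes read "Stat:Post:"
		if maxDepth > 0 && depth >= maxDepth {
			break
		}
		child, ok := node.children[seg]
		if !ok {
			child = newNode()
			node.children[seg] = child
		}
		child.size += size
		child.keyCount++
		node = child
	}
}

// dump prints prefix, accumulated size, and key count, a subset of the CSV columns;
// a real implementation would also sort by size and keep only the top-n rows.
func (n *prefixNode) dump(prefix string) {
	for seg, child := range n.children {
		fmt.Printf("%s,%d,%d\n", prefix+seg, child.size, child.keyCount)
		child.dump(prefix + seg)
	}
}

func main() {
	root := newNode()
	// toy keys with made-up sizes, standing in for objects decoded from the RDB file
	root.add("Stat:User:1", 40, 2)
	root.add("Stat:Post:1", 60, 2)
	root.add("User:1", 100, 2)
	root.dump("")
}
```

Each printed line corresponds to one row of the report: a prefix, the total size of all keys under it, and how many keys share it.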
README_CN.md

Lines changed: 36 additions & 1 deletion

@@ -269,9 +269,44 @@ database,key,type,size,size_readable,element_count
 0,set,set,39,39B,2
 ```

+# Prefix Analysis
+
+If modules can be distinguished by key prefix, for example user data under `User:<uid>`, posts under `Post:<postid>`, user statistics under `Stat:User:???`, and post statistics under `Stat:Post:???`, then prefix analysis reports the memory usage of each module:
+
+```csv
+database,prefix,size,size_readable,key_count
+0,Post:,1170456184,1.1G,701821
+0,Stat:,405483812,386.7M,3759832
+0,Stat:Post:,291081520,277.6M,2775043
+0,User:,241572272,230.4M,265810
+0,Topic:,171146778,163.2M,694498
+0,Topic:Post:,163635096,156.1M,693758
+0,Stat:Post:View,133201208,127M,1387516
+0,Stat:User:,114395916,109.1M,984724
+0,Stat:Post:Comment:,80178504,76.5M,693758
+0,Stat:Post:Like:,77701688,74.1M,693768
+```
+
+Command format:
+
+```bash
+rdb -c prefix [-n <top-n>] [-max-depth <max-depth>] [-o <output_path>] <source_path>
+```
+
+- The prefix analysis results are sorted by memory usage in descending order. The `-n` option limits the number of rows; all rows are output by default.
+
+- `-max-depth` limits the maximum depth of the prefix tree. In the example above, the depth of `Stat:` is 1, and the depth of `Stat:User:` and `Stat:Post:` is 2.
+
+Example:
+
+```bash
+rdb -c prefix -n 10 -o prefix.csv cases/memory.rdb
+```
+
 # Flame Graph

-In many cases it is not a few very large keys that occupy most of the memory, but a huge number of small key-value pairs that consume a lot of memory. There is currently no analysis tool that handles this problem effectively.
+In many cases it is not a few very large keys that occupy most of the memory, but a huge number of small key-value pairs that consume a lot of memory.

 Many companies require Redis keys to follow a naming convention such as `user:basic.info:{userid}`, so we can split keys by a separator and aggregate keys that share the same prefix.
cases/tree.csv

Lines changed: 4 additions & 0 deletions

@@ -0,0 +1,4 @@
+database,prefix,size,size_readable,key_count
+0,a,424,424B,6
+0,ab,368,368B,5
+0,abb,232,232B,3

cases/tree.rdb

63 Bytes
Binary file not shown.

cases/tree2.csv

Lines changed: 4 additions & 0 deletions

@@ -0,0 +1,4 @@
+database,prefix,size,size_readable,key_count
+0,a,424,424B,6
+0,ab,368,368B,5
+0,b,64,64B,1

cmd.go

Lines changed: 26 additions & 18 deletions

@@ -3,17 +3,18 @@ package main
 import (
 	"flag"
 	"fmt"
-	"github.com/hdt3213/rdb/helper"
 	"os"
 	"strings"
+
+	"github.com/hdt3213/rdb/helper"
 )

 const help = `
 This is a tool to parse Redis' RDB files
 Options:
-  -c command, including: json/memory/aof/bigkey/flamegraph
+  -c command, including: json/memory/aof/bigkey/prefix/flamegraph
   -o output file path
-  -n number of result, using in
+  -n number of results, used in commands: bigkey/prefix
   -port listen port for flame graph web service
  -sep separator for flamegraph, rdb will separate key by it, default value is ":".
       supporting multi separators: -sep sep1 -sep sep2
@@ -30,7 +31,9 @@ parameters between '[' and ']' is optional
     rdb -c aof -o dump.aof dump.rdb
 4. get largest keys
     rdb -c bigkey [-o dump.aof] [-n 10] dump.rdb
-5. draw flamegraph
+5. get key count and memory size by prefix
+    rdb -c prefix [-n 10] [-max-depth 3] [-o prefix-report.csv] dump.rdb
+6. draw flamegraph
     rdb -c flamegraph [-port 16379] [-sep :] dump.rdb
 `

@@ -54,9 +57,12 @@ func main() {
 	var seps separators
 	var regexExpr string
 	var noExpired bool
+	var maxDepth int
+	var err error
 	flagSet.StringVar(&cmd, "c", "", "command for rdb: json")
 	flagSet.StringVar(&output, "o", "", "output file path")
 	flagSet.IntVar(&n, "n", 0, "")
+	flagSet.IntVar(&maxDepth, "max-depth", 0, "max depth of prefix tree")
 	flagSet.IntVar(&port, "port", 0, "listen port for web")
 	flagSet.Var(&seps, "sep", "separator for flame graph")
 	flagSet.StringVar(&regexExpr, "regex", "", "regex expression")
@@ -81,7 +87,19 @@ func main() {
 		options = append(options, helper.WithNoExpiredOption())
 	}

-	var err error
+	var outputFile *os.File
+	if output == "" {
+		outputFile = os.Stdout
+	} else {
+		outputFile, err = os.Create(output)
+		if err != nil {
+			fmt.Printf("open output failed: %v", err)
+			return
+		}
+		defer func() {
+			_ = outputFile.Close()
+		}()
+	}
+
 	switch cmd {
 	case "json":
 		err = helper.ToJsons(src, output, options...)
@@ -90,19 +108,9 @@
 	case "aof":
 		err = helper.ToAOF(src, output, options)
 	case "bigkey":
-		if output == "" {
-			err = helper.FindBiggestKeys(src, n, os.Stdout, options...)
-		} else {
-			var outputFile *os.File
-			outputFile, err = os.Create(output)
-			if err != nil {
-				fmt.Printf("open output faild: %v", err)
-			}
-			defer func() {
-				_ = outputFile.Close()
-			}()
-			err = helper.FindBiggestKeys(src, n, outputFile, options...)
-		}
+		err = helper.FindBiggestKeys(src, n, outputFile, options...)
+	case "prefix":
+		err = helper.PrefixAnalyse(src, n, maxDepth, outputFile, options...)
 	case "flamegraph":
 		_, err = helper.FlameGraph(src, port, seps, options...)
 		if err != nil {

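The new `prefix` case hands everything through to `helper.PrefixAnalyse(src, n, maxDepth, outputFile, options...)`, whose implementation sits in another file of this commit. Based purely on that call site and the parallel `FindBiggestKeys` call, a caller would presumably drive it as sketched below; the exact signature is inferred, not shown in this diff, and the `topN`/`maxDepth` values are arbitrary illustration choices.

```go
package main

import (
	"log"
	"os"

	"github.com/hdt3213/rdb/helper"
)

func main() {
	// Same default as cmd.go: write the report to stdout when no output file is given.
	out := os.Stdout

	// Argument order follows the call site above: source file, top-n, max depth, output.
	// topN=10 and maxDepth=3 are illustrative; the fixture path comes from this commit.
	if err := helper.PrefixAnalyse("cases/tree.rdb", 10, 3, out); err != nil {
		log.Fatalf("prefix analysis failed: %v", err)
	}
}
```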
cmd_test.go

Lines changed: 5 additions & 0 deletions

@@ -52,6 +52,11 @@ func TestCmd(t *testing.T) {
 	if f, _ := os.Stat("tmp/memory_regex.csv"); f == nil {
 		t.Error("command memory failed")
 	}
+	os.Args = []string{"", "-c", "prefix", "-o", "tmp/tree.csv", "cases/tree.rdb"}
+	main()
+	if f, _ := os.Stat("tmp/tree.csv"); f == nil {
+		t.Error("command prefix failed")
+	}

 	// test error command line
 	os.Args = []string{"", "-c", "json", "-o", "tmp/output", "/none/a"}

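The added test only asserts that `tmp/tree.csv` gets created. Since the commit also ships `cases/tree.csv` as a fixture, a stricter variant could compare the generated report against it; the sketch below is not part of the commit and assumes the fixture is byte-identical to the expected output for `cases/tree.rdb`.

```go
package main

import (
	"os"
	"testing"
)

// TestPrefixAgainstFixture regenerates the prefix report and compares it to the
// fixture added in this commit. Hypothetical test, not part of the commit.
func TestPrefixAgainstFixture(t *testing.T) {
	_ = os.MkdirAll("tmp", 0755) // the existing TestCmd presumably prepares this dir already
	os.Args = []string{"", "-c", "prefix", "-o", "tmp/tree.csv", "cases/tree.rdb"}
	main()

	got, err := os.ReadFile("tmp/tree.csv")
	if err != nil {
		t.Fatalf("read generated report: %v", err)
	}
	want, err := os.ReadFile("cases/tree.csv")
	if err != nil {
		t.Fatalf("read fixture: %v", err)
	}
	if string(got) != string(want) {
		t.Errorf("prefix report mismatch:\ngot:\n%s\nwant:\n%s", got, want)
	}
}
```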
helper/bigkey.go

Lines changed: 6 additions & 28 deletions

@@ -4,37 +4,14 @@ import (
 	"encoding/csv"
 	"errors"
 	"fmt"
+	"os"
+	"strconv"
+
 	"github.com/hdt3213/rdb/bytefmt"
 	"github.com/hdt3213/rdb/core"
 	"github.com/hdt3213/rdb/model"
-	"os"
-	"sort"
-	"strconv"
 )

-type topList struct {
-	list     []model.RedisObject
-	capacity int
-}
-
-func (tl *topList) add(x model.RedisObject) {
-	index := sort.Search(len(tl.list), func(i int) bool {
-		return tl.list[i].GetSize() <= x.GetSize()
-	})
-	tl.list = append(tl.list, x)
-	copy(tl.list[index+1:], tl.list[index:])
-	tl.list[index] = x
-	if len(tl.list) > tl.capacity {
-		tl.list = tl.list[:tl.capacity]
-	}
-}
-
-func newRedisHeap(cap int) *topList {
-	return &topList{
-		capacity: cap,
-	}
-}
-
 // FindBiggestKeys read rdb file and find the largest N keys.
 // The invoker owns output, FindBiggestKeys won't close it
 func FindBiggestKeys(rdbFilename string, topN int, output *os.File, options ...interface{}) error {
@@ -55,7 +32,7 @@ func FindBiggestKeys(rdbFilename string, topN int, output *os.File, options ...i
 	if dec, err = wrapDecoder(dec, options...); err != nil {
 		return err
 	}
-	top := newRedisHeap(topN)
+	top := newToplist(topN)
 	err = dec.Parse(func(object model.RedisObject) bool {
 		top.add(object)
 		return true
@@ -69,7 +46,8 @@ func FindBiggestKeys(rdbFilename string, topN int, output *os.File, options ...i
 	}
 	csvWriter := csv.NewWriter(output)
 	defer csvWriter.Flush()
-	for _, object := range top.list {
+	for _, o := range top.list {
+		object := o.(model.RedisObject)
 		err = csvWriter.Write([]string{
 			strconv.Itoa(object.GetDBIndex()),
 			object.GetKey(),

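The hunks above delete the `model.RedisObject`-specific `topList` and have `FindBiggestKeys` call `newToplist` instead, type-asserting each element back to `model.RedisObject` when writing the CSV. That suggests the replacement stores a more general element type so the same structure can also serve the prefix analysis; the shared file is not shown in this commit view. The following is only a sketch of what such a generalized top list could look like, reusing the insertion logic removed from this file, with the `sized` interface and all names being assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// sized is a hypothetical interface: anything that reports a size can go in the top list.
type sized interface {
	GetSize() int
}

type toplist struct {
	list     []sized
	capacity int
}

func newToplist(cap int) *toplist {
	return &toplist{capacity: cap}
}

// add inserts x keeping the slice sorted by size in descending order and trims it
// to capacity, mirroring the sort.Search logic removed from helper/bigkey.go.
func (tl *toplist) add(x sized) {
	index := sort.Search(len(tl.list), func(i int) bool {
		return tl.list[i].GetSize() <= x.GetSize()
	})
	tl.list = append(tl.list, x)
	copy(tl.list[index+1:], tl.list[index:])
	tl.list[index] = x
	if len(tl.list) > tl.capacity {
		tl.list = tl.list[:tl.capacity]
	}
}

// item is a toy element used only to exercise the sketch.
type item struct{ size int }

func (i item) GetSize() int { return i.size }

func main() {
	top := newToplist(2)
	for _, s := range []int{3, 10, 7, 1} {
		top.add(item{size: s})
	}
	for _, x := range top.list {
		fmt.Println(x.GetSize()) // prints 10 then 7
	}
}
```

The `sort.Search` call finds the first element whose size is not larger than the new one, so the slice stays ordered from largest to smallest and trimming to `capacity` drops the current minimum.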
helper/bigkey_test.go

Lines changed: 3 additions & 2 deletions

@@ -1,13 +1,14 @@
 package helper

 import (
-	"github.com/hdt3213/rdb/model"
 	"math/rand"
 	"os"
 	"path/filepath"
 	"sort"
 	"strconv"
 	"testing"
+
+	"github.com/hdt3213/rdb/model"
 )

 func TestTopList(t *testing.T) {
@@ -24,7 +25,7 @@ func TestTopList(t *testing.T) {
 		}
 		objects = append(objects, o)
 	}
-	topList := newRedisHeap(topN)
+	topList := newToplist(topN)
 	for _, o := range objects {
 		topList.add(o)
 	}
