Commit 9f6a6a4

Merge pull request #32 from timbray/topfew-2.0
kaizen: prepare for 2.0 release
2 parents d4f3f66 + f35f4e3 commit 9f6a6a4

8 files changed, with 48 additions and 46 deletions.
CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ Topfew is hosted in this GitHub repository
 at `github.com/timbray/topfew` and welcomes
 contributions.

-This is release 1.0 of Topfew, which is probably more
+This is release 2.0 of Topfew, which is probably more
 or less complete. It is well-tested. Its performance
 at processing streams can keep up with most streams
 and it is dramatically faster when processing files,

INSTALLING.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 Each Topfew [release](https://github.com/timbray/topfew/releases) comes with binaries built for both the x86 and ARM
 flavors of Linux, MacOS, and Windows.

-Topfew comes with a Makefile which is uncomplicated. Typing `make` will create an executable named `tf`,
+Topfew comes with a Makefile which is uncomplicated. Typing `make` will create an executable named `topfew`,
 created by `go build` with no options, in the `./bin` directory.

 ## Arch Linux

Makefile

Lines changed: 9 additions & 9 deletions
@@ -1,18 +1,18 @@
 .PHONY: test

-all: test tf
+all: test topfew

 test: main.go internal/*.go
 	go test ./... && go vet ./...

 # local version you can run
-tf:
-	go build -o bin/tf
+topfew:
+	go build -o bin/topfew

 release: test
-	GOOS=darwin GOARCH=arm64 go build -o tf && gzip < tf > tf-macos-arm.gz
-	GOOS=darwin GOARCH=amd64 go build -o tf && gzip < tf > tf-macos-x86.gz
-	GOOS=linux GOARCH=amd64 go build -o tf && gzip < tf > tf-linux-x86.gz
-	GOOS=linux GOARCH=arm64 go build -o tf && gzip < tf > tf-linux-arm.gz
-	GOOS=windows GOARCH=amd64 go build -o tf && zip -mq tf-windows-x86.exe.zip tf
-	GOOS=windows GOARCH=arm64 go build -o tf && zip -mq tf-windows-arm.exe.zip tf
+	GOOS=darwin GOARCH=arm64 go build -o topfew && gzip < topfew>topfew-macos-arm.gz
+	GOOS=darwin GOARCH=amd64 go build -o topfew && gzip < topfew>topfew-macos-x86.gz
+	GOOS=linux GOARCH=amd64 go build -o topfew && gzip < topfew>topfew-linux-x86.gz
+	GOOS=linux GOARCH=arm64 go build -o topfew && gzip < topfew>topfew-linux-arm.gz
+	GOOS=windows GOARCH=amd64 go build -o topfew && zip -mq topfew-windows-x86.exe.zip topfew
+	GOOS=windows GOARCH=arm64 go build -o topfew && zip -mq topfew-windows-arm.exe.zip topfew

README.md

Lines changed: 20 additions & 18 deletions
@@ -8,38 +8,40 @@

 A program that finds and prints out the top few records in which a certain field or combination of fields occurs most frequently.

-This is release 1.0 of Topfew.
+This is release 2.0 of Topfew.

 ## Examples

 To find the IP address that most commonly hits your web site, given an Apache logfile named `access_log`.

-`tf --fields 1 access_log`
+`topfew --fields 1 access_log`

 The same effect could be achieved with

 `awk '{print $1}' access_log | sort | uniq -c | sort -rn | head`

-But **tf** is usually much faster.
+But **topfew** is usually much faster.

 Do the same, but exclude high-traffic bots (omitting the filename).

-`tf --fields 1 --vgrep googlebot --vgrep bingbot`
+`topfew --fields 1 --vgrep googlebot --vgrep bingbot`

 Most popular IP addresses from May 2020.

-`tf --fields 1 -grep '\[../May/2020'`
+`topfew --fields 1 -grep '\[../May/2020'`

 Most popular hour/minute of the day for retrievals.

-`tf --fields 4 --sed "\\[" "" --sed '^[^:]*:' '' --sed ':..$' ''`
+`topfew --fields 4 --sed "\\[" "" --sed '^[^:]*:' '' --sed ':..$' ''`

 ## Usage

 ```shell
-tf
+topfew
 -n, --number (output line count) [default is 10]
 -f, --fields (field list) [default is the whole record]
+-q, --quotedfields [respect "-delimited space-separated fields]
+-p, --fieldseparator (regexp) [use provided regexp to separate fields]
 -g, --grep (regexp) [may repeat, default is accept all]
 -v, --vgrep (regexp) [may repeat, default is reject none]
 -s, --sed (regexp) (replacement) [may repeat, default is no changes]
@@ -48,7 +50,7 @@ tf
 -h, -help, --help
 filename [default is stdin]

-All the arguments are optional; if none are provided, tf will read records
+All the arguments are optional; if none are provided, topfew will read records
 from the standard input and list the 10 which occur most often.
 ```
 ## Options
@@ -63,7 +65,7 @@ Specifies which fields should be extracted from incoming records and used in com
 The fieldlist must be a comma‐separated list of integers identifying field numbers, which start at one, for example 3 and 2,5,6.
 The fields must be provided in order, so 3,1,7 is an error.

-If no fieldlist is provided, **tf** treats the whole input record as a single field.
+If no fieldlist is provided, **topfew** treats the whole input record as a single field.

 `-p separator, --fieldseparator separator`

@@ -74,13 +76,13 @@ This is likely to incur a significant performance cost.

 Some files, for example Apache httpd logs, use space-separation but also
 allow spaces within fields which are delimited by `"`. The -q/--quotedfields
-argument allows **tf** to process these correctly. It is an error to specify both
+argument allows **topfew** to process these correctly. It is an error to specify both
 -p and -q.

 `-g regexp`, `--grep regexp`

 The initial **g** suggests `grep`.
-This option applies the provided regular expression to each record as it is read and if the regexp does not match the record, **tf** bypasses it.
+This option applies the provided regular expression to each record as it is read and if the regexp does not match the record, **topfew** bypasses it.

 This option can be provided multiple times; the provided regular expressions will be applied in the order they appear on the command line.

@@ -101,19 +103,19 @@ This option can be provided many times, and the replacement operations are perf
 `--sample`

 It can be tricky to get the regular expressions in the `−g`, `−v`, and `−s` options right.
-Specifying `-−sample` causes **tf** to print lines to the standard output that display the filtering and field‐editing logic.
+Specifying `-−sample` causes **topfew** to print lines to the standard output that display the filtering and field‐editing logic.
 It can only be used when processing standard input, not a file.

 `-w integer`, `--width integer`

-If a file name is specified then **tf**, rather than reading it from end to end, will divide it into segments and process it in multiple parallel threads.
+If a file name is specified then **topfew**, rather than reading it from end to end, will divide it into segments and process it in multiple parallel threads.
 The optimal number of threads depends in a complicated way on how many cores your CPU has what kind of cores they are, and the storage architecture.

 The default is the result of the Go `runtime.NumCPU()` calls and often produces good results.

 `-h`, `-help`, `--help`

-Describes the function and options of **tf**.
+Describes the function and options of **topfew**.

 ## Records and fields

@@ -142,10 +144,10 @@ summarizing the request and its result, is delimited by quote characters `"`.

 The fetch of `picInfo.xml` signals that this is an actual browser request, likely signifying that
 a human was involved; the URL following the `o=` is the resource the human looked at. Here is a
-**tf** invocation that yields a list of the top 5 URLs that were fetched by a human:
+**topfew** invocation that yields a list of the top 5 URLs that were fetched by a human:

 ```shell
-tf -g picInfo.xml -f 6 -q -s '\?utm.*' '' -s " HTTP/..." "" -s "GET .*\/ongoing" ""
+topfew -g picInfo.xml -f 6 -q -s '\?utm.*' '' -s " HTTP/..." "" -s "GET .*\/ongoing" ""
 ```

 Note the `-g` to select only lines with `picInfo.xml`, the `-q` to request correct processing
@@ -160,8 +162,8 @@ Therefore, the observed effects of combinations of options can vary dramatically
 For example, if I want to list the top records containing the string `example` from a file named `big-file` I could do either of the following:

 ```shell
-tf -g example big-file
-grep example big-file | tf
+topfew -g example big-file
+grep example big-file |topfew
 ```

 When I benchmark topfew on a modern Apple-Silicon Mac and an elderly spinning-rust Linux VPS, I observe that the first option is faster on Mac, the second on Linux.
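
Note on the README examples: the `awk | sort | uniq -c | sort -rn | head` pipeline quoted in the first hunk amounts to counting one field per record and keeping the most frequent values. The following is a minimal Go sketch of that idea only. `topFew` and `keyCount` are hypothetical names, splitting on whitespace and breaking ties alphabetically are assumptions, and none of this is taken from the topfew source.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strings"
)

// keyCount pairs a field value with the number of records it appeared in.
// Hypothetical type, used only for this illustration.
type keyCount struct {
	Key   string
	Count int
}

// topFew counts how often the given 1-based, whitespace-separated field
// occurs across lines and returns at most n entries, most frequent first.
// Records that lack the field are skipped (an assumption of this sketch).
func topFew(lines []string, field, n int) []keyCount {
	counts := make(map[string]int)
	for _, line := range lines {
		fields := strings.Fields(line)
		if field < 1 || field > len(fields) {
			continue
		}
		counts[fields[field-1]]++
	}

	result := make([]keyCount, 0, len(counts))
	for k, c := range counts {
		result = append(result, keyCount{Key: k, Count: c})
	}
	// Sort by descending count; break ties by key so output is deterministic.
	sort.Slice(result, func(i, j int) bool {
		if result[i].Count != result[j].Count {
			return result[i].Count > result[j].Count
		}
		return result[i].Key < result[j].Key
	})
	if len(result) > n {
		result = result[:n]
	}
	return result
}

func main() {
	// Read records from standard input, one per line.
	// bufio.Scanner's default buffer rejects lines longer than 64 KiB.
	var lines []string
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	for _, kc := range topFew(lines, 1, 10) {
		fmt.Printf("%d %s\n", kc.Count, kc.Key)
	}
}
```

Run against an access log on standard input, this prints up to ten `count value` lines for the first field. It holds every line in memory and runs single-threaded, so it says nothing about the performance behaviour the README describes.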

doc/tf.1

Lines changed: 8 additions & 8 deletions
@@ -5,7 +5,7 @@ A program that finds and prints out the top few records in which a certain field
 .PP
 To find the IP address that most commonly hits your web site, given an Apache logfile named \fB\fCaccess_log\fR\&.
 .PP
-\fB\fCtf \-\-fields 1 access_log\fR
+\fB\fCtopfew\-\-fields 1 access_log\fR
 .PP
 The same effect could be achieved with
 .PP
@@ -15,20 +15,20 @@ But \fBtf\fP is usually much faster.
 .PP
 Do the same, but exclude high\-traffic bots (omitting the filename).
 .PP
-\fB\fCtf \-\-fields 1 \-\-vgrep googlebot \-\-vgrep bingbot\fR
+\fB\fCtopfew\-\-fields 1 \-\-vgrep googlebot \-\-vgrep bingbot\fR
 .PP
 Most popular IP addresses from May 2020.
 .PP
-\fB\fCtf \-\-fields 1 \-grep '\\[../May/2020'\fR
+\fB\fCtopfew\-\-fields 1 \-grep '\\[../May/2020'\fR
 .PP
 Most popular hour/minute of the day for retrievals.
 .PP
-\fB\fCtf \-\-fields 4 \-\-sed "\\\\[" "" \-\-sed '^[^:]*:' '' \-\-sed ':..$' ''\fR
+\fB\fCtopfew\-\-fields 4 \-\-sed "\\\\[" "" \-\-sed '^[^:]*:' '' \-\-sed ':..$' ''\fR
 .SH Usage
 .PP
 .RS
 .nf
-tf
+topfew
 \-n, \-\-number (output line count) [default is 10]
 \-f, \-\-fields (field list) [default is the whole record]
 \-g, \-\-grep (regexp) [may repeat, default is accept all]
@@ -39,7 +39,7 @@ tf
 \-h, \-help, \-\-help
 filename [default is stdin]

-All the arguments are optional; if none are provided, tf will read records
+All the arguments are optional; if none are provided, topfewwill read records
 from the standard input and list the 10 which occur most often.
 .fi
 .RE
@@ -102,8 +102,8 @@ For example, if I want to list the top records containing the string \fB\fCexamp
 .PP
 .RS
 .nf
-tf \-g example big\-file
-grep example big\-file | tf
+topfew\-g example big\-file
+grep example big\-file |topfew
 .fi
 .RE
 .PP

internal/config.go

Lines changed: 6 additions & 6 deletions
@@ -131,14 +131,14 @@ func parseFields(spec string) ([]uint, error) {
 }

 const instructions = `
-tf (short for "topfew") finds the most common values in a line-structured input
+topfew finds the most common values in a line-structured input
 and prints the top few of them out, with their occurrence counts, in decreasing
 order of occurrences.

-Usage: tf
+Usage:topfew
 -n, --number (output line count) [default is 10]
 -f, --fields (field list) [default is the whole record]
--p, --fieldseparator (field separator regex) [default is white space]
+-p, --fieldseparator (field separator regex) [default is white space]
 -q, --quotedfields [default is false]
 -g, --grep (regexp) [may repeat, default is accept all]
 -v, --vgrep (regexp) [may repeat, default is reject none]
@@ -148,7 +148,7 @@ Usage: tf
 -h, -help, --help
 filename [default is stdin]

-All the arguments are optional; if none are provided, tf will read records
+All the arguments are optional; if none are provided, topfew will read records
 from the standard input and list the 10 which occur most often.

 Field list is comma-separated integers, e.g. -f 3 or --fields 1,3,7. The fields
@@ -160,7 +160,7 @@ performance.

 Some files, for example Apache httpd logs, use space-separation but also
 allow spaces within fields which are quoted with ("). The -q/--quotedfields
-allows tf to process these correctly. It is an error to specify both
+allows topfew to process these correctly. It is an error to specify both
 -p and -q.

 The regexp-valued fields work as follows:
@@ -171,7 +171,7 @@ The regexp-valued fields work as follows:
 The regexp-valued fields can be supplied multiple times; the filtering
 and substitution will be performed in the order supplied.

-If the input is a named file, tf will process it in multiple parallel
+If the input is a named file, topfew will process it in multiple parallel
 threads, which can dramatically improve performance. The --width argument
 allows you to specify the number of threads. The default value is not always
 optimal; experience with particular data on a particular computer may lead
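
Note on the field-list rule: the help text and README state that a field list is a comma-separated list of integers starting at one, given in order, so `3,1,7` is an error. The sketch below shows one way such validation could look. `parseFieldList` is a hypothetical stand-in and is not the repository's `parseFields`; rejecting repeated field numbers is an assumption, since the text only says the fields must be in order.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFieldList converts a spec such as "2,5,6" into field numbers.
// Hypothetical helper for illustration; it is not topfew's parseFields.
func parseFieldList(spec string) ([]uint, error) {
	parts := strings.Split(spec, ",")
	fields := make([]uint, 0, len(parts))
	var last uint // field numbers start at one, so zero means "none seen yet"
	for _, part := range parts {
		n, err := strconv.ParseUint(strings.TrimSpace(part), 10, 32)
		if err != nil {
			return nil, fmt.Errorf("invalid field number %q: %w", part, err)
		}
		if n == 0 {
			return nil, fmt.Errorf("field numbers start at 1, got %q", part)
		}
		// Assumption: "in order" is read as strictly increasing.
		if uint(n) <= last {
			return nil, fmt.Errorf("fields must be in ascending order: %d follows %d", n, last)
		}
		last = uint(n)
		fields = append(fields, last)
	}
	return fields, nil
}

func main() {
	for _, spec := range []string{"3", "2,5,6", "3,1,7"} {
		fields, err := parseFieldList(spec)
		fmt.Println(spec, fields, err)
	}
}
```

The first two specs parse successfully and the third is rejected, matching the examples in the documentation.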

internal/segmenter_test.go

Lines changed: 2 additions & 2 deletions
@@ -68,7 +68,7 @@ func TestReadSegmentFiltering(t *testing.T) {
 		t.Error("config!")
 	}

-	tmpName := fmt.Sprintf("/tmp/tf-%d", os.Getpid())
+	tmpName := fmt.Sprintf("/tmp/topfew-%d", os.Getpid())
 	tmpfile, err := os.Create(tmpName)
 	if err != nil {
 		t.Fatal("can't make tmpfile: " + err.Error())
@@ -90,7 +90,7 @@ func TestReadSegmentFiltering(t *testing.T) {
 // ErrBufferFull condition, had to create lines 80k long to execute that, so rather than clutter
 // up the filesystem with this junk, we create them synthetically
 func TestVeryLongLines(t *testing.T) {
-	tmpName := fmt.Sprintf("/tmp/tf-%d", os.Getpid())
+	tmpName := fmt.Sprintf("/tmp/topfew-%d", os.Getpid())
 	tmpfile, err := os.Create(tmpName)
 	if err != nil {
 		t.Fatal("can't make tmpfile: " + err.Error())

main.go

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ func main() {

 	config, err := topfew.Configure(os.Args[1:]) // skip whatever go puts in os.Args[0]
 	if err != nil {
-		fmt.Println("Problem (tf -h for help): " + err.Error())
+		fmt.Println("Problem (topfew -h for help): " + err.Error())
 		os.Exit(1)
 	}
