Values-only -a option for mlr grep (#1305)
* Values-only option for `mlr grep`

* Artifacts from `make dev`
johnkerl authored Jun 3, 2023
1 parent 9f9f630 commit 394681c
Showing 14 changed files with 155 additions and 123 deletions.
7 changes: 7 additions & 0 deletions docs/src/glossary.md
@@ -769,6 +769,13 @@ can split it into several files, one for each distinct `id`. See the [section
on tee statements](reference-dsl-output-statements.md#tee-statements) for an
example.

## terminals

These include `mlr help`, `mlr regtest`, `mlr repl`, and `mlr version`. They
aren't verbs but they can be preceded by various command-line flags. They're in
contrast to [auxents](#auxents) which are effectively standalone programs
packaged with Miller.

## terminator

Used in two senses:
39 changes: 19 additions & 20 deletions docs/src/manpage.md
@@ -141,6 +141,7 @@ MILLER(1) MILLER(1)
mlr help file-formats
Flags:
mlr help flags
mlr help flag
mlr help list-separator-aliases
mlr help list-separator-regex-aliases
mlr help comments-in-data-flags
@@ -173,6 +174,7 @@ MILLER(1) MILLER(1)
mlr help keyword
Other:
mlr help auxents
mlr help terminals
mlr help mlrrc
mlr help output-colorization
mlr help type-arithmetic-info
@@ -801,16 +803,12 @@ MILLER(1) MILLER(1)
--rs {string} Specify RS for input and output.

1mAUXILIARY COMMANDS0m
Available subcommands:
aux-list
hex
lecat
termcvt
unhex
help
regtest
repl
version
Available entries:
mlr aux-list
mlr hex
mlr lecat
mlr termcvt
mlr unhex
For more information, please invoke mlr {subcommand} --help.

1mMLRRC0m
@@ -1203,17 +1201,18 @@ MILLER(1) MILLER(1)
Options:
-i Use case-insensitive search.
-v Invert: pass through records which do not match the regex.
-a Only grep for values, not keys and values.
-h|--help Show this message.
Note that "mlr filter" is more powerful, but requires you to know field names.
By contrast, "mlr grep" allows you to regex-match the entire record. It does
this by formatting each record in memory as DKVP, using command-line-specified
ORS/OFS/OPS, and matching the resulting line against the regex specified
here. In particular, the regex is not applied to the input stream: if you
have CSV with header line "x,y,z" and data line "1,2,3" then the regex will
be matched, not against either of these lines, but against the DKVP line
"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
and this command is intended to be merely a keystroke-saver. To get all the
features of system grep, you can do
By contrast, "mlr grep" allows you to regex-match the entire record. It does this
by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
command-line-specified ORS/OFS/OPS, and matching the resulting line against the
regex specified here. In particular, the regex is not applied to the input
stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
regex will be matched, not against either of these lines, but against the DKVP
line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
supported, and this command is intended to be merely a keystroke-saver. To get
all the features of system grep, you can do
"mlr --odkvp ... | grep ... | mlr --idkvp ..."

1mgroup-by0m
@@ -3359,5 +3358,5 @@ MILLER(1) MILLER(1)



2023-05-13 MILLER(1)
2023-06-03 MILLER(1)
</pre>
39 changes: 19 additions & 20 deletions docs/src/manpage.txt
@@ -120,6 +120,7 @@ MILLER(1) MILLER(1)
mlr help file-formats
Flags:
mlr help flags
mlr help flag
mlr help list-separator-aliases
mlr help list-separator-regex-aliases
mlr help comments-in-data-flags
@@ -152,6 +153,7 @@ MILLER(1) MILLER(1)
mlr help keyword
Other:
mlr help auxents
mlr help terminals
mlr help mlrrc
mlr help output-colorization
mlr help type-arithmetic-info
@@ -780,16 +782,12 @@ MILLER(1) MILLER(1)
--rs {string} Specify RS for input and output.

1mAUXILIARY COMMANDS0m
Available subcommands:
aux-list
hex
lecat
termcvt
unhex
help
regtest
repl
version
Available entries:
mlr aux-list
mlr hex
mlr lecat
mlr termcvt
mlr unhex
For more information, please invoke mlr {subcommand} --help.

1mMLRRC0m
@@ -1182,17 +1180,18 @@ MILLER(1) MILLER(1)
Options:
-i Use case-insensitive search.
-v Invert: pass through records which do not match the regex.
-a Only grep for values, not keys and values.
-h|--help Show this message.
Note that "mlr filter" is more powerful, but requires you to know field names.
By contrast, "mlr grep" allows you to regex-match the entire record. It does
this by formatting each record in memory as DKVP, using command-line-specified
ORS/OFS/OPS, and matching the resulting line against the regex specified
here. In particular, the regex is not applied to the input stream: if you
have CSV with header line "x,y,z" and data line "1,2,3" then the regex will
be matched, not against either of these lines, but against the DKVP line
"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
and this command is intended to be merely a keystroke-saver. To get all the
features of system grep, you can do
By contrast, "mlr grep" allows you to regex-match the entire record. It does this
by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
command-line-specified ORS/OFS/OPS, and matching the resulting line against the
regex specified here. In particular, the regex is not applied to the input
stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
regex will be matched, not against either of these lines, but against the DKVP
line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
supported, and this command is intended to be merely a keystroke-saver. To get
all the features of system grep, you can do
"mlr --odkvp ... | grep ... | mlr --idkvp ..."

1mgroup-by0m
@@ -3338,4 +3337,4 @@ MILLER(1) MILLER(1)



2023-05-13 MILLER(1)
2023-06-03 MILLER(1)
2 changes: 2 additions & 0 deletions docs/src/online-help.md
@@ -49,6 +49,7 @@ Essentials:
mlr help file-formats
Flags:
mlr help flags
mlr help flag
mlr help list-separator-aliases
mlr help list-separator-regex-aliases
mlr help comments-in-data-flags
@@ -81,6 +82,7 @@ Keywords:
mlr help keyword
Other:
mlr help auxents
mlr help terminals
mlr help mlrrc
mlr help output-colorization
mlr help type-arithmetic-info
16 changes: 6 additions & 10 deletions docs/src/reference-main-auxiliary-commands.md
@@ -22,16 +22,12 @@ There are a few nearly-standalone programs which have a little to do with the re
<b>mlr aux-list</b>
</pre>
<pre class="pre-non-highlight-in-pair">
Available subcommands:
aux-list
hex
lecat
termcvt
unhex
help
regtest
repl
version
Available entries:
mlr aux-list
mlr hex
mlr lecat
mlr termcvt
mlr unhex
For more information, please invoke mlr {subcommand} --help.
</pre>

19 changes: 10 additions & 9 deletions docs/src/reference-verbs.md
@@ -1325,17 +1325,18 @@ Passes through records which match the regular expression.
Options:
-i Use case-insensitive search.
-v Invert: pass through records which do not match the regex.
-a Only grep for values, not keys and values.
-h|--help Show this message.
Note that "mlr filter" is more powerful, but requires you to know field names.
By contrast, "mlr grep" allows you to regex-match the entire record. It does
this by formatting each record in memory as DKVP, using command-line-specified
ORS/OFS/OPS, and matching the resulting line against the regex specified
here. In particular, the regex is not applied to the input stream: if you
have CSV with header line "x,y,z" and data line "1,2,3" then the regex will
be matched, not against either of these lines, but against the DKVP line
"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
and this command is intended to be merely a keystroke-saver. To get all the
features of system grep, you can do
By contrast, "mlr grep" allows you to regex-match the entire record. It does this
by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
command-line-specified ORS/OFS/OPS, and matching the resulting line against the
regex specified here. In particular, the regex is not applied to the input
stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
regex will be matched, not against either of these lines, but against the DKVP
line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
supported, and this command is intended to be merely a keystroke-saver. To get
all the features of system grep, you can do
"mlr --odkvp ... | grep ... | mlr --idkvp ..."
</pre>

11 changes: 11 additions & 0 deletions internal/pkg/mlrval/mlrmap_print.go
@@ -28,6 +28,17 @@ func (mlrmap *Mlrmap) ToDKVPString() string {
return buffer.String()
}

func (mlrmap *Mlrmap) ToNIDXString() string {
var buffer bytes.Buffer // stdio is non-buffered in Go, so buffer for ~5x speed increase
for pe := mlrmap.Head; pe != nil; pe = pe.Next {
buffer.WriteString(pe.Value.String())
if pe.Next != nil {
buffer.WriteString(",")
}
}
return buffer.String()
}

// ----------------------------------------------------------------
// Must have non-pointer receiver in order to implement the fmt.Stringer
// interface to make mlrmap printable via fmt.Println et al.
42 changes: 28 additions & 14 deletions internal/pkg/transformers/grep.go
@@ -30,18 +30,19 @@ func transformerGrepUsage(
fmt.Fprint(o, "Options:\n")
fmt.Fprint(o, "-i Use case-insensitive search.\n")
fmt.Fprint(o, "-v Invert: pass through records which do not match the regex.\n")
fmt.Fprint(o, "-a Only grep for values, not keys and values.\n")
fmt.Fprintf(o, "-h|--help Show this message.\n")

fmt.Fprintf(o, `Note that "%s filter" is more powerful, but requires you to know field names.
By contrast, "%s grep" allows you to regex-match the entire record. It does
this by formatting each record in memory as DKVP, using command-line-specified
ORS/OFS/OPS, and matching the resulting line against the regex specified
here. In particular, the regex is not applied to the input stream: if you
have CSV with header line "x,y,z" and data line "1,2,3" then the regex will
be matched, not against either of these lines, but against the DKVP line
"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
and this command is intended to be merely a keystroke-saver. To get all the
features of system grep, you can do
By contrast, "%s grep" allows you to regex-match the entire record. It does this
by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
command-line-specified ORS/OFS/OPS, and matching the resulting line against the
regex specified here. In particular, the regex is not applied to the input
stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
regex will be matched, not against either of these lines, but against the DKVP
line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
supported, and this command is intended to be merely a keystroke-saver. To get
all the features of system grep, you can do
"%s --odkvp ... | grep ... | %s --idkvp ..."
`, "mlr", "mlr", "mlr", "mlr")
}
@@ -61,6 +62,7 @@ func transformerGrepParseCLI(

ignoreCase := false
invert := false
valuesOnly := false

for argi < argc /* variable increment: 1 or 2 depending on flag */ {
opt := args[argi]
@@ -82,6 +84,9 @@ func transformerGrepParseCLI(
} else if opt == "-v" {
invert = true

} else if opt == "-a" {
valuesOnly = true

} else {
transformerGrepUsage(os.Stderr)
os.Exit(1)
@@ -116,6 +121,7 @@ func transformerGrepParseCLI(
transformer, err := NewTransformerGrep(
regexp,
invert,
valuesOnly,
)
if err != nil {
fmt.Fprintln(os.Stderr, err)
@@ -127,17 +133,20 @@ func transformerGrepParseCLI(

// ----------------------------------------------------------------
type TransformerGrep struct {
regexp *regexp.Regexp
invert bool
regexp *regexp.Regexp
invert bool
valuesOnly bool
}

func NewTransformerGrep(
regexp *regexp.Regexp,
invert bool,
valuesOnly bool,
) (*TransformerGrep, error) {
tr := &TransformerGrep{
regexp: regexp,
invert: invert,
regexp: regexp,
invert: invert,
valuesOnly: valuesOnly,
}
return tr, nil
}
@@ -153,7 +162,12 @@ func (tr *TransformerGrep) Transform(
HandleDefaultDownstreamDone(inputDownstreamDoneChannel, outputDownstreamDoneChannel)
if !inrecAndContext.EndOfStream {
inrec := inrecAndContext.Record
inrecAsString := inrec.ToDKVPString()
var inrecAsString string
if tr.valuesOnly {
inrecAsString = inrec.ToNIDXString()
} else {
inrecAsString = inrec.ToDKVPString()
}
matches := tr.regexp.Match([]byte(inrecAsString))
if tr.invert {
if !matches {