Values-only -a option for mlr grep #1305

Merged on Jun 3, 2023 (2 commits)
7 changes: 7 additions & 0 deletions docs/src/glossary.md
@@ -769,6 +769,13 @@ can split it into several files, one for each distinct `id`. See the [section
 on tee statements](reference-dsl-output-statements.md#tee-statements) for an
 example.

+## terminals
+
+These include `mlr help`, `mlr regtest`, `mlr repl`, and `mlr version`. They
+aren't verbs but they can be preceded by various command-line flags. They're in
+contrast to [auxents](#auxents) which are effectively standalone programs
+packaged with Miller.
+
 ## terminator

 Used in two senses:
39 changes: 19 additions & 20 deletions docs/src/manpage.md
@@ -141,6 +141,7 @@ MILLER(1) MILLER(1)
 mlr help file-formats
 Flags:
 mlr help flags
+mlr help flag
 mlr help list-separator-aliases
 mlr help list-separator-regex-aliases
 mlr help comments-in-data-flags
@@ -173,6 +174,7 @@ MILLER(1) MILLER(1)
 mlr help keyword
 Other:
 mlr help auxents
+mlr help terminals
 mlr help mlrrc
 mlr help output-colorization
 mlr help type-arithmetic-info
@@ -801,16 +803,12 @@ MILLER(1) MILLER(1)
 --rs {string} Specify RS for input and output.

 1mAUXILIARY COMMANDS0m
-Available subcommands:
-aux-list
-hex
-lecat
-termcvt
-unhex
-help
-regtest
-repl
-version
+Available entries:
+mlr aux-list
+mlr hex
+mlr lecat
+mlr termcvt
+mlr unhex
 For more information, please invoke mlr {subcommand} --help.

 1mMLRRC0m
@@ -1203,17 +1201,18 @@ MILLER(1) MILLER(1)
 Options:
 -i Use case-insensitive search.
 -v Invert: pass through records which do not match the regex.
+-a Only grep for values, not keys and values.
 -h|--help Show this message.
 Note that "mlr filter" is more powerful, but requires you to know field names.
-By contrast, "mlr grep" allows you to regex-match the entire record. It does
-this by formatting each record in memory as DKVP, using command-line-specified
-ORS/OFS/OPS, and matching the resulting line against the regex specified
-here. In particular, the regex is not applied to the input stream: if you
-have CSV with header line "x,y,z" and data line "1,2,3" then the regex will
-be matched, not against either of these lines, but against the DKVP line
-"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
-and this command is intended to be merely a keystroke-saver. To get all the
-features of system grep, you can do
+By contrast, "mlr grep" allows you to regex-match the entire record. It does this
+by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
+command-line-specified ORS/OFS/OPS, and matching the resulting line against the
+regex specified here. In particular, the regex is not applied to the input
+stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
+regex will be matched, not against either of these lines, but against the DKVP
+line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
+supported, and this command is intended to be merely a keystroke-saver. To get
+all the features of system grep, you can do
 "mlr --odkvp ... | grep ... | mlr --idkvp ..."

 1mgroup-by0m
@@ -3359,5 +3358,5 @@ MILLER(1) MILLER(1)



-2023-05-13 MILLER(1)
+2023-06-03 MILLER(1)
 </pre>
39 changes: 19 additions & 20 deletions docs/src/manpage.txt
@@ -120,6 +120,7 @@ MILLER(1) MILLER(1)
 mlr help file-formats
 Flags:
 mlr help flags
+mlr help flag
 mlr help list-separator-aliases
 mlr help list-separator-regex-aliases
 mlr help comments-in-data-flags
@@ -152,6 +153,7 @@ MILLER(1) MILLER(1)
 mlr help keyword
 Other:
 mlr help auxents
+mlr help terminals
 mlr help mlrrc
 mlr help output-colorization
 mlr help type-arithmetic-info
@@ -780,16 +782,12 @@ MILLER(1) MILLER(1)
 --rs {string} Specify RS for input and output.

 1mAUXILIARY COMMANDS0m
-Available subcommands:
-aux-list
-hex
-lecat
-termcvt
-unhex
-help
-regtest
-repl
-version
+Available entries:
+mlr aux-list
+mlr hex
+mlr lecat
+mlr termcvt
+mlr unhex
 For more information, please invoke mlr {subcommand} --help.

 1mMLRRC0m
@@ -1182,17 +1180,18 @@ MILLER(1) MILLER(1)
 Options:
 -i Use case-insensitive search.
 -v Invert: pass through records which do not match the regex.
+-a Only grep for values, not keys and values.
 -h|--help Show this message.
 Note that "mlr filter" is more powerful, but requires you to know field names.
-By contrast, "mlr grep" allows you to regex-match the entire record. It does
-this by formatting each record in memory as DKVP, using command-line-specified
-ORS/OFS/OPS, and matching the resulting line against the regex specified
-here. In particular, the regex is not applied to the input stream: if you
-have CSV with header line "x,y,z" and data line "1,2,3" then the regex will
-be matched, not against either of these lines, but against the DKVP line
-"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
-and this command is intended to be merely a keystroke-saver. To get all the
-features of system grep, you can do
+By contrast, "mlr grep" allows you to regex-match the entire record. It does this
+by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
+command-line-specified ORS/OFS/OPS, and matching the resulting line against the
+regex specified here. In particular, the regex is not applied to the input
+stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
+regex will be matched, not against either of these lines, but against the DKVP
+line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
+supported, and this command is intended to be merely a keystroke-saver. To get
+all the features of system grep, you can do
 "mlr --odkvp ... | grep ... | mlr --idkvp ..."

 1mgroup-by0m
@@ -3338,4 +3337,4 @@ MILLER(1) MILLER(1)



-2023-05-13 MILLER(1)
+2023-06-03 MILLER(1)
2 changes: 2 additions & 0 deletions docs/src/online-help.md
@@ -49,6 +49,7 @@ Essentials:
 mlr help file-formats
 Flags:
 mlr help flags
+mlr help flag
 mlr help list-separator-aliases
 mlr help list-separator-regex-aliases
 mlr help comments-in-data-flags
@@ -81,6 +82,7 @@ Keywords:
 mlr help keyword
 Other:
 mlr help auxents
+mlr help terminals
 mlr help mlrrc
 mlr help output-colorization
 mlr help type-arithmetic-info
16 changes: 6 additions & 10 deletions docs/src/reference-main-auxiliary-commands.md
@@ -22,16 +22,12 @@ There are a few nearly-standalone programs which have a little to do with the re
 <b>mlr aux-list</b>
 </pre>
 <pre class="pre-non-highlight-in-pair">
-Available subcommands:
-aux-list
-hex
-lecat
-termcvt
-unhex
-help
-regtest
-repl
-version
+Available entries:
+mlr aux-list
+mlr hex
+mlr lecat
+mlr termcvt
+mlr unhex
 For more information, please invoke mlr {subcommand} --help.
 </pre>

19 changes: 10 additions & 9 deletions docs/src/reference-verbs.md
@@ -1325,17 +1325,18 @@ Passes through records which match the regular expression.
 Options:
 -i Use case-insensitive search.
 -v Invert: pass through records which do not match the regex.
+-a Only grep for values, not keys and values.
 -h|--help Show this message.
 Note that "mlr filter" is more powerful, but requires you to know field names.
-By contrast, "mlr grep" allows you to regex-match the entire record. It does
-this by formatting each record in memory as DKVP, using command-line-specified
-ORS/OFS/OPS, and matching the resulting line against the regex specified
-here. In particular, the regex is not applied to the input stream: if you
-have CSV with header line "x,y,z" and data line "1,2,3" then the regex will
-be matched, not against either of these lines, but against the DKVP line
-"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
-and this command is intended to be merely a keystroke-saver. To get all the
-features of system grep, you can do
+By contrast, "mlr grep" allows you to regex-match the entire record. It does this
+by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
+command-line-specified ORS/OFS/OPS, and matching the resulting line against the
+regex specified here. In particular, the regex is not applied to the input
+stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
+regex will be matched, not against either of these lines, but against the DKVP
+line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
+supported, and this command is intended to be merely a keystroke-saver. To get
+all the features of system grep, you can do
 "mlr --odkvp ... | grep ... | mlr --idkvp ..."
 </pre>

11 changes: 11 additions & 0 deletions internal/pkg/mlrval/mlrmap_print.go
@@ -28,6 +28,17 @@ func (mlrmap *Mlrmap) ToDKVPString() string {
 	return buffer.String()
 }

+func (mlrmap *Mlrmap) ToNIDXString() string {
+	var buffer bytes.Buffer // stdio is non-buffered in Go, so buffer for ~5x speed increase
+	for pe := mlrmap.Head; pe != nil; pe = pe.Next {
+		buffer.WriteString(pe.Value.String())
+		if pe.Next != nil {
+			buffer.WriteString(",")
+		}
+	}
+	return buffer.String()
+}
+
 // ----------------------------------------------------------------
 // Must have non-pointer receiver in order to implement the fmt.Stringer
 // interface to make mlrmap printable via fmt.Println et al.
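
Note: a minimal, self-contained sketch of what the two formatters produce for the record used in the help text. The `entry` and `record` types and the `newExampleRecord` helper are hypothetical stand-ins for Miller's `Mlrmap`, and values are plain strings here rather than Miller values; only the linked-list traversal and the literal comma joining are taken from the hunk above.

```go
package main

import (
	"bytes"
	"fmt"
)

// entry is a hypothetical stand-in for one key-value pair of an
// insertion-ordered record; it is not Miller's actual type.
type entry struct {
	Key   string
	Value string
	Next  *entry
}

// record is a hypothetical stand-in for Mlrmap: a singly linked list.
type record struct {
	Head *entry
	tail *entry
}

// Put appends a key-value pair, preserving insertion order.
func (r *record) Put(key, value string) {
	e := &entry{Key: key, Value: value}
	if r.Head == nil {
		r.Head = e
	} else {
		r.tail.Next = e
	}
	r.tail = e
}

// ToDKVPString joins "key=value" pairs with commas.
func (r *record) ToDKVPString() string {
	var buffer bytes.Buffer
	for pe := r.Head; pe != nil; pe = pe.Next {
		buffer.WriteString(pe.Key + "=" + pe.Value)
		if pe.Next != nil {
			buffer.WriteString(",")
		}
	}
	return buffer.String()
}

// ToNIDXString joins the values only, with commas, as in the hunk above.
func (r *record) ToNIDXString() string {
	var buffer bytes.Buffer
	for pe := r.Head; pe != nil; pe = pe.Next {
		buffer.WriteString(pe.Value)
		if pe.Next != nil {
			buffer.WriteString(",")
		}
	}
	return buffer.String()
}

// newExampleRecord builds the record from the help text: x=1,y=2,z=3.
func newExampleRecord() *record {
	r := &record{}
	r.Put("x", "1")
	r.Put("y", "2")
	r.Put("z", "3")
	return r
}

func main() {
	r := newExampleRecord()
	fmt.Println(r.ToDKVPString()) // x=1,y=2,z=3
	fmt.Println(r.ToNIDXString()) // 1,2,3
}
```

In this sketch the separator is a literal comma, as in the new method, so the values-only line looks like a CSV data line without its header.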
42 changes: 28 additions & 14 deletions internal/pkg/transformers/grep.go
@@ -30,18 +30,19 @@ func transformerGrepUsage(
 	fmt.Fprint(o, "Options:\n")
 	fmt.Fprint(o, "-i Use case-insensitive search.\n")
 	fmt.Fprint(o, "-v Invert: pass through records which do not match the regex.\n")
+	fmt.Fprint(o, "-a Only grep for values, not keys and values.\n")
 	fmt.Fprintf(o, "-h|--help Show this message.\n")

 	fmt.Fprintf(o, `Note that "%s filter" is more powerful, but requires you to know field names.
-By contrast, "%s grep" allows you to regex-match the entire record. It does
-this by formatting each record in memory as DKVP, using command-line-specified
-ORS/OFS/OPS, and matching the resulting line against the regex specified
-here. In particular, the regex is not applied to the input stream: if you
-have CSV with header line "x,y,z" and data line "1,2,3" then the regex will
-be matched, not against either of these lines, but against the DKVP line
-"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
-and this command is intended to be merely a keystroke-saver. To get all the
-features of system grep, you can do
+By contrast, "%s grep" allows you to regex-match the entire record. It does this
+by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
+command-line-specified ORS/OFS/OPS, and matching the resulting line against the
+regex specified here. In particular, the regex is not applied to the input
+stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
+regex will be matched, not against either of these lines, but against the DKVP
+line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
+supported, and this command is intended to be merely a keystroke-saver. To get
+all the features of system grep, you can do
 "%s --odkvp ... | grep ... | %s --idkvp ..."
 `, "mlr", "mlr", "mlr", "mlr")
 }
@@ -61,6 +62,7 @@ func transformerGrepParseCLI(

 	ignoreCase := false
 	invert := false
+	valuesOnly := false

 	for argi < argc /* variable increment: 1 or 2 depending on flag */ {
 		opt := args[argi]
@@ -82,6 +84,9 @@ func transformerGrepParseCLI(
 		} else if opt == "-v" {
 			invert = true

+		} else if opt == "-a" {
+			valuesOnly = true
+
 		} else {
 			transformerGrepUsage(os.Stderr)
 			os.Exit(1)
@@ -116,6 +121,7 @@ func transformerGrepParseCLI(
 	transformer, err := NewTransformerGrep(
 		regexp,
 		invert,
+		valuesOnly,
 	)
 	if err != nil {
 		fmt.Fprintln(os.Stderr, err)
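
Note: the flag handling in the hunks above can be sketched in isolation as follows. The names `grepOptions`, `parseGrepFlags`, and `compilePattern` are hypothetical and do not appear in the PR. The sketch returns an error instead of printing usage and exiting, and it assumes that case-insensitivity is obtained by prefixing the pattern with Go's `(?i)` flag; the diff does not show how `-i` is actually applied.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// grepOptions is a hypothetical bundle of the three boolean flags.
type grepOptions struct {
	ignoreCase bool
	invert     bool
	valuesOnly bool
}

// parseGrepFlags consumes leading "-i", "-v", and "-a" flags and returns
// the options together with the remaining arguments (regex first).
// Simplification: a regex that itself starts with "-" is not handled.
func parseGrepFlags(args []string) (grepOptions, []string, error) {
	var opts grepOptions
	argi := 0
	for argi < len(args) {
		opt := args[argi]
		if !strings.HasPrefix(opt, "-") {
			break // first non-flag argument: the regex
		}
		argi++
		switch opt {
		case "-i":
			opts.ignoreCase = true
		case "-v":
			opts.invert = true
		case "-a":
			opts.valuesOnly = true
		default:
			return opts, nil, fmt.Errorf("unrecognized option %q", opt)
		}
	}
	return opts, args[argi:], nil
}

// compilePattern compiles the regex; the "(?i)" prefix is an assumption
// about how case-insensitive matching could be requested.
func compilePattern(pattern string, ignoreCase bool) (*regexp.Regexp, error) {
	if ignoreCase {
		pattern = "(?i)" + pattern
	}
	return regexp.Compile(pattern)
}

func main() {
	opts, rest, err := parseGrepFlags([]string{"-i", "-a", "pan"})
	if err != nil || len(rest) == 0 {
		fmt.Println("usage error")
		return
	}
	re, err := compilePattern(rest[0], opts.ignoreCase)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(opts.valuesOnly, re.MatchString("PAN,wye")) // true true
}
```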
@@ -127,17 +133,20 @@ func transformerGrepParseCLI(

 // ----------------------------------------------------------------
 type TransformerGrep struct {
-	regexp *regexp.Regexp
-	invert bool
+	regexp     *regexp.Regexp
+	invert     bool
+	valuesOnly bool
 }

 func NewTransformerGrep(
 	regexp *regexp.Regexp,
 	invert bool,
+	valuesOnly bool,
 ) (*TransformerGrep, error) {
 	tr := &TransformerGrep{
-		regexp: regexp,
-		invert: invert,
+		regexp:     regexp,
+		invert:     invert,
+		valuesOnly: valuesOnly,
 	}
 	return tr, nil
 }
@@ -153,7 +162,12 @@ func (tr *TransformerGrep) Transform(
 	HandleDefaultDownstreamDone(inputDownstreamDoneChannel, outputDownstreamDoneChannel)
 	if !inrecAndContext.EndOfStream {
 		inrec := inrecAndContext.Record
-		inrecAsString := inrec.ToDKVPString()
+		var inrecAsString string
+		if tr.valuesOnly {
+			inrecAsString = inrec.ToNIDXString()
+		} else {
+			inrecAsString = inrec.ToDKVPString()
+		}
 		matches := tr.regexp.Match([]byte(inrecAsString))
 		if tr.invert {
 			if !matches {