From 394681c4c1b5827c1cd62bac4f963065a57d3bf7 Mon Sep 17 00:00:00 2001 From: John Kerl Date: Sat, 3 Jun 2023 17:19:40 -0400 Subject: [PATCH] Values-only `-a` option for `mlr grep` (#1305) * Values-only option for `mlr grep` * Artifacts from `make dev` --- docs/src/glossary.md | 7 ++++ docs/src/manpage.md | 39 +++++++++-------- docs/src/manpage.txt | 39 +++++++++-------- docs/src/online-help.md | 2 + docs/src/reference-main-auxiliary-commands.md | 16 +++---- docs/src/reference-verbs.md | 19 +++++---- internal/pkg/mlrval/mlrmap_print.go | 11 +++++ internal/pkg/transformers/grep.go | 42 ++++++++++++------- man/manpage.txt | 39 +++++++++-------- man/mlr.1 | 41 +++++++++--------- test/cases/cli-help/0001/expout | 19 +++++---- test/cases/verb-grep/0006/cmd | 1 + test/cases/verb-grep/0006/experr | 0 test/cases/verb-grep/0006/expout | 3 ++ 14 files changed, 155 insertions(+), 123 deletions(-) create mode 100644 test/cases/verb-grep/0006/cmd create mode 100644 test/cases/verb-grep/0006/experr create mode 100644 test/cases/verb-grep/0006/expout diff --git a/docs/src/glossary.md b/docs/src/glossary.md index d71abf1d64..bb731297b7 100644 --- a/docs/src/glossary.md +++ b/docs/src/glossary.md @@ -769,6 +769,13 @@ can split it into several files, one for each distinct `id`. See the [section on tee statements](reference-dsl-output-statements.md#tee-statements) for an example. +## terminals + +These include `mlr help`, `mlr regtest`, `mlr repl`, and `mlr version`. They +aren't verbs but they can be preceded by various command-line flags. They're in +contrast to [auxents](#auxents) which are effectively standalone programs +packaged with Miller. + ## terminator Used in two senses: diff --git a/docs/src/manpage.md b/docs/src/manpage.md index 33325ce9d2..65ffe0d8c1 100644 --- a/docs/src/manpage.md +++ b/docs/src/manpage.md @@ -141,6 +141,7 @@ MILLER(1) MILLER(1) mlr help file-formats Flags: mlr help flags + mlr help flag mlr help list-separator-aliases mlr help list-separator-regex-aliases mlr help comments-in-data-flags @@ -173,6 +174,7 @@ MILLER(1) MILLER(1) mlr help keyword Other: mlr help auxents + mlr help terminals mlr help mlrrc mlr help output-colorization mlr help type-arithmetic-info @@ -801,16 +803,12 @@ MILLER(1) MILLER(1) --rs {string} Specify RS for input and output. 1mAUXILIARY COMMANDS0m - Available subcommands: - aux-list - hex - lecat - termcvt - unhex - help - regtest - repl - version + Available entries: + mlr aux-list + mlr hex + mlr lecat + mlr termcvt + mlr unhex For more information, please invoke mlr {subcommand} --help. 1mMLRRC0m @@ -1203,17 +1201,18 @@ MILLER(1) MILLER(1) Options: -i Use case-insensitive search. -v Invert: pass through records which do not match the regex. + -a Only grep for values, not keys and values. -h|--help Show this message. Note that "mlr filter" is more powerful, but requires you to know field names. - By contrast, "mlr grep" allows you to regex-match the entire record. It does - this by formatting each record in memory as DKVP, using command-line-specified - ORS/OFS/OPS, and matching the resulting line against the regex specified - here. In particular, the regex is not applied to the input stream: if you - have CSV with header line "x,y,z" and data line "1,2,3" then the regex will - be matched, not against either of these lines, but against the DKVP line - "x=1,y=2,z=3". Furthermore, not all the options to system grep are supported, - and this command is intended to be merely a keystroke-saver. To get all the - features of system grep, you can do + By contrast, "mlr grep" allows you to regex-match the entire record. It does this + by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using + command-line-specified ORS/OFS/OPS, and matching the resulting line against the + regex specified here. In particular, the regex is not applied to the input + stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the + regex will be matched, not against either of these lines, but against the DKVP + line "x=1,y=2,z=3". Furthermore, not all the options to system grep are + supported, and this command is intended to be merely a keystroke-saver. To get + all the features of system grep, you can do "mlr --odkvp ... | grep ... | mlr --idkvp ..." 1mgroup-by0m @@ -3359,5 +3358,5 @@ MILLER(1) MILLER(1) - 2023-05-13 MILLER(1) + 2023-06-03 MILLER(1) diff --git a/docs/src/manpage.txt b/docs/src/manpage.txt index 37d960e3d3..3d89f78db1 100644 --- a/docs/src/manpage.txt +++ b/docs/src/manpage.txt @@ -120,6 +120,7 @@ MILLER(1) MILLER(1) mlr help file-formats Flags: mlr help flags + mlr help flag mlr help list-separator-aliases mlr help list-separator-regex-aliases mlr help comments-in-data-flags @@ -152,6 +153,7 @@ MILLER(1) MILLER(1) mlr help keyword Other: mlr help auxents + mlr help terminals mlr help mlrrc mlr help output-colorization mlr help type-arithmetic-info @@ -780,16 +782,12 @@ MILLER(1) MILLER(1) --rs {string} Specify RS for input and output. 1mAUXILIARY COMMANDS0m - Available subcommands: - aux-list - hex - lecat - termcvt - unhex - help - regtest - repl - version + Available entries: + mlr aux-list + mlr hex + mlr lecat + mlr termcvt + mlr unhex For more information, please invoke mlr {subcommand} --help. 1mMLRRC0m @@ -1182,17 +1180,18 @@ MILLER(1) MILLER(1) Options: -i Use case-insensitive search. -v Invert: pass through records which do not match the regex. + -a Only grep for values, not keys and values. -h|--help Show this message. Note that "mlr filter" is more powerful, but requires you to know field names. - By contrast, "mlr grep" allows you to regex-match the entire record. It does - this by formatting each record in memory as DKVP, using command-line-specified - ORS/OFS/OPS, and matching the resulting line against the regex specified - here. In particular, the regex is not applied to the input stream: if you - have CSV with header line "x,y,z" and data line "1,2,3" then the regex will - be matched, not against either of these lines, but against the DKVP line - "x=1,y=2,z=3". Furthermore, not all the options to system grep are supported, - and this command is intended to be merely a keystroke-saver. To get all the - features of system grep, you can do + By contrast, "mlr grep" allows you to regex-match the entire record. It does this + by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using + command-line-specified ORS/OFS/OPS, and matching the resulting line against the + regex specified here. In particular, the regex is not applied to the input + stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the + regex will be matched, not against either of these lines, but against the DKVP + line "x=1,y=2,z=3". Furthermore, not all the options to system grep are + supported, and this command is intended to be merely a keystroke-saver. To get + all the features of system grep, you can do "mlr --odkvp ... | grep ... | mlr --idkvp ..." 1mgroup-by0m @@ -3338,4 +3337,4 @@ MILLER(1) MILLER(1) - 2023-05-13 MILLER(1) + 2023-06-03 MILLER(1) diff --git a/docs/src/online-help.md b/docs/src/online-help.md index bdf05fb669..fc6cda292c 100644 --- a/docs/src/online-help.md +++ b/docs/src/online-help.md @@ -49,6 +49,7 @@ Essentials: mlr help file-formats Flags: mlr help flags + mlr help flag mlr help list-separator-aliases mlr help list-separator-regex-aliases mlr help comments-in-data-flags @@ -81,6 +82,7 @@ Keywords: mlr help keyword Other: mlr help auxents + mlr help terminals mlr help mlrrc mlr help output-colorization mlr help type-arithmetic-info diff --git a/docs/src/reference-main-auxiliary-commands.md b/docs/src/reference-main-auxiliary-commands.md index a55ce0361a..16fa67c092 100644 --- a/docs/src/reference-main-auxiliary-commands.md +++ b/docs/src/reference-main-auxiliary-commands.md @@ -22,16 +22,12 @@ There are a few nearly-standalone programs which have a little to do with the re mlr aux-list
-Available subcommands:
-  aux-list
-  hex
-  lecat
-  termcvt
-  unhex
-  help
-  regtest
-  repl
-  version
+Available entries:
+  mlr aux-list
+  mlr hex
+  mlr lecat
+  mlr termcvt
+  mlr unhex
 For more information, please invoke mlr {subcommand} --help.
 
diff --git a/docs/src/reference-verbs.md b/docs/src/reference-verbs.md index b0f1d43f54..0558f99d56 100644 --- a/docs/src/reference-verbs.md +++ b/docs/src/reference-verbs.md @@ -1325,17 +1325,18 @@ Passes through records which match the regular expression. Options: -i Use case-insensitive search. -v Invert: pass through records which do not match the regex. +-a Only grep for values, not keys and values. -h|--help Show this message. Note that "mlr filter" is more powerful, but requires you to know field names. -By contrast, "mlr grep" allows you to regex-match the entire record. It does -this by formatting each record in memory as DKVP, using command-line-specified -ORS/OFS/OPS, and matching the resulting line against the regex specified -here. In particular, the regex is not applied to the input stream: if you -have CSV with header line "x,y,z" and data line "1,2,3" then the regex will -be matched, not against either of these lines, but against the DKVP line -"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported, -and this command is intended to be merely a keystroke-saver. To get all the -features of system grep, you can do +By contrast, "mlr grep" allows you to regex-match the entire record. It does this +by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using +command-line-specified ORS/OFS/OPS, and matching the resulting line against the +regex specified here. In particular, the regex is not applied to the input +stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the +regex will be matched, not against either of these lines, but against the DKVP +line "x=1,y=2,z=3". Furthermore, not all the options to system grep are +supported, and this command is intended to be merely a keystroke-saver. To get +all the features of system grep, you can do "mlr --odkvp ... | grep ... | mlr --idkvp ..." diff --git a/internal/pkg/mlrval/mlrmap_print.go b/internal/pkg/mlrval/mlrmap_print.go index 9cfd8122a5..5ab7c2d5a6 100644 --- a/internal/pkg/mlrval/mlrmap_print.go +++ b/internal/pkg/mlrval/mlrmap_print.go @@ -28,6 +28,17 @@ func (mlrmap *Mlrmap) ToDKVPString() string { return buffer.String() } +func (mlrmap *Mlrmap) ToNIDXString() string { + var buffer bytes.Buffer // stdio is non-buffered in Go, so buffer for ~5x speed increase + for pe := mlrmap.Head; pe != nil; pe = pe.Next { + buffer.WriteString(pe.Value.String()) + if pe.Next != nil { + buffer.WriteString(",") + } + } + return buffer.String() +} + // ---------------------------------------------------------------- // Must have non-pointer receiver in order to implement the fmt.Stringer // interface to make mlrmap printable via fmt.Println et al. diff --git a/internal/pkg/transformers/grep.go b/internal/pkg/transformers/grep.go index c009e72192..6e692364a2 100644 --- a/internal/pkg/transformers/grep.go +++ b/internal/pkg/transformers/grep.go @@ -30,18 +30,19 @@ func transformerGrepUsage( fmt.Fprint(o, "Options:\n") fmt.Fprint(o, "-i Use case-insensitive search.\n") fmt.Fprint(o, "-v Invert: pass through records which do not match the regex.\n") + fmt.Fprint(o, "-a Only grep for values, not keys and values.\n") fmt.Fprintf(o, "-h|--help Show this message.\n") fmt.Fprintf(o, `Note that "%s filter" is more powerful, but requires you to know field names. -By contrast, "%s grep" allows you to regex-match the entire record. It does -this by formatting each record in memory as DKVP, using command-line-specified -ORS/OFS/OPS, and matching the resulting line against the regex specified -here. In particular, the regex is not applied to the input stream: if you -have CSV with header line "x,y,z" and data line "1,2,3" then the regex will -be matched, not against either of these lines, but against the DKVP line -"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported, -and this command is intended to be merely a keystroke-saver. To get all the -features of system grep, you can do +By contrast, "%s grep" allows you to regex-match the entire record. It does this +by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using +command-line-specified ORS/OFS/OPS, and matching the resulting line against the +regex specified here. In particular, the regex is not applied to the input +stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the +regex will be matched, not against either of these lines, but against the DKVP +line "x=1,y=2,z=3". Furthermore, not all the options to system grep are +supported, and this command is intended to be merely a keystroke-saver. To get +all the features of system grep, you can do "%s --odkvp ... | grep ... | %s --idkvp ..." `, "mlr", "mlr", "mlr", "mlr") } @@ -61,6 +62,7 @@ func transformerGrepParseCLI( ignoreCase := false invert := false + valuesOnly := false for argi < argc /* variable increment: 1 or 2 depending on flag */ { opt := args[argi] @@ -82,6 +84,9 @@ func transformerGrepParseCLI( } else if opt == "-v" { invert = true + } else if opt == "-a" { + valuesOnly = true + } else { transformerGrepUsage(os.Stderr) os.Exit(1) @@ -116,6 +121,7 @@ func transformerGrepParseCLI( transformer, err := NewTransformerGrep( regexp, invert, + valuesOnly, ) if err != nil { fmt.Fprintln(os.Stderr, err) @@ -127,17 +133,20 @@ func transformerGrepParseCLI( // ---------------------------------------------------------------- type TransformerGrep struct { - regexp *regexp.Regexp - invert bool + regexp *regexp.Regexp + invert bool + valuesOnly bool } func NewTransformerGrep( regexp *regexp.Regexp, invert bool, + valuesOnly bool, ) (*TransformerGrep, error) { tr := &TransformerGrep{ - regexp: regexp, - invert: invert, + regexp: regexp, + invert: invert, + valuesOnly: valuesOnly, } return tr, nil } @@ -153,7 +162,12 @@ func (tr *TransformerGrep) Transform( HandleDefaultDownstreamDone(inputDownstreamDoneChannel, outputDownstreamDoneChannel) if !inrecAndContext.EndOfStream { inrec := inrecAndContext.Record - inrecAsString := inrec.ToDKVPString() + var inrecAsString string + if tr.valuesOnly { + inrecAsString = inrec.ToNIDXString() + } else { + inrecAsString = inrec.ToDKVPString() + } matches := tr.regexp.Match([]byte(inrecAsString)) if tr.invert { if !matches { diff --git a/man/manpage.txt b/man/manpage.txt index 37d960e3d3..3d89f78db1 100644 --- a/man/manpage.txt +++ b/man/manpage.txt @@ -120,6 +120,7 @@ MILLER(1) MILLER(1) mlr help file-formats Flags: mlr help flags + mlr help flag mlr help list-separator-aliases mlr help list-separator-regex-aliases mlr help comments-in-data-flags @@ -152,6 +153,7 @@ MILLER(1) MILLER(1) mlr help keyword Other: mlr help auxents + mlr help terminals mlr help mlrrc mlr help output-colorization mlr help type-arithmetic-info @@ -780,16 +782,12 @@ MILLER(1) MILLER(1) --rs {string} Specify RS for input and output. 1mAUXILIARY COMMANDS0m - Available subcommands: - aux-list - hex - lecat - termcvt - unhex - help - regtest - repl - version + Available entries: + mlr aux-list + mlr hex + mlr lecat + mlr termcvt + mlr unhex For more information, please invoke mlr {subcommand} --help. 1mMLRRC0m @@ -1182,17 +1180,18 @@ MILLER(1) MILLER(1) Options: -i Use case-insensitive search. -v Invert: pass through records which do not match the regex. + -a Only grep for values, not keys and values. -h|--help Show this message. Note that "mlr filter" is more powerful, but requires you to know field names. - By contrast, "mlr grep" allows you to regex-match the entire record. It does - this by formatting each record in memory as DKVP, using command-line-specified - ORS/OFS/OPS, and matching the resulting line against the regex specified - here. In particular, the regex is not applied to the input stream: if you - have CSV with header line "x,y,z" and data line "1,2,3" then the regex will - be matched, not against either of these lines, but against the DKVP line - "x=1,y=2,z=3". Furthermore, not all the options to system grep are supported, - and this command is intended to be merely a keystroke-saver. To get all the - features of system grep, you can do + By contrast, "mlr grep" allows you to regex-match the entire record. It does this + by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using + command-line-specified ORS/OFS/OPS, and matching the resulting line against the + regex specified here. In particular, the regex is not applied to the input + stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the + regex will be matched, not against either of these lines, but against the DKVP + line "x=1,y=2,z=3". Furthermore, not all the options to system grep are + supported, and this command is intended to be merely a keystroke-saver. To get + all the features of system grep, you can do "mlr --odkvp ... | grep ... | mlr --idkvp ..." 1mgroup-by0m @@ -3338,4 +3337,4 @@ MILLER(1) MILLER(1) - 2023-05-13 MILLER(1) + 2023-06-03 MILLER(1) diff --git a/man/mlr.1 b/man/mlr.1 index 00a67f9ece..08ad3e578c 100644 --- a/man/mlr.1 +++ b/man/mlr.1 @@ -2,12 +2,12 @@ .\" Title: mlr .\" Author: [see the "AUTHOR" section] .\" Generator: ./mkman.rb -.\" Date: 2023-05-13 +.\" Date: 2023-06-03 .\" Manual: \ \& .\" Source: \ \& .\" Language: English .\" -.TH "MILLER" "1" "2023-05-13" "\ \&" "\ \&" +.TH "MILLER" "1" "2023-06-03" "\ \&" "\ \&" .\" ----------------------------------------------------------------- .\" * Portability definitions .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -155,6 +155,7 @@ Essentials: mlr help file-formats Flags: mlr help flags + mlr help flag mlr help list-separator-aliases mlr help list-separator-regex-aliases mlr help comments-in-data-flags @@ -187,6 +188,7 @@ Keywords: mlr help keyword Other: mlr help auxents + mlr help terminals mlr help mlrrc mlr help output-colorization mlr help type-arithmetic-info @@ -937,16 +939,12 @@ Notes about all other separators: .RS 0 .\} .nf -Available subcommands: - aux-list - hex - lecat - termcvt - unhex - help - regtest - repl - version +Available entries: + mlr aux-list + mlr hex + mlr lecat + mlr termcvt + mlr unhex For more information, please invoke mlr {subcommand} --help. .fi .if n \{\ @@ -1473,17 +1471,18 @@ Passes through records which match the regular expression. Options: -i Use case-insensitive search. -v Invert: pass through records which do not match the regex. +-a Only grep for values, not keys and values. -h|--help Show this message. Note that "mlr filter" is more powerful, but requires you to know field names. -By contrast, "mlr grep" allows you to regex-match the entire record. It does -this by formatting each record in memory as DKVP, using command-line-specified -ORS/OFS/OPS, and matching the resulting line against the regex specified -here. In particular, the regex is not applied to the input stream: if you -have CSV with header line "x,y,z" and data line "1,2,3" then the regex will -be matched, not against either of these lines, but against the DKVP line -"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported, -and this command is intended to be merely a keystroke-saver. To get all the -features of system grep, you can do +By contrast, "mlr grep" allows you to regex-match the entire record. It does this +by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using +command-line-specified ORS/OFS/OPS, and matching the resulting line against the +regex specified here. In particular, the regex is not applied to the input +stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the +regex will be matched, not against either of these lines, but against the DKVP +line "x=1,y=2,z=3". Furthermore, not all the options to system grep are +supported, and this command is intended to be merely a keystroke-saver. To get +all the features of system grep, you can do "mlr --odkvp ... | grep ... | mlr --idkvp ..." .fi .if n \{\ diff --git a/test/cases/cli-help/0001/expout b/test/cases/cli-help/0001/expout index 374275e772..fe29d65dbd 100644 --- a/test/cases/cli-help/0001/expout +++ b/test/cases/cli-help/0001/expout @@ -343,17 +343,18 @@ Passes through records which match the regular expression. Options: -i Use case-insensitive search. -v Invert: pass through records which do not match the regex. +-a Only grep for values, not keys and values. -h|--help Show this message. Note that "mlr filter" is more powerful, but requires you to know field names. -By contrast, "mlr grep" allows you to regex-match the entire record. It does -this by formatting each record in memory as DKVP, using command-line-specified -ORS/OFS/OPS, and matching the resulting line against the regex specified -here. In particular, the regex is not applied to the input stream: if you -have CSV with header line "x,y,z" and data line "1,2,3" then the regex will -be matched, not against either of these lines, but against the DKVP line -"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported, -and this command is intended to be merely a keystroke-saver. To get all the -features of system grep, you can do +By contrast, "mlr grep" allows you to regex-match the entire record. It does this +by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using +command-line-specified ORS/OFS/OPS, and matching the resulting line against the +regex specified here. In particular, the regex is not applied to the input +stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the +regex will be matched, not against either of these lines, but against the DKVP +line "x=1,y=2,z=3". Furthermore, not all the options to system grep are +supported, and this command is intended to be merely a keystroke-saver. To get +all the features of system grep, you can do "mlr --odkvp ... | grep ... | mlr --idkvp ..." ================================================================ diff --git a/test/cases/verb-grep/0006/cmd b/test/cases/verb-grep/0006/cmd new file mode 100644 index 0000000000..a6abade76a --- /dev/null +++ b/test/cases/verb-grep/0006/cmd @@ -0,0 +1 @@ +mlr --opprint --from test/input/s.dkvp grep -a y diff --git a/test/cases/verb-grep/0006/experr b/test/cases/verb-grep/0006/experr new file mode 100644 index 0000000000..e69de29bb2 diff --git a/test/cases/verb-grep/0006/expout b/test/cases/verb-grep/0006/expout new file mode 100644 index 0000000000..d1c34e5620 --- /dev/null +++ b/test/cases/verb-grep/0006/expout @@ -0,0 +1,3 @@ +a b i x y +wye wye 3 0.20460331 0.33831853 +eks wye 4 0.38139939 0.13418874