Add empty-key check to mlr check (#1330)
* Add empty-key check to `mlr check`

* Update `mlr check --help`

* Update to on-line help
johnkerl committed Jun 25, 2023
1 parent dff2206 commit 3e5c3e2
Showing 15 changed files with 104 additions and 49 deletions.
21 changes: 12 additions & 9 deletions docs/src/manpage.md
@@ -936,8 +936,11 @@ MILLER(1) MILLER(1)

check
Usage: mlr check [options]
-Consumes records without printing any output.
+Consumes records without printing any output,
Useful for doing a well-formatted check on input data.
+with the exception that warnings are printed to stderr.
+Current checks are:
+* If any key is the empty string
Options:
-h|--help Show this message.

@@ -1212,13 +1215,13 @@ MILLER(1) MILLER(1)
Note that "mlr filter" is more powerful, but requires you to know field names.
By contrast, "mlr grep" allows you to regex-match the entire record. It does this
by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
-command-line-specified ORS/OFS/OPS, and matching the resulting line against the
-regex specified here. In particular, the regex is not applied to the input
-stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
-regex will be matched, not against either of these lines, but against the DKVP
-line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
-supported, and this command is intended to be merely a keystroke-saver. To get
-all the features of system grep, you can do
+OFS "," and OPS "=", and matching the resulting line against the regex specified
+here. In particular, the regex is not applied to the input stream: if you have
+CSV with header line "x,y,z" and data line "1,2,3" then the regex will be
+matched, not against either of these lines, but against the DKVP line
+"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
+and this command is intended to be merely a keystroke-saver. To get all the
+features of system grep, you can do
"mlr --odkvp ... | grep ... | mlr --idkvp ..."

group-by
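The `mlr grep` help text in this hunk describes matching the regex against each record's DKVP rendering rather than against the raw input line. A minimal Go sketch of that idea follows, assuming the default separators (OFS "," and OPS "="). The `field` type and `formatDKVP` helper are hypothetical, the NIDX (`-a`) mode is not modeled, and this is not Miller's own implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// field is a hypothetical stand-in for one key-value pair of a record;
// Miller's real record type is not used here.
type field struct {
	key, value string
}

// formatDKVP renders pairs as key=value joined by commas, i.e. OPS "=" and
// OFS "," as described in the help text.
func formatDKVP(rec []field) string {
	parts := make([]string, 0, len(rec))
	for _, f := range rec {
		parts = append(parts, f.key+"="+f.value)
	}
	return strings.Join(parts, ",")
}

func main() {
	// CSV header "x,y,z" paired with data line "1,2,3".
	rec := []field{{"x", "1"}, {"y", "2"}, {"z", "3"}}
	line := formatDKVP(rec)
	fmt.Println(line) // x=1,y=2,z=3

	re := regexp.MustCompile(`y=2`)
	// The pattern matches the DKVP rendering...
	fmt.Println(re.MatchString(line)) // true
	// ...but not the raw CSV data line it came from.
	fmt.Println(re.MatchString("1,2,3")) // false
}
```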
@@ -3430,5 +3433,5 @@ MILLER(1) MILLER(1)



-2023-06-24 MILLER(1)
+2023-06-25 MILLER(1)
</pre>
21 changes: 12 additions & 9 deletions docs/src/manpage.txt
@@ -915,8 +915,11 @@ MILLER(1) MILLER(1)

check
Usage: mlr check [options]
-Consumes records without printing any output.
+Consumes records without printing any output,
Useful for doing a well-formatted check on input data.
+with the exception that warnings are printed to stderr.
+Current checks are:
+* If any key is the empty string
Options:
-h|--help Show this message.

@@ -1191,13 +1194,13 @@ MILLER(1) MILLER(1)
Note that "mlr filter" is more powerful, but requires you to know field names.
By contrast, "mlr grep" allows you to regex-match the entire record. It does this
by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
-command-line-specified ORS/OFS/OPS, and matching the resulting line against the
-regex specified here. In particular, the regex is not applied to the input
-stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
-regex will be matched, not against either of these lines, but against the DKVP
-line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
-supported, and this command is intended to be merely a keystroke-saver. To get
-all the features of system grep, you can do
+OFS "," and OPS "=", and matching the resulting line against the regex specified
+here. In particular, the regex is not applied to the input stream: if you have
+CSV with header line "x,y,z" and data line "1,2,3" then the regex will be
+matched, not against either of these lines, but against the DKVP line
+"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
+and this command is intended to be merely a keystroke-saver. To get all the
+features of system grep, you can do
"mlr --odkvp ... | grep ... | mlr --idkvp ..."

group-by
@@ -3409,4 +3412,4 @@ MILLER(1) MILLER(1)



-2023-06-24 MILLER(1)
+2023-06-25 MILLER(1)
19 changes: 11 additions & 8 deletions docs/src/reference-verbs.md
@@ -376,8 +376,11 @@ n a b i x y
</pre>
<pre class="pre-non-highlight-in-pair">
Usage: mlr check [options]
-Consumes records without printing any output.
+Consumes records without printing any output,
Useful for doing a well-formatted check on input data.
+with the exception that warnings are printed to stderr.
+Current checks are:
+* If any key is the empty string
Options:
-h|--help Show this message.
</pre>
@@ -1355,13 +1358,13 @@ Options:
Note that "mlr filter" is more powerful, but requires you to know field names.
By contrast, "mlr grep" allows you to regex-match the entire record. It does this
by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
-command-line-specified ORS/OFS/OPS, and matching the resulting line against the
-regex specified here. In particular, the regex is not applied to the input
-stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
-regex will be matched, not against either of these lines, but against the DKVP
-line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
-supported, and this command is intended to be merely a keystroke-saver. To get
-all the features of system grep, you can do
+OFS "," and OPS "=", and matching the resulting line against the regex specified
+here. In particular, the regex is not applied to the input stream: if you have
+CSV with header line "x,y,z" and data line "1,2,3" then the regex will be
+matched, not against either of these lines, but against the DKVP line
+"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
+and this command is intended to be merely a keystroke-saver. To get all the
+features of system grep, you can do
"mlr --odkvp ... | grep ... | mlr --idkvp ..."
</pre>

33 changes: 30 additions & 3 deletions internal/pkg/transformers/check.go
@@ -24,8 +24,12 @@ func transformerCheckUsage(
	o *os.File,
) {
	fmt.Fprintf(o, "Usage: %s %s [options]\n", "mlr", verbNameCheck)
-	fmt.Fprintf(o, "Consumes records without printing any output.\n")
+	fmt.Fprintf(o, "Consumes records without printing any output,\n")
	fmt.Fprintf(o, "Useful for doing a well-formatted check on input data.\n")
+	fmt.Fprintf(o, "with the exception that warnings are printed to stderr.\n")
+	fmt.Fprintf(o, "Current checks are:\n")
+	fmt.Fprintf(o, "* Data are parseable\n")
+	fmt.Fprintf(o, "* If any key is the empty string\n")
	fmt.Fprintf(o, "Options:\n")
	fmt.Fprintf(o, "-h|--help Show this message.\n")
}
@@ -79,10 +83,13 @@ func transformerCheckParseCLI(
// ----------------------------------------------------------------
type TransformerCheck struct {
	// stateless
+	messagedReEmptyKey map[string]bool
}

func NewTransformerCheck() (*TransformerCheck, error) {
-	return &TransformerCheck{}, nil
+	return &TransformerCheck{
+		messagedReEmptyKey: make(map[string]bool),
+	}, nil
}

func (tr *TransformerCheck) Transform(
@@ -92,7 +99,27 @@ func (tr *TransformerCheck) Transform(
	outputDownstreamDoneChannel chan<- bool,
) {
	HandleDefaultDownstreamDone(inputDownstreamDoneChannel, outputDownstreamDoneChannel)
-	if inrecAndContext.EndOfStream {
+	if !inrecAndContext.EndOfStream {
+		inrec := inrecAndContext.Record
+		for pe := inrec.Head; pe != nil; pe = pe.Next {
+			if pe.Key == "" {
+				context := inrecAndContext.Context
+
+				// Most Miller users are CSV users. And for CSV this will be an error on
+				// *every* record, or none -- so let's not print this multiple times.
+				if tr.messagedReEmptyKey[context.FILENAME] {
+					continue
+				}
+
+				message := fmt.Sprintf(
+					"mlr: warning: empty-string key at filename %s record number %d",
+					context.FILENAME, context.NR,
+				)
+				fmt.Fprintln(os.Stderr, message)
+				tr.messagedReEmptyKey[context.FILENAME] = true
+			}
+		}
+	} else {
		outputRecordsAndContexts.PushBack(inrecAndContext)
	}
}
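The new `Transform` logic warns at most once per input file, since with CSV an empty header field puts the empty key on every record. The sketch below isolates that warn-once pattern in standalone Go. The `kv`, `record`, and `emptyKeyChecker` types are hypothetical simplifications, not Miller's own record and context types, and the sketch checks the per-file map before scanning fields rather than inside the loop; the warnings emitted are the same.

```go
package main

import (
	"fmt"
	"os"
)

// kv and record are hypothetical, simplified stand-ins for Miller's record and
// context types; only what the empty-key check reads is modeled.
type kv struct{ key, value string }

type record struct {
	filename string // plays the role of context.FILENAME
	nr       int    // plays the role of context.NR
	fields   []kv
}

// emptyKeyChecker remembers which files have already been reported,
// as the messagedReEmptyKey map does in the hunk above.
type emptyKeyChecker struct {
	messaged map[string]bool
}

func newEmptyKeyChecker() *emptyKeyChecker {
	return &emptyKeyChecker{messaged: make(map[string]bool)}
}

// check returns the warning for rec, or "" if rec has no empty key or its
// file has already produced a warning.
func (c *emptyKeyChecker) check(rec record) string {
	if c.messaged[rec.filename] {
		return ""
	}
	for _, f := range rec.fields {
		if f.key == "" {
			c.messaged[rec.filename] = true
			return fmt.Sprintf(
				"mlr: warning: empty-string key at filename %s record number %d",
				rec.filename, rec.nr,
			)
		}
	}
	return ""
}

func main() {
	c := newEmptyKeyChecker()
	recs := []record{
		{"input.csv", 1, []kv{{"a", "1"}, {"", "2"}, {"c", "3"}}},
		{"input.csv", 2, []kv{{"a", "4"}, {"", "5"}, {"c", "6"}}},
	}
	for _, r := range recs {
		// Only record 1 produces a warning; record 2 is suppressed.
		if msg := c.check(r); msg != "" {
			fmt.Fprintln(os.Stderr, msg)
		}
	}
}
```

Keying the map by filename means a second input file with the same problem is still reported once.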
21 changes: 12 additions & 9 deletions man/manpage.txt
@@ -915,8 +915,11 @@ MILLER(1) MILLER(1)

check
Usage: mlr check [options]
-Consumes records without printing any output.
+Consumes records without printing any output,
Useful for doing a well-formatted check on input data.
+with the exception that warnings are printed to stderr.
+Current checks are:
+* If any key is the empty string
Options:
-h|--help Show this message.

@@ -1191,13 +1194,13 @@ MILLER(1) MILLER(1)
Note that "mlr filter" is more powerful, but requires you to know field names.
By contrast, "mlr grep" allows you to regex-match the entire record. It does this
by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
-command-line-specified ORS/OFS/OPS, and matching the resulting line against the
-regex specified here. In particular, the regex is not applied to the input
-stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
-regex will be matched, not against either of these lines, but against the DKVP
-line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
-supported, and this command is intended to be merely a keystroke-saver. To get
-all the features of system grep, you can do
+OFS "," and OPS "=", and matching the resulting line against the regex specified
+here. In particular, the regex is not applied to the input stream: if you have
+CSV with header line "x,y,z" and data line "1,2,3" then the regex will be
+matched, not against either of these lines, but against the DKVP line
+"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
+and this command is intended to be merely a keystroke-saver. To get all the
+features of system grep, you can do
"mlr --odkvp ... | grep ... | mlr --idkvp ..."

group-by
@@ -3409,4 +3412,4 @@ MILLER(1) MILLER(1)



-2023-06-24 MILLER(1)
+2023-06-25 MILLER(1)
23 changes: 13 additions & 10 deletions man/mlr.1
@@ -2,12 +2,12 @@
.\" Title: mlr
.\" Author: [see the "AUTHOR" section]
.\" Generator: ./mkman.rb
-.\" Date: 2023-06-24
+.\" Date: 2023-06-25
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
-.TH "MILLER" "1" "2023-06-24" "\ \&" "\ \&"
+.TH "MILLER" "1" "2023-06-25" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Portability definitions
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1122,8 +1122,11 @@ Options:
.\}
.nf
Usage: mlr check [options]
-Consumes records without printing any output.
+Consumes records without printing any output,
Useful for doing a well-formatted check on input data.
+with the exception that warnings are printed to stderr.
+Current checks are:
+* If any key is the empty string
Options:
-h|--help Show this message.
.fi
@@ -1482,13 +1485,13 @@ Options:
Note that "mlr filter" is more powerful, but requires you to know field names.
By contrast, "mlr grep" allows you to regex-match the entire record. It does this
by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
-command-line-specified ORS/OFS/OPS, and matching the resulting line against the
-regex specified here. In particular, the regex is not applied to the input
-stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the
-regex will be matched, not against either of these lines, but against the DKVP
-line "x=1,y=2,z=3". Furthermore, not all the options to system grep are
-supported, and this command is intended to be merely a keystroke-saver. To get
-all the features of system grep, you can do
+OFS "," and OPS "=", and matching the resulting line against the regex specified
+here. In particular, the regex is not applied to the input stream: if you have
+CSV with header line "x,y,z" and data line "1,2,3" then the regex will be
+matched, not against either of these lines, but against the DKVP line
+"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
+and this command is intended to be merely a keystroke-saver. To get all the
+features of system grep, you can do
"mlr --odkvp ... | grep ... | mlr --idkvp ..."
.fi
.if n \{\
6 changes: 5 additions & 1 deletion test/cases/cli-help/0001/expout
@@ -63,8 +63,12 @@ Options:
================================================================
check
Usage: mlr check [options]
-Consumes records without printing any output.
+Consumes records without printing any output,
Useful for doing a well-formatted check on input data.
+with the exception that warnings are printed to stderr.
+Current checks are:
+* Data are parseable
+* If any key is the empty string
Options:
-h|--help Show this message.

1 change: 1 addition & 0 deletions test/cases/verb-check/0001/cmd
@@ -0,0 +1 @@
+mlr --csv check ${CASEDIR}/input.csv
Empty file.
Empty file.
3 changes: 3 additions & 0 deletions test/cases/verb-check/0001/input.csv
@@ -0,0 +1,3 @@
+a,b,c
+1,2,3
+4,5,6
1 change: 1 addition & 0 deletions test/cases/verb-check/0002/cmd
@@ -0,0 +1 @@
+mlr --csv check ${CASEDIR}/input.csv
1 change: 1 addition & 0 deletions test/cases/verb-check/0002/experr
@@ -0,0 +1 @@
+mlr: warning: empty-string key at filename test/cases/verb-check/0002/input.csv record number 1
Empty file.
3 changes: 3 additions & 0 deletions test/cases/verb-check/0002/input.csv
@@ -0,0 +1,3 @@
+a,,c
+1,2,3
+4,5,6
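Test case 0002 relies on the header `a,,c` producing an empty key for the second column, and its expected stderr holds a single warning at record number 1. The sketch below parses the same three lines with Go's standard `encoding/csv` package (not Miller's own CSV reader) to show that the empty key comes from the header alone, so every data record would share it. `emptyKeyColumns` is a hypothetical helper.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// emptyKeyColumns returns the 1-based positions of empty header fields.
// With CSV, keys come only from the header, so an empty key found here is
// present in every data record.
func emptyKeyColumns(header []string) []int {
	var cols []int
	for i, k := range header {
		if k == "" {
			cols = append(cols, i+1)
		}
	}
	return cols
}

func main() {
	// Same content as test/cases/verb-check/0002/input.csv.
	const input = "a,,c\n1,2,3\n4,5,6\n"
	rows, err := csv.NewReader(strings.NewReader(input)).ReadAll()
	if err != nil {
		panic(err)
	}
	header, data := rows[0], rows[1:]
	fmt.Printf("%q\n", header)           // ["a" "" "c"]
	fmt.Println(emptyKeyColumns(header)) // [2]
	// Both data records would be keyed a, "", c: all records or none.
	fmt.Println(len(data)) // 2
}
```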
