exclude command idea #61

Open
silverwind opened this issue Mar 9, 2017 · 14 comments

@silverwind

silverwind commented Mar 9, 2017

I think it would be handy to have a command to exclude matching columns from two CSV files and print the remainder. For example:

source.csv:

id,boolean
1,yes
2,no
3,yes

exclusions.csv:

id,boolean
2,yes
4,no

The syntax could be similar to join: xsv exclude <columns1> <input1> <columns2> <input2>, so:

$ xsv exclude id exclusions.csv id source.csv
id,boolean
1,yes
3,yes
@BurntSushi
Owner

I don't understand. Could you please write a specification for the command? (Pretend you're writing the end user documentation for how it's supposed to work.)

@silverwind
Author

Maybe this is more clear. Part of this is copied from join. I switched the argument order so source comes before excludes, but I'm a bit undecided which argument order would be better (e.g. "exclude x from y" would be the natural way of saying it).

Excludes matching columns of two sets of CSV data and outputs the remaining
rows from the source data.

Matches are determined by ignoring leading and trailing whitespace. By default,
matches are done case sensitively, but this can be disabled with the --no-case
flag.

The columns arguments specify the columns to match for each input. Columns can
be referenced by name or index, starting at 1. Specify multiple columns by
separating them with a comma. Specify a range of columns with `-`. Both
columns1 and columns2 must specify exactly the same number of columns.
(See 'xsv select --help' for the full syntax.)

Usage:
    xsv exclude [options] <columns1> <source> <columns2> <excludes>
    xsv exclude --help

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -n, --no-headers       When set, the first row will not be interpreted
                           as headers. (i.e., They are not searched, analyzed,
                           sliced, etc.)
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)
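
To illustrate the matching rule in the draft spec above (trim surrounding whitespace, optionally ignore case), here is a minimal Python sketch. The helper name `make_key` and its zero-based `columns` indices are hypothetical and are not part of xsv, which numbers columns from 1.

```python
def make_key(row, columns, no_case=False):
    """Build a comparison key from the selected fields of one row.

    `columns` holds zero-based indices (an assumption of this sketch).
    """
    # Leading and trailing whitespace is ignored when matching.
    key = tuple(row[i].strip() for i in columns)
    if no_case:
        # Mirrors the proposed --no-case flag: compare case insensitively.
        key = tuple(field.casefold() for field in key)
    return key


print(make_key(["  Alice ", "NY"], [0]))
print(make_key(["  Alice ", "NY"], [0, 1], no_case=True))
```

Two rows would then count as a match when their keys compare equal.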

@silverwind
Author

silverwind commented Mar 9, 2017

This might be even clearer:

Compares two sets of CSV data and removes rows from the first set where
the value of the specified columns matches the value of the specified
columns in the second set. The remaining rows of the first set are then
output.
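
The behaviour described above can be sketched in a few lines of Python using the standard `csv` module. This is only an illustration of the intended semantics on the sample data from the first comment, not how xsv would implement it; the function name `exclude` and its single-column signature are assumptions of the sketch.

```python
import csv
import io


def exclude(source_text, source_col, excludes_text, excludes_col):
    """Return the rows of source whose key column has no match in excludes."""
    # Collect the keys to exclude, ignoring surrounding whitespace.
    excludes = csv.DictReader(io.StringIO(excludes_text))
    excluded_keys = {row[excludes_col].strip() for row in excludes}

    reader = csv.DictReader(io.StringIO(source_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep only the source rows whose key is absent from the excludes.
        if row[source_col].strip() not in excluded_keys:
            writer.writerow(row)
    return out.getvalue()


SOURCE = "id,boolean\n1,yes\n2,no\n3,yes\n"
EXCLUSIONS = "id,boolean\n2,yes\n4,no\n"
print(exclude(SOURCE, "id", EXCLUSIONS, "id"), end="")
```

On the sample data this keeps the rows with ids 1 and 3. Only the key column is compared, so row 2 is dropped even though its boolean value differs between the two files.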

@BurntSushi
Owner

@silverwind Indeed, that is much clearer, thank you. :-) When I read your first comment, I thought that you were asking for a symmetric difference but your examples showed a set difference. I now see that you want set difference. :-)
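
For readers unsure of the distinction, a small Python sketch using the ids from the example files shows how the two operations differ. The variable names are only illustrative.

```python
source_ids = {"1", "2", "3"}      # ids in source.csv
exclusion_ids = {"2", "4"}        # ids in exclusions.csv

# Set difference: ids in the source that are not in the exclusions.
set_difference = source_ids - exclusion_ids

# Symmetric difference: ids that appear in exactly one of the two files.
symmetric_difference = source_ids ^ exclusion_ids

print(sorted(set_difference))        # ['1', '3']
print(sorted(symmetric_difference))  # ['1', '3', '4']
```

The requested output (ids 1 and 3) corresponds to the set difference; a symmetric difference would also report id 4 from exclusions.csv.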

I agree that this could be useful. I'm personally unlikely to work on it, but I'd be happy to mentor it. The xsv join command is, unfortunately, one of the more complex operations though.

@silverwind
Author

silverwind commented Mar 9, 2017

It's somewhat like grep -v -f excludes.txt source.txt on the specified columns. Another option could be to call the command match that prints rows with matching fields and add a -v option to invert the match to achieve exclude behaviour.
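
A rough Python sketch of that alternative, assuming a hypothetical `match_rows` helper with an `invert` parameter standing in for the `-v` idea (neither exists in xsv):

```python
def match_rows(rows, column, values, invert=False):
    """Return rows whose field at `column` is in `values`.

    With invert=True, return the rows that do NOT match, similar to grep -v.
    `column` is a zero-based index (an assumption of this sketch).
    """
    wanted = set(values)
    return [row for row in rows if (row[column] in wanted) != invert]


rows = [["1", "yes"], ["2", "no"], ["3", "yes"]]
exclusion_ids = ["2", "4"]

print(match_rows(rows, 0, exclusion_ids))               # [['2', 'no']]
print(match_rows(rows, 0, exclusion_ids, invert=True))  # [['1', 'yes'], ['3', 'yes']]
```

With this design, the exclude behaviour is just the inverted form of a more general match operation.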

@ms2300

ms2300 commented Sep 9, 2018

@BurntSushi Can I start on this? If so, should I take the flag route (-v?) on the match command or create the new exclude command?

@BurntSushi
Owner

@ms2300 Sure! It almost seems like this should be a flag on xsv join (although, not -v), but I'm not sure it quite fits. I think I like the idea of a separate exclude sub-command, but I could be convinced otherwise.

@silverwind
Author

silverwind commented Sep 9, 2018

Just recently, I created a bash function for this purpose, which might be helpful for an implementation:

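# Line-based set operations on two files via sort | uniq.
# Assumes neither file contains duplicate lines of its own.
linediff() {
  if [ $# -ne 3 ]; then
    echo "Usage: linediff [added|removed|unchanged] oldfile newfile" >&2
    return 1
  fi
  case "$1" in
    # Lines only in newfile: oldfile is listed twice, so its lines are never unique.
    added) cat "$2" "$2" "$3" | sort | uniq -u ;;
    # Lines only in oldfile: newfile is listed twice, so its lines are never unique.
    removed) cat "$2" "$3" "$3" | sort | uniq -u ;;
    # Lines present in both files.
    unchanged) cat "$2" "$3" | sort | uniq -d ;;
    *) echo "Unknown mode: $1" >&2; return 1 ;;
  esac
}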

Maybe call it xsv diff.

@ms2300

ms2300 commented Sep 9, 2018

OK, awesome, I'll get to work on it. Is there anything I should know outside of looking through xsv join? xsv diff makes the most intuitive sense as a name to me, since it's really just implementing set difference, but I like the look of the old spec.

@BurntSushi
Owner

@ms2300 I actually saw someone requesting a "csv diff" tool, so diff probably isn't the best choice here. I'm not totally attached to exclude, but it doesn't seem too bad to me. The typical Unix command for something like this is comm, but I've never really liked its interface. However, we might still draw inspiration from it, I'm not sure.

@ms2300

ms2300 commented Sep 9, 2018

I'll just go with exclude for now, then. Since it sounds like we're all on the same page about the underlying purpose, another name change won't be a massive fix, yeah?

@BurntSushi
Owner

@ms2300 Yup, sounds good to me!

@BurntSushi
Owner

@ms2300 If you get stuck, please reach out to me on one of the Rust IRC channels. Email also works!

@ms2300

ms2300 commented Sep 16, 2018

@BurntSushi I have the command wired in and some initial implementation done, but I'm swamped right now and it'll be another week or two. I'll for sure reach out to you if I get stuck anywhere, though.
