[question/feature-request] prune trailing whitespaces at read time #231

jprobichaud · 2020-07-30T18:30:53Z

Thanks for this great tool!

I've searched to see whether this is possible but couldn't find anything. I have many CSV files whose values are padded with whitespace (from a database extraction process). Is it possible to read these files and remove trailing whitespace from every column, headers included?

BurntSushi · 2020-07-30T18:36:47Z

Nope. xsv is mostly about slicing and dicing CSV data. I've mostly avoided taking the next step toward a data transformation tool, since that vastly increases its scope and complexity. It's plausible though that some special cases could be considered.
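
For reference, the trimming asked about above can be done outside xsv before piping the data into it. The following is a minimal Python sketch, not an xsv feature: the function name `rstrip_cells` is made up for illustration, and it assumes the padding is ordinary whitespace that is safe to drop from every field, headers included.

```python
import csv
import io


def rstrip_cells(src, dst):
    """Copy CSV rows from src to dst, stripping trailing whitespace from every field."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        # The header is just the first row, so it is trimmed like any other.
        writer.writerow([cell.rstrip() for cell in row])


# Small in-memory demonstration with padded values.
padded = io.StringIO("name   ,city  \nAda    ,London   \n")
cleaned = io.StringIO()
rstrip_cells(padded, cleaned)
print(cleaned.getvalue(), end="")
# name,city
# Ada,London
```

Calling the same function with `sys.stdin` and `sys.stdout` instead of the in-memory buffers would let the cleaned output be piped into any xsv command.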

jprobichaud · 2020-07-30T18:50:34Z

I see, that is understandable. Given how "simple" the interface is (each command does only one thing), I was picturing this either as a simple flag on the "input" command or as a new command called "transform" that would apply simple transformations to a selection of columns. Something like:

xsv transform [-s ColName1-ColName3] [--tolower|toupper] [--ltrim|rtrim|trim] [--format <printf_format>]

Here -s would take the same format as the xsv select syntax, and ltrim, rtrim and trim would remove whitespace on the left, on the right, or on both sides.

I can imagine a bunch of possible simple transformations (that wouldn't require performing calculations across a bunch of rows beforehand) that could be quite useful in general.

Output would go to stdout (or to a specified output file).

Yomguithereal · 2020-07-30T20:02:04Z

@jprobichaud one tricky thing to consider as well is the order of operations. The ones you show here won't be affected by the order you run them in (I think), but others may be. CLI design issues can then arise and decisions must be made.

jprobichaud · 2020-07-30T20:40:57Z

That's true. I think we could push this complexity back to the user, who could chain multiple invocations of "xsv transform" if needed. We could easily define a set of non-interacting transformations (to upper, to lower, trim) that are order-invariant and could be passed in the same invocation, while more complex ones, like a "printf", would need to be used alone (so the user gets to define the order). Another option is to perform the operations in the order they appear on the command line, but that may or may not work with how the command-line parser behaves; I haven't looked too deeply into the code yet.
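
To make the ordering concern concrete, here is a small Python sketch. It has nothing to do with xsv's actual code: the helper names are hypothetical, and `bracket` is only a stand-in for a printf-style format such as "[%s]". It shows that trimming and lowercasing give the same result in either order, while a format step does not commute with trimming.

```python
def trim(value):
    """Remove whitespace on both sides."""
    return value.strip()


def lower(value):
    """Convert to lowercase."""
    return value.lower()


def bracket(value):
    """Stand-in for a printf-style format such as "[%s]"."""
    return "[%s]" % value


def apply_in_order(value, operations):
    """Apply each operation to the value, left to right."""
    for operation in operations:
        value = operation(value)
    return value


cell = "  Hello  "

# trim and lower commute: both orders give the same result.
print(apply_in_order(cell, [trim, lower]) == apply_in_order(cell, [lower, trim]))  # True

# A format step does not commute with trim.
print(apply_in_order(cell, [trim, bracket]))  # [Hello]
print(apply_in_order(cell, [bracket, trim]))  # [  Hello  ]
```

Note that "to upper" and "to lower" do not commute with each other either (the last one applied wins), although passing both in one invocation would make little sense anyway.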

Yomguithereal · 2020-07-31T08:32:30Z

Another solution could be to have some kind of DSL such as:

xsv transform trim,lower

A flag to put the transformed value in a new column would also be useful. I already use a custom Python CLI tool to do what your command suggests, and I often pipe its output into partition or search. But obviously my Python counterpart is way slower than xsv (even if it can also do fancy things such as evaluating Python on cells and applying fuzzy matching functions such as phonetic encoding, stemmers and the like).
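
A comma-separated DSL of this kind, together with the new-column option, could be sketched in Python as follows. This is only an illustration under stated assumptions, not the custom tool mentioned above and not xsv code: the operation names, the `parse_pipeline` and `transform` functions and the `new_column` parameter are all hypothetical, and every row is assumed to have as many fields as the header.

```python
import csv
import io

# Hypothetical operation names for the comma-separated DSL.
OPERATIONS = {
    "trim": str.strip,
    "ltrim": str.lstrip,
    "rtrim": str.rstrip,
    "lower": str.lower,
    "upper": str.upper,
}


def parse_pipeline(spec):
    """Turn a spec such as "trim,lower" into a list of functions, in order."""
    names = [name.strip() for name in spec.split(",") if name.strip()]
    unknown = [name for name in names if name not in OPERATIONS]
    if unknown:
        raise ValueError("unknown operation(s): " + ", ".join(unknown))
    return [OPERATIONS[name] for name in names]


def transform(src, dst, column, spec, new_column=None):
    """Apply the pipeline to one column, in place or into new_column."""
    pipeline = parse_pipeline(spec)
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or [])
    if column not in fieldnames:
        raise ValueError("no such column: " + column)
    target = new_column or column
    if target not in fieldnames:
        fieldnames.append(target)
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        value = row[column]
        # Operations run strictly in the order given in the spec.
        for operation in pipeline:
            value = operation(value)
        row[target] = value
        writer.writerow(row)


src = io.StringIO("name,city\n  Ada  ,London\n  BOB,Paris\n")
dst = io.StringIO()
transform(src, dst, "name", "trim,lower", new_column="name_clean")
print(dst.getvalue(), end="")
# name,city,name_clean
#   Ada  ,London,ada
#   BOB,Paris,bob
```

Because the spec is an ordered list, the order of operations is whatever the user writes, which sidesteps the question of how a flag parser would order separate options.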

jprobichaud · 2020-07-31T10:04:30Z

Interesting, which Python tool is that? Ultimately, the key strengths of xsv are speed and simplicity, and it would be sad to give those up for simple transformations. A DSL could be useful to avoid having to re-parse all the data too often, but that changes the philosophy of the tool quite a bit!

Yomguithereal · 2020-07-31T10:18:32Z

> Interesting, which python tool is that?

It's there but it is undocumented unfortunately.

> A dsl could be useful to avoid having to re-parse all the data too often, but that changes the philosophy of the tool quite a bit!

I agree. But if such a DSL is no more complex than splitting operations by comma, I think we are not stretching too far.

Yomguithereal · 2020-09-23T14:09:34Z

@jprobichaud PR #242 can do what you need, if that helps. You can easily try it by installing xsv from my prod branch if required.
