Docs:How do I replay cmdlogs on arbitrary files? #1606

Open
iainelder opened this issue Nov 18, 2022 · 13 comments

Comments

@iainelder

According to #572 this has been implemented, but from the comments on the issue I can't figure out how to do it, and I can't find any documentation.

I recorded some actions and Visidata wrote me a cmdlog file whose first few lines look like this:

#!vd -p
{"col": "", "row": "", "longname": "open-file", "input": "/home/isme/tmp/stack_instances.json", "keystrokes": "o"}
{"sheet": "stack_instances", "col": "", "row": "0", "longname": "open-row", "input": "", "keystrokes": "Enter", "comment": "open current row with sheet-specific dive"}
{"sheet": "stack_instances[]", "col": "", "row": "\u30adSummaries", "longname": "open-row", "input": "", "keystrokes": "Enter", "comment": "open current row with sheet-specific dive"}
{"sheet": "stack_instances[]_Summaries", "col": "StackSetId", "row": "", "longname": "hide-col", "input": "", "keystrokes": "-", "comment": "Hide current column"}

It works only on the file named in the "open-file" command. I want it to work on any file with the same schema, and I want to specify the file name at the command line.

I tried removing the "open-file" line and passing the -o option, but that produces no output at all.

vd -o ~/tmp/stack_instances.json -p stack_instance_status_summary_histogram.vdj

The other issue talks about replacing a sheet name with null or 0, but I don't know what that means. In my example the sheet name is not always the same because some of the commands create new sheets.

@frosencrantz
Contributor

I think there is a better example, but here is what I found after a speedy search:
#1389 (comment)

On the command line, say input=sheetname; in your vdj file, use "{input}" as the sheet name, and VisiData will do the substitution. I haven't verified this with a vdj file.

@iainelder
Author

@frosencrantz, thanks for the suggestion. Sorry for taking so long to get back to you on this.

For my use case the solution is:

  • Remove the open-file command from the cmdlog
  • Use a blank sheet name for all operations
  • Pass the input file at the command line
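
The first two steps above can be scripted rather than done by hand. Here is a minimal Python sketch, assuming the cmdlog is a .vdj file with one JSON object per line and a leading `#!vd -p` line, as in the examples in this thread. The function name `generalize_cmdlog` is my own invention and is not part of Visidata.

```python
import json

def generalize_cmdlog(lines):
    """Return cmdlog lines with open-file commands dropped and sheet names blanked."""
    out = []
    for line in lines:
        stripped = line.strip()
        # Keep the "#!vd -p" line, other comments, and blank lines untouched.
        if not stripped or stripped.startswith("#"):
            out.append(line.rstrip("\n"))
            continue
        cmd = json.loads(stripped)
        if cmd.get("longname") == "open-file":
            continue  # the input file is passed at the command line instead
        if "sheet" in cmd:
            cmd["sheet"] = ""  # blank name: act on whatever sheet is current
        out.append(json.dumps(cmd))
    return out
```

To use it, read the recorded cmdlog with `open(path).read().splitlines()`, pass the lines through the function, and write the result to a new .vdj file.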

Here are some silly simple files for testing.

abc1 is a three-column CSV of all 1s. abc2 has the same shape with all 2s. The cmdlog is created by opening abc1, taking a frequency table, and transposing it.

$ cat /tmp/abc1.csv
a,b,c
1,1,1
1,1,1
1,1,1

$ cat /tmp/abc2.csv
a,b,c
2,2,2
2,2,2
2,2,2

$ cat /tmp/cmdlog.vdj
#!vd -p
{"longname": "open-file", "input": "/tmp/abc1.csv", "keystrokes": "o"}
{"sheet": "abc1", "col": "a", "row": "", "longname": "freq-col", "input": "", "keystrokes": "Shift+F", "comment": "open Frequency Table grouped on current column, with aggregations of other columns"}
{"sheet": "abc1_a_freq", "col": "", "row": "", "longname": "transpose", "input": "", "comment": "open new sheet with rows and columns transposed"}

In the following tests I use the -b option for batch mode (non-interactive) processing and the -o - option to write the results to standard output.

The -p option is necessary to play the cmdlog, and the positional argument is the input file.

When I replay the cmdlog without any file inputs I get the expected result.

$ vd -b -p /tmp/cmdlog.vdj -o -
opening /tmp/cmdlog.vdj as vdj
"/tmp/abc1.csv"
opening /tmp/abc1.csv as csv
open Frequency Table grouped on current column, with aggregations of other columns
open new sheet with rows and columns transposed
replay complete
saving 1 sheets to - as tsv
a	1
count	[3] [3] 1; 1; 1; [3] 1; 1; 1; [3] 1; 1; 1
percent	100.0
histogram	**************************************************
- save finished

Same behavior when I specify abc1 explicitly as input.

$ vd -b -p /tmp/cmdlog.vdj /tmp/abc1.csv -o -
opening /tmp/abc1.csv as csv
opening /tmp/cmdlog.vdj as vdj
"/tmp/abc1.csv"
opening /tmp/abc1.csv as csv
more than one sheet named "abc1"
open Frequency Table grouped on current column, with aggregations of other columns
open new sheet with rows and columns transposed
replay complete
saving 1 sheets to - as tsv
a	1
count	[3] [3] 1; 1; 1; [3] 1; 1; 1; [3] 1; 1; 1
percent	100.0
histogram	**************************************************
- save finished

When I specify abc2 as input, the cmdlog still acts on abc1.

$ vd -b -p /tmp/cmdlog.vdj /tmp/abc2.csv -o -
opening /tmp/abc2.csv as csv
opening /tmp/cmdlog.vdj as vdj
"/tmp/abc1.csv"
opening /tmp/abc1.csv as csv
open Frequency Table grouped on current column, with aggregations of other columns
open new sheet with rows and columns transposed
replay complete
saving 1 sheets to - as tsv
a	1
count	[3] [3] 1; 1; 1; [3] 1; 1; 1; [3] 1; 1; 1
percent	100.0
histogram	**************************************************
- save finished

I want the cmdlog to operate on whatever file is specified as input.

I try to achieve that by just removing the open-file command.

$ cat /tmp/cmdlog.vdj 
#!vd -p
{"sheet": "abc1", "col": "a", "row": "", "longname": "freq-col", "input": "", "keystrokes": "Shift+F", "comment": "open Frequency Table grouped on current column, with aggregations of other columns"}
{"sheet": "abc1_a_freq", "col": "", "row": "", "longname": "transpose", "input": "", "comment": "open new sheet with rows and columns transposed"}

Now when I play the cmdlog without an explicit file I get an error. That is what I would expect.

$ vd -b -p /tmp/cmdlog.vdj -o -
opening /tmp/cmdlog.vdj as vdj
no sheet named abc1
replay canceled

When I specify abc1 explicitly, it works as expected.

$ vd -b -p /tmp/cmdlog.vdj /tmp/abc1.csv -o -
opening /tmp/abc1.csv as csv
opening /tmp/cmdlog.vdj as vdj
open Frequency Table grouped on current column, with aggregations of other columns
open new sheet with rows and columns transposed
replay complete
saving 1 sheets to - as tsv
a	1
count	[3] [3] 1; 1; 1; [3] 1; 1; 1; [3] 1; 1; 1
percent	100.0
histogram	**************************************************
- save finished

When I specify abc2 explicitly, it fails. Note that it fails looking for a sheet named abc1.

$ vd -b -p /tmp/cmdlog.vdj /tmp/abc2.csv -o -
opening /tmp/abc2.csv as csv
opening /tmp/cmdlog.vdj as vdj
no sheet named abc1
replay canceled

Just for fun, I change the sheet names starting with abc1 to abc2 in the cmdlog.

$ cat /tmp/cmdlog.vdj
#!vd -p
{"sheet": "abc2", "col": "a", "row": "", "longname": "freq-col", "input": "", "keystrokes": "Shift+F", "comment": "open Frequency Table grouped on current column, with aggregations of other columns"}
{"sheet": "abc2_a_freq", "col": "", "row": "", "longname": "transpose", "input": "", "comment": "open new sheet with rows and columns transposed"}

Now when I specify abc2 explicitly it works! This is what I expect, but I want it to work without editing the cmdlog.

$ vd -b -p /tmp/cmdlog.vdj /tmp/abc2.csv -o -
opening /tmp/abc2.csv as csv
opening /tmp/cmdlog.vdj as vdj
open Frequency Table grouped on current column, with aggregations of other columns
open new sheet with rows and columns transposed
replay complete
saving 1 sheets to - as tsv
a	2
count	[3] [3] 2; 2; 2; [3] 2; 2; 2; [3] 2; 2; 2
percent	100.0
histogram	**************************************************
- save finished

Opening a file puts a sheet on the stack. Creating a frequency table puts another sheet on the stack. Opening a file generates a sheet named after the file. A new sheet generated from an old sheet is named with the old sheet name as a prefix.
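
The naming pattern described here can be written down as a toy model. This is only a sketch based on the names seen in the logs in this thread (abc1 from /tmp/abc1.csv, then abc1_a_freq); it is not Visidata's actual code, both function names are hypothetical, and other commands may build their names differently.

```python
from pathlib import Path

def sheet_name_for_file(path):
    """Name a sheet after its file, without directory or extension."""
    return Path(path).stem

def derived_sheet_name(parent, *parts):
    """Name a derived sheet by using the parent sheet's name as a prefix."""
    return "_".join([parent, *parts])

base = sheet_name_for_file("/tmp/abc1.csv")
freq = derived_sheet_name(base, "a", "freq")
print(base, freq)
```

Under this model, a cmdlog recorded against abc1.csv carries abc1 in every derived sheet name, which is why it cannot match sheets created from abc2.csv.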

To make it work with both input files I need to leave the sheet names blank. This makes Visidata use the sheet at the top of the stack whatever its name is.
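
As a way to picture the lookup, here is a small Python sketch of how I understand the behaviour. It is an assumption drawn from the outputs in this thread, not taken from Visidata's source, and `resolve_sheet` is a hypothetical name.

```python
def resolve_sheet(stack, name):
    """Pick the sheet that a replayed command acts on.

    stack: list of sheet names; the last item is the top of the stack.
    name:  the "sheet" value from the cmdlog line.
    """
    if not name:
        # Blank name: use whatever sheet is on top, regardless of its name.
        if not stack:
            # Hypothetical handling; the real tool behaves differently here.
            raise LookupError("no sheets on the stack")
        return stack[-1]
    if name not in stack:
        # Mirrors the "no sheet named abc1" message seen in the failed replays.
        raise LookupError(f"no sheet named {name}")
    return name
```

With a stack holding only abc2, a blank name resolves to abc2, while the name abc1 fails. That matches the difference between the cmdlog versions in this comment.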

$ cat /tmp/cmdlog.vdj 
#!vd -p
{"sheet": "", "col": "a", "row": "", "longname": "freq-col", "input": "", "keystrokes": "Shift+F", "comment": "open Frequency Table grouped on current column, with aggregations of other columns"}
{"sheet": "", "col": "", "row": "", "longname": "transpose", "input": "", "comment": "open new sheet with rows and columns transposed"}

Now playing the command log without an input file causes an error. This is what I expect.

$ vd -b -p /tmp/cmdlog.vdj -o -
opening /tmp/cmdlog.vdj as vdj
no "a" column
replay canceled

When abc1 is explicit input, it works as expected.

$ vd -b -p /tmp/cmdlog.vdj /tmp/abc1.csv -o -
opening /tmp/abc1.csv as csv
opening /tmp/cmdlog.vdj as vdj
open Frequency Table grouped on current column, with aggregations of other columns
open new sheet with rows and columns transposed
replay complete
saving 1 sheets to - as tsv
a	1
count	[3] [3] 1; 1; 1; [3] 1; 1; 1; [3] 1; 1; 1
percent	100.0
histogram	**************************************************
- save finished

And it works when abc2 is the explicit input!

$ vd -b -p /tmp/cmdlog.vdj /tmp/abc2.csv -o -
opening /tmp/abc2.csv as csv
opening /tmp/cmdlog.vdj as vdj
open Frequency Table grouped on current column, with aggregations of other columns
open new sheet with rows and columns transposed
replay complete
saving 1 sheets to - as tsv
a	2
count	[3] [3] 2; 2; 2; [3] 2; 2; 2; [3] 2; 2; 2
percent	100.0
histogram	**************************************************
- save finished
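
With a cmdlog that uses blank sheet names, running the same replay over several files is a matter of repeating the command with a different input each time. A minimal Python sketch that builds those command lines follows. It uses only the options shown in this thread (-b, -p, -o -), and the helper names are hypothetical.

```python
import shlex

def replay_command(cmdlog, input_file, output="-"):
    """Build the argv for one batch replay: vd -b -p CMDLOG INPUT -o OUTPUT."""
    return ["vd", "-b", "-p", cmdlog, input_file, "-o", output]

def replay_commands(cmdlog, input_files):
    """Build one replay command per input file."""
    return [replay_command(cmdlog, f) for f in input_files]

for argv in replay_commands("/tmp/cmdlog.vdj", ["/tmp/abc1.csv", "/tmp/abc2.csv"]):
    # Print the shell form; pass argv to subprocess.run to actually execute it.
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` on a machine where vd is installed.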

@anjakefala
Collaborator

This is so cool @iainelder, thanks for documenting all of this!

@iainelder
Author

Thanks @anjakefala .

I didn't find an example of this use case in the help output or in the Introduction to Visidata or in the Getting Started docs.

Did I miss something?

If not, I think it would be helpful to include it using some more meaningful data.

The ability to replay cmdlogs for me is a compelling feature because it facilitates reproducible analysis. So does arbitrary scripting, but Visidata's display features make it easier to see what is going on step-by-step, so it makes it easier to learn about my data.

@anjakefala reopened this Nov 29, 2022
@anjakefala changed the title from "How do I replay cmdlogs on arbitrary files?" to "Docs:How do I replay cmdlogs on arbitrary files?" Nov 29, 2022
@anjakefala
Collaborator

@iainelder I re-opened the issue so we do not forget to update the docs, you are right!

@iainelder
Author

I see cmdlogs introduced here:

https://github.com/saulpw/visidata/blob/develop/docs/save-restore.md

Would this be a good place to extend the documentation?

Or would you add a new page to cover the use case?

@anjakefala
Collaborator

That would be a great place!

@iainelder
Author

The "save and replay" page already has a screencast using sample.tsv that generates a cmdlog.

How about we make a new screencast that

  • partitions the file,
  • edits the cmdlog as I showed above,
  • and runs the cmdlog on each partition?
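
For the partitioning step, something like the following Python sketch would do. It is only an illustration: `partition_rows` is a hypothetical helper, the chunk size is arbitrary, and the delimiter would need to be a tab for a .tsv file such as sample.tsv.

```python
import csv
import io

def partition_rows(text, rows_per_part, delimiter=","):
    """Split delimited text into parts of at most rows_per_part data rows.

    The header row is repeated at the top of every part so that each part
    has the same schema as the original file.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = list(reader)
    parts = []
    for start in range(0, len(rows), rows_per_part):
        buf = io.StringIO()
        writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows[start:start + rows_per_part])
        parts.append(buf.getvalue())
    return parts
```

Each part can be written to its own file and then passed as the input file when replaying the edited cmdlog.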

@saulpw
Owner

saulpw commented Nov 29, 2022

Also, @iainelder, what would be a good root solution to this? I've added the .vdx format, which only emits "commands" to change the sheet when it's different than expected, which should help at least a bit (if not actively changing sheets). But I'd love to have a stronger mechanism for redoing analysis across multiple data files.

@iainelder
Author

iainelder commented Dec 1, 2022

@saulpw for me the first problem was that it hadn't occurred to me that opening a file would be treated as a command. I can see why such a command is useful, but I wasn't expecting it to be included in my cmdlog, at least by default.

The next problem was to learn about the stack of sheets and that sheets have autogenerated names.

I don't actively change sheets in my analyses, so the fact that sheets have names and those names have to match in the cmdlog feels like I'm being exposed to implementation details rather than just a list of my commands.

(I'm sure there are plenty of use cases for actively changing sheets, but I'm not there yet!)

I'd like to try .vdx to see whether it makes more sense for my use cases. How do I use it?

All I see is something in the CHANGELOG for version 2.10.2 but nothing in the online help.

@saulpw
Owner

saulpw commented Dec 4, 2022

@iainelder Just save as .vdx and then replay the same.

@iainelder
Author

@saulpw , I just had a look at the .vdx output.

The sheet names sometimes still appear in absolute form, so it doesn't avoid that problem.

Following the instructions I gave earlier, I get output like this in cmdlog.cdx:

# VisiData v2.10.2
open-file abc1.csv
sheet abc1
col a
freq-col
sheet abc1_a_freq
transpose

If the intention is to reflect the commands that I issue explicitly, then I think the sheet commands here are a problem.

When I open a file, Visidata automatically names a sheet for the data and opens it. For me that's just part of the open-file command. I don't think to issue a sheet command after opening a file.

The same is true when I look at the frequencies. The freq-col command for me already implies a new sheet, but without a particular name at this point.

Given these two above, I would have expected transpose to behave in the same way, but there is no sheet command following it!
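
To check a log for absolute sheet names like these, a small Python sketch can list them. The parsing rules are inferred only from the sample above (one command per line, an optional argument after the first space, comment lines starting with #), so treat them as an assumption about the format. Both function names are hypothetical.

```python
def parse_vdx(text):
    """Parse .vdx-style text into (command, argument) pairs."""
    commands = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as "# VisiData v2.10.2".
        if not line or line.startswith("#"):
            continue
        name, _, arg = line.partition(" ")
        commands.append((name, arg.strip()))
    return commands

def absolute_sheet_refs(commands):
    """Return the sheet names that the log refers to explicitly."""
    return [arg for name, arg in commands if name == "sheet"]
```

Run on the sample above, this reports the two sheet names that tie the log to abc1.csv.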

@reagle
Contributor

reagle commented Dec 4, 2023

@iainelder I think you mean cmdlog.vdx? Thanks for this issue and discussion, I was wondering this myself, and about the differences between .vd, .vdj, .vds, and (now) .vdx. There definitely need to be some documentation updates.
