-
-
Notifications
You must be signed in to change notification settings - Fork 274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[open] handle input files that cannot seek backward #1978
Conversation
It looks like the tests are failing because the file-saving code triggers for directories since they are not seekable. I'll look into replacing |
This is really great, @midichef. I agree, that whole Path business is pretty convoluted. Thanks for diving into it and cleaning up some bits of it, at least. Let's keep thinking of ways we can simplify it, but for now these seem like definitive improvements. |
Okay, great, I squashed the changes and didn't change anything else. |
Hi @midichef! Once again, I appreciate your thorough write-up in the description. This is a pretty bad bug, and I am grateful you caught it. I have a question:
Is the intent that this PR fixes this issue? Or just the issue for |
Okay, I see. On I am going to see if I can help with not losing lines in
|
The rows dropped are weirdly always the ones between 2679 and 6776? Which is fascinating. |
@anjakefala see what bytes those correspond to. 64k-128k maybe? |
One thing I learned is that If you add |
Ah, nice catch. It looks like visidata/visidata/loaders/archive.py Line 18 in e208ca4
|
@midichef Oh wow, I did notice that, and thought it Must Be the culprit. But got myself into a weird pdb session, and was not able to confirm that it was the culprit. Indeed, it does seem to be! What happens if you get yourself into a debugging hole. |
Yeah, it seems like what
( |
I added a test so that hopefully we would catch this in the future! If things are green, I will go ahead and merge. |
Summary
Visidata drops lines from files that don't support seeking backward, like pipes, or symlinks to fifos. (v2.11 and develop)
It can be seen by using bash's
<(command)
operator for process substitution, which is implemented with pipes:vd <(seq 10000)
On my system this is missing thousands of lines, because the file-guessing code consumes them. And some of the file guessing code gives errors from trying to
seek()
.Similarly,
open_txt()
has the same problem. It consumes lines when it peeks at the first lines of the file.vd -f txt <(seq 10000)
This fix checks whether input files support seek. If they do not, it reads the input into a
RepeatFile
, so that the same Pathp
can be opened multiple times.Limitations
However, the fix will not change the behavior on Windows. This is because
seekable()
gives wrong info for pipes on Windows. CPython has an open issue for this but it seems unlikely to be fixed any time soon.Also, the way I cause the RepeatFile to be created is a messy hack:
https://github.com/midichef/visidata/blob/3c14d752b8196d5641a0147935f4fce5f81e3049/visidata/_open.py#L109
There were already a confusing number of layers of open-like functions that get called:
vd.openPath()
→open_txt()
→visidata.Path.open()
→visidata.Path._open()
→FileProgress(..., fp=self._path.open()
. Inserting another layer viap = Path(p.given, fp=p.open(mode='rb'))
is not great. I welcome any better ideas of how to do this.Additional changes
I changed
visidata.Path.open()
to give an error when trying to open in binary mode, if the path has aRepeatFile
. BecauseRepeatFile
can only hold text and can only support the kinds of seeking that text files do.This change helps head off file-guessing functions that expect a binary file. Otherwise they open the file and fail when trying to do a binary-appropriate end-relative
seek()
. This change also will prevent the error messages fromguess_*
in #1957And I changed
RepeatFile.seek(n)
to take an extra parameter, because filetype-guessing libraries expect to be able to doseek(offset, 2)
, and give an error when they cannot:I modeled the implementation after what's in
_pyio.py
https://github.com/python/cpython/blob/76c26eaca4147ba7e3e8d7379c5a828f0b512a46/Lib/_pyio.py#L2443
I also took out a check for
if not create
which is redundant, as it is checked earlier inopenPath()
.