Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for linefeed \n character in paths saved to ~/.local/share/lf/files (paths delimiter bug) #1114

Open
Andrew15-5 opened this issue Feb 19, 2023 · 35 comments

Comments

@Andrew15-5
Copy link

Some time ago I was working with ~/.local/share/lf/files file and after some edge cases I quickly figured out that right now file/dir names are stored unreliably. The easiest example of this is when there is a linefeed \n symbol in the name: it will be saved in the "file" as 2 lines, e.g., as 2 names. And you really can't have 1 delimiter, as any character can be in path (because Linux support all Unicode characters).

Right now I was working with opener.sh for cmd open $.../opener.sh and because of POSIX (I like to make as many of my scripts as portable as possible) shell I stumped into the same problem with Unicode names and delimiter (mainly because there are no arrays in POSIX shell). Luckily, I was able to find some kind of portable solution: using base64 encoder/decoder from coreutils (which is not POSIX, but a lot of distros are GNU/Linux).

The point is that using any simple and fast encoding you can greatly limit the output charset and therefore choose any delimiter you like (even the same linefeed). Since I've limited myself to mostly POSIX shellscript I have limited encoders (or I can make my own), but you are making the program in Go and I think your options are far richer than mine. Hence, the implementation of this enhancement should be pretty straightforward and not as bulky as mine, IMO.

So, basically my idea is to put encoded path echo "$unicode_path" | base64 into the same ~/.local/share/lf/files file and delimit each encoded path with a linefeed \n (same as now). And when you need to paste selected files/dirs you just use echo "$encoded_path" | base64 --decode and then process each Unicode path separately in Go (or shell or whatever).

The only downgrade will be in user-friendlyness because after encoding, the "file" will be "ciphered" for the end user. And to overcome this problem, you will have to add a note in the manual for how to get the original (decoded) data from the "file" (or at least which steps are performed to create the "file").

@ilyagr
Copy link
Collaborator

ilyagr commented Feb 19, 2023

The standard solution here is to use \0 as a separator (see for example xargs -0). However, as you pointed out, that makes it less convenient for somebody using scripts to read the file or viewing them in an editor.

I think it would not be a bad idea to change the separator to \0 in principle, but it'd be a breaking change. A less breaking change would be to have lf use 0-separated files0 and tags0 files and maintain the old files and tags files for backwards-compatibility. That would only break scripts that modify files or tags. This may or may not be a showstopper, I'm not sure.

@Andrew15-5
Copy link
Author

Andrew15-5 commented Feb 19, 2023

As I said any character can be used in the name that means including null \0. That's why right now I don't see any solution other than limiting charset with name encoding and then delimiting names with \n or \0 or , etc.

I can also see a more dirty solution: save each copied/cut file in a separate file (the more you have selected, the more temporary files have been created). That will solve the whole Unicode/delimiter problem because each character in a single file is a part of one valid path name (1 path per file).

P.S. I know about find -print0 and xargs -0 but again they are just substitution for \n delimiter. And with Unicode paths they don't go perfectly well, but for some cases it's a good little feature.

@ilyagr
Copy link
Collaborator

ilyagr commented Feb 20, 2023 via email

@Andrew15-5
Copy link
Author

Andrew15-5 commented Feb 20, 2023

Ok, now I can live happily again. I had some knowledge about this, but thankfully it was wrong.

For once, I have a screenshot of UNIX/Windows prohibited characters in file names, but I completely skipped over the "printable" part which is also included in the screenshot (NULL is not printable).

The second thing was a laboratory work in Operating Systems class. In the PDF, it is said that any character is ok for a name (except of course /). And when I was defending the assignment by answering the questions, I answered the same thing. But there was not a single work about NULL character.

But in the end it is only logical that in C all strings are ended with NULL char. So yes, I was wrong and that's wonderful.

@Andrew15-5
Copy link
Author

Let me point out the options so far:

  • change delimiter from \n to \0
  • encode each path and choose any delimiter you like
  • store each path in a separate file (now all paths are stored in a single file)

@ilyagr
Copy link
Collaborator

ilyagr commented Feb 20, 2023

I'll add one more option: if we do shell encoding of file names (as opposed to base64), that has the advantage of only breaking backwards compatibility for files with special characters in their names.

This does have the compatibility problem @Andrew15-5 mentioned in their original comment. Decoding of such filenames could be done with sh -c printf "%s" 'encoded_filename', but I don't know of a POSIX encoder. The best I know is that in bash and zsh, printf has a %q format. It would also be nice to not mangle UTF-8 letters from other languages, which could be hard to do, especially POSIXly.

I'm not sure if this is worth the fuss. If we can find a way to transition to \0 separators, that would be better.

@Andrew15-5 Andrew15-5 changed the title Add encoding to files/dirs saved in ~/.local/share/lf/files to be able to handle special names Add support for linefeed \n character in paths saved to ~/.local/share/lf/files (paths delimiter bug) Mar 3, 2023
@ilyagr
Copy link
Collaborator

ilyagr commented Mar 26, 2023

I noticed that there is a filesep option for the separator to be used in the fs and fx environment variables. Try setting it to \0. I'm not 100% sure it will work for files with \n in their name, since the files are \n-separated in ~/.local/share/lf/files.

I'm thinking that perhaps we could switch to using \0 in ~/.local/share/lf/files and have $fs be the user-friendly-by-default output with \n separator.

@gokcehan
Copy link
Owner

gokcehan commented Apr 1, 2023

This issue comes up from time to time and there was even a hot discussion about this in the past but in the end I still think it is more useful to keep things simple. Part of the problem that I haven't seen in this discussion is that environment variables can not contain \0 characters as far as I remember. We also don't have a proper way to display filenames with newlines in the tui I think. If there will remain some parts of the program that does not handle newlines properly, I don't see much gain for making a specific part of the program newline compatible. I guess the recommended way is just to handle the filenames with newlines by dropping to the shell and then complain to the person who sent you the file afterwards.

@Andrew15-5
Copy link
Author

I can agree with this, but there are things:

  1. edge cases from time to time do require (a lot of|some) work and are not necessarily very profitable (not "much gain")
  2. Ranger, as a more mature program, handles such tricky files just fine (it also tries to print \n in TUI in some cases)

Yes, apparently you can't store \0 in environment variables. But I don't understand why this is an issue if program in Go and paths are stored in a file.

@gokcehan
Copy link
Owner

gokcehan commented Apr 1, 2023

@Andrew15-5 It's not a problem when you handle things in Go but users are not using Go for customization and the information should be passed to shell somehow. For example, you can't use \0 for the filesep option. In general, the problem with these discussions is that there is never really a concrete example or usecase so we end up talking in abstract. So ideally you should propose a technical change with all the required specifications along with an example in its full and then you need to somehow argue how the proposed change will make the example move from a non-working state to a working state.

@Andrew15-5
Copy link
Author

the information should be passed to shell somehow

I'm not familiar with Go, but you should be able to create a new process by providing a list such as ["command", "arg1", "arg2", ...] (like subprocess.Popen([cmd, arg1, arg2, ...]) in Python). So it doesn't matter if a name has \n in it because it is a separate argument, and it will never be split or misused in any way. If you have a problem with passing the information to shell, then I would like to hear it (just because I'm curious).

Oh, wait. Are you talking about things like $fx in shell? Maybe I misread your comment.

so we end up talking in abstract

I see your point.

But we do still have an option to create a separate file for each path. On Linux and some other platforms (BSD), you can have up to 255 character length file names. This means that you can have 10^255 unique indexed/numbered files (which is super-duper huge). This is also not too different from the file(s) size perspective, but it can potentially create a problem in access time on HDD (I think). The "one name-one file" approach is about as user-friendly as "all names-one file" (that can't handle \n in names) if we are talking about custom shell scripts that make use of content of ~/.local/share/lf. The only concern I have is Windows, which is far less flexible and completely different from Unix-like OSes. I remember that it has a limit on the absolute path length (maybe it is 255) which in some cases completely invalidates the approach.

@joelim-work
Copy link
Collaborator

I had an idea to use filesep as the delimiter in ~/.local/share/lf/files in addition to $fs and $fx. So now the format of ~/.local/share/lf/files will be move/copy, followed by \n, followed by the paths separated using filesep.

Something like the below implementation:

diff --git a/app.go b/app.go
index 0c40d6c..c262326 100644
--- a/app.go
+++ b/app.go
@@ -113,27 +113,26 @@ func loadFiles() (list []string, cp bool, err error) {
 	}
 	defer files.Close()
 
-	s := bufio.NewScanner(files)
-
-	s.Scan()
-
-	switch s.Text() {
+	reader := bufio.NewReader(files)
+	line, _, err := reader.ReadLine()
+	switch string(line) {
 	case "copy":
 		cp = true
 	case "move":
 		cp = false
 	default:
-		err = fmt.Errorf("unexpected option to copy file(s): %s", s.Text())
+		err = fmt.Errorf("unexpected option to copy file(s): %s", string(line))
 		return
 	}
 
-	for s.Scan() && s.Text() != "" {
-		list = append(list, s.Text())
+	bytes, err := io.ReadAll(reader)
+	if err != nil {
+		err = fmt.Errorf("loading file list: %s", err)
+		return
 	}
 
-	if s.Err() != nil {
-		err = fmt.Errorf("scanning file list: %s", s.Err())
-		return
+	if len(bytes) > 0 {
+		list = strings.Split(string(bytes), gOpts.filesep)
 	}
 
 	log.Printf("loading files: %v", list)
@@ -160,10 +159,7 @@ func saveFiles(list []string, cp bool) error {
 		fmt.Fprintln(files, "move")
 	}
 
-	for _, f := range list {
-		fmt.Fprintln(files, f)
-	}
-
+	files.WriteString(strings.Join(list, gOpts.filesep))
 	files.Sync()
 	return nil
 }

Although this might end up being a breaking change since the format of ~/.local/share/lf/files is changed. This also means going down the rabbit hole of trying to support newlines in lf. I am not sure whether this kind of change is worth implementing.

@Andrew15-5
Copy link
Author

Since the project in beta/alpha, this means that breaking changes are expected.

To be a complete/mature file manager it has to support LF in file names since such file names are possible. But is definitely not a priority.

On the other hand, if this will be the last breaking change to the file structure, then it's better to change the structure ASAP, IMO. Now user base is small with time it will only grow.

@gokcehan
Copy link
Owner

@Andrew15-5 I don't know what you mean by beta/alpha but we have been avoiding breaking changes as much as possible for quite some time.

@joelim-work Wouldn't that still be the case that users can not use \0 for filesep though, as environment variables fs and fx can not include \0? Then handling filenames with newlines will still not be possible, even with a breaking change.

To be honest, this newline in filenames issues have been going around for years and at this point I don't think there is an easy solution. An alternative approach might be to proactively give warnings to users before shell commands. So maybe we can consider adding a check before running shell commands to see if the filenames contain a newline and/or the filesep character, and then refuse to run the shell command with an error when that is the case. This way, users who are worried about such cases can feel safer when running their commands.

@Andrew15-5
Copy link
Author

we have been avoiding breaking changes as much as possible for quite some time.

Really? This doesn't mean that there weren't breaking changes, because they were in every second version or so (just by looking through https://github.com/gokcehan/lf/releases). I think there was a WIP status, but not anymore. I used to the semver, so I can't really tell just by looking at version r31.

this newline in filenames issues have been going around for years and at this point I don't think there is an easy solution. An alternative approach might be to proactively give warnings to users before shell commands.

Hmm, I think we are likely to introduce a breaking change anyway, so this is almost a fact. A hard refuse on super rare special names encountered (that can make a chaos) does make me feel safer. So you are saying that some kind of switch or an additional flag must be passed every time to bypass the error? Because otherwise the file manager wouldn't be 100% file manager if it can't manage special named files just because they're named strangely.

we can consider adding a check before running shell commands

Right now (I didn't test it for a while though) when copying such a file it will take 2+ lines in the ~/.local/share/lf/files instead of just one. So the check should be run not only on shell commands, but on some of the built-in commands too, at least the ones that interact with the ~/.local/share/lf/files.

@gokcehan
Copy link
Owner

@Andrew15-5 All I'm saying is that we try to avoid breaking changes "as much as possible". In practice, it means if you have a suggestion that involves a breaking change, you also need to have an argument that is as strong as the change itself. I mentioned this because there has been previous suggestions for this issue that would break like 90% of all custom commands in existence. That is not the kind of breaking changes that we make these days. Instead, we often use the breaking label for changes that you should be aware of. For example, if we were to add a filename check for shell commands, we would likely have it listed as a breaking change in the release notes, even though it does not really break anything in practice, since it was already not working in the first place. Technically, you can always argue that every change in a software is a breaking change. In any case, you are welcome to have your own judgement on the stability of this software. In your case, I suggest you to interpret our every release as a major version in semver.

Note, I consider lf a 100% file manager since you can always call $bash or any other file manager from within lf to do anything you want. Again, you are welcome to have your own judgement on this matter.

I agree, some builtin commands like cut and copy may need to be checked as well, I'm not yet sure.

@Andrew15-5
Copy link
Author

I understand what "as much as possible" means (in this context).

can always argue that every change in a software is a breaking change

According to SemVer breaking changes is when the major number is increased. So things like new features that only add something new and don't break anything old is a non-breaking change, hence a minor number is increased.

I mean, since you are deliberately (I assume) don't use SemVer or any other kind of v1.1(.1), it would indicate that the versions are treated differently. Now I know that each version is like a major. But it's probably more like major or minor. If there were or will be hotfix versions (to fix just bugs), then it's like patch too.

I consider lf a 100% file manager since you can always call $bash or any other file manager from within lf to do anything you want.

I would agree if we are closing our eyes on the fact that a file manager should manage files with its own API/commands. I consider using other file managers or shell as cheating, because it relies on 3rd party software (including shell, instead of direct Kernel/OS API). It should be better than this! All non-built-in/auxiliary commands/API is just an enhancement to allow for greater number of (nonessential) features and for bigger flexibility.

Can you comment on this:

So you are saying that some kind of switch or an additional flag must be passed every time to bypass the error?

Because it kind of looks like you just want to add a precaution to not work with files with the \n (because it can cause error and unpredicted behavior).

@gokcehan
Copy link
Owner

@Andrew15-5 Why should there be a mechanism to override the error, if we already know beforehand that the command will fail?

@Andrew15-5
Copy link
Author

Andrew15-5 commented Nov 11, 2023

No-no. Mm, I imagine it a bit differently.

Yes, the built-in commands will fail either way. But shell script/command might not fail (shell commands can handle arguments with \n). So there is a case where someone would want to bypass the error.

I was thinking... I don't know... I have a thought, but now I don't think it's the right thing, but here it is. If the special name triggers the error, then if we (pass a magical flag to) ignore the error, then the algorithm of copying/pasting/deleting will be different. Normally in the ~/.local/share/lf/files we use \n to separate files. Then maybe after switching to the mode of handling files with \n we use other file for storing, or other separator, or as I purposed, we use encoding or something else which only reserved for such special cases. I know it is super vague, but I hope you get some parts of the idea. Or maybe we just have some kind of in-memory storage for storing file names with any names possible and therefore not creating/modifying the ~/.local/share/lf/files. So the point is that since we can't handle special names with the current algorithm/file scheme, we have to make it work some other way. Which way will be the best — I don't know — there are plenty.

@joelim-work
Copy link
Collaborator

@gokcehan @Andrew15-5 To clarify a few points in this discussion:

Wouldn't that still be the case that users can not use \0 for filesep though, as environment variables fs and fx can not include \0? Then handling filenames with newlines will still not be possible, even with a breaking change.

I forgot to mention this earlier, but theoretically you can set filesep (and consequently ifs) to some other non \0 byte which is unlikely to appear in a filename (up to the user's preference). I think this is enough to ensure that shell scripts using $fx will work, but for cut/copy/paste operations which use ~/.local/share/lf/files, after seeing the latest discussion I don't think it's a good idea to change the format. There are a number of examples in the wiki which use this file and they will break if the format of ~/.local/share/lf/files changes.

I think there was a WIP status, but not anymore. I used to the semver, so I can't really tell just by looking at version r31.

There used to be such a WIP notice in the README.md, but it was removed because a user requested it in #1395. The status of a given software project (e.g. decision to make breaking changes) is somewhat subjective, and your opinion might be different from others. For the release version numbers, SemVer was previously requested in #490, but I'm not sure whether there is any point (perhaps that issue can be closed?). Releases are somewhat infrequent (typically every 3-6 months), and during that time there are usually a few breaking changes that have accumulated.

Just because there are breaking changes in every release, that does not mean that they can always be accepted. Breaking changes are not created equally and have to be decided on a case-by-case basis. In general a breaking change is more likely to be accepted if:

  • It is absolutely necessary (e.g. for security reasons, I don't recall such a case before, just an example), or if there is a very good reason to do so.
  • It only affects a small subset of users (e.g. for a feature is which is not widely used).
  • It's impact is minor (e.g. cosmetic change).
  • The original behavior can be restored via a simple configuration.

Or maybe we just have some kind of in memory storage for storing file names with any names possible and therefore not creating/modifying the ~/.local/share/lf/files.

This isn't a problem with other file managers because they do not have the same design as lf where multiple instances need to communicate with each other. For instance, lf doesn't support tabs, and the recommendation is to instead have two instances where you can cut/copy/paste files between them. This means that the separate instances need to have some way to communicate with each other, and that is done through ~/.local/share/lf/files - in other words in-memory storage is not sufficient here.


The current format of ~/.local/share/lf/files is a simple one using line-based records, which makes it easy to use with other filter programs like grep. A natural disadvantage of this format (in its current implementation) is that it is impossible to represent filenames with newlines in them. While you can change the format to allow for newlines, the problem is that you are always making some kind of tradeoff (e.g. making the file less convenient to work with, having to introduce complexity into the code, etc.), just to benefit the few users who need to work with such files.

The more I think about this, the less attractive I find the idea of using filesep in ~/.local/share/lf/files, due to its unconventional format. Honestly the idea of using base64 encoding as suggested in the OP doesn't sound too bad, and is quite simple to implement:

diff --git a/app.go b/app.go
index 0c40d6c..4ce07b3 100644
--- a/app.go
+++ b/app.go
@@ -2,6 +2,7 @@ package main
 
 import (
 	"bufio"
+	"encoding/base64"
 	"fmt"
 	"io"
 	"log"
@@ -128,7 +129,9 @@ func loadFiles() (list []string, cp bool, err error) {
 	}
 
 	for s.Scan() && s.Text() != "" {
-		list = append(list, s.Text())
+		if f, err := base64.StdEncoding.DecodeString(s.Text()); err == nil {
+			list = append(list, string(f))
+		}
 	}
 
 	if s.Err() != nil {
@@ -161,7 +164,7 @@ func saveFiles(list []string, cp bool) error {
 	}
 
 	for _, f := range list {
-		fmt.Fprintln(files, f)
+		fmt.Fprintln(files, base64.StdEncoding.EncodeToString([]byte(f)))
 	}
 
 	files.Sync()

This behavior could be toggled using an option named base64, which means there would be no breaking change for existing users At this point I cannot guarantee that such a feature will be accepted, but at the very least hopefully the patches I have provided here will be useful for demonstrative purposes (e.g. for anyone who wants to patch their own fork of lf).

@Andrew15-5
Copy link
Author

Andrew15-5 commented Nov 12, 2023

theoretically you can set filesep (and consequently ifs) to some other non \0 byte which is unlikely to appear in a filename (up to the user's preference).

I think I already left a note about this. You already forgot that we are dealing with the very rare but totally valid case (using \n). You want to fix this by using another special character (U+0001). And here is the problem: the user can use all of the special/rare characters in the absolute path, and then the only character left will be the U+0000. It's like "fighting fire with fire" (not the kind that works). One kinda bad idea that just struck me is to use some kind of byte that will make the ~/.local/share/lf/files file an invalid UTF-8 text, meaning that... hold on, do names have to be a valid UTF-8 string? Maybe they don't have to. Scrap that.

This means that the separate instances need to have some way to communicate with each other, and that is done through ~/.local/share/lf/files - in other words in-memory storage is not sufficient here.

I thought that lf has a server and many clients. Does it only have equal instances and no server? Like, is all the communication based only on the ~/.local/share/lf/files and nothing else? Suppose there is a server (or we can make one, which is a big thing to do), then we can easily access in-memory data by talking to the server (like a Redis or its wrapper or something). And this is not the default behavior. Only if we are bypassing the error about special names. If lf instances could talk to each other (like a mesh) then there is not much of a gain to use a separate server for storing file names — they can get them directly from their peers, but there won't be any central/synchronized place.

I can think of another less ideal hack. We use a dynamic filesep that will change if some special & rare characters are already present in the file paths. So, supposing we turn on the no-no switch (when dealing with naughty names). Then filesep will be still \n if there is no \n in the file names. If there are, then it will be changed to another rare character ­— U+0001, if that is present, then to U+0002 and so on. The obvious downside is that we could eventually run out of the rare characters that all will be used in the file names, and there will be no character left for the filesep.

I want to know if filesep is specifically a byte or a Unicode character(s) that is encoded with UTF-8 or something else. This can change some things how we could solve the main problem.

Right now, I see 2 clear winners that will 100% help manage any file name:

  • encoding with base64
    Pros:
    • easy to implement
    • fast (almost no performance loss)
      Cons:
    • if user will create any kind of shell scripts that will interact with the ~/.local/share/lf/files file, they would have to have a base64 binary in the $PATH, because not everyone is using GNU core utilities package (coreutils on Debian, but it is a required one)
    • if user would want to switch to the "base64" mode, then they also might have to change their existing lf config
  • storing files in-memory
    Pros:
    • no disk IO operations (SSD will live extra week longer)
    • lighting fast (as if current approach isn't fast enough)
    • much more convenient/simple/easy API can be used (instead of reading a file with POSIX/other commands)
    • mode and file_list can be fetched with 2 different API commands (instead of splitting the same file manually)
      Cons:
    • need a central/P2P server that will hold everything in the memory
    • server have to live even if no lf instance is running (can be fixed by dumping DB to some file and then restoring it from that file)
    • if (all) the server(/clients) were to be killed, then no copied/cut files could be restored (which is unlikely and is non-destructive)

If creating a server, and its (shellscript) API won't be a huge problem, then maybe we could transition the lf from "wasting" IO resources of HDD/SSD and using not an ideal storage such is a file (which has to have \n separator at this point) to just talking to the server (or instance-to-instance) and not having any problems with names ever again. Solving this issue will introduce a breaking change anyway (as I mentioned before).

@Andrew15-5
Copy link
Author

Again, in my opinion, a file manager has to be able to manage any files with after a clean installation. If we are going to "hide" or make an opt-in option for handling files with \n then overall it is fine, but if it is not enabled by default, then it's not great. With an introduction of a new way of storing files and fetching them, this file manager will be able to handle any paths and files with no problem and no additional configuration.

It's probably "too late" or something, but a lot of things are not "too late" it's just a strong emotional connection with all the work or use that is associated with the ~/.local/share/lf/files. I'm just saying this in case someone would say something similar. I do understand that this will be a huge change, but I think it's clear that it has a lot of positives. It only has negatives in terms of implementation (and that users would have to edit their configs for a more simple API).

@DusanLesan
Copy link

I do not think it should be expected from software in any stage of development to cover all the possibilities. That is actually true for most things: people will not get vaccines for all possible infections, cars won't be tested for all possible incidents but for the most common...
New thing is added when there is enough demand/reason for it. And in this thread, I have failed to find any valid use case and only stating that case is possible.

Anyway, if anyone wants to cover new lines issue, you should also consider marks and tags

@Andrew15-5
Copy link
Author

I do not think it should be expected from software in any stage of development to cover all the possibilities. That is actually true for most things: people will not get vaccines for all possible infections, cars won't be tested for all possible incidents but for the most common...

You are overgeneralizing my statement. And non-software examples has nothing to do with software and "covering all possibilities", because in real life/physical world you can't physically simulate all cases and in some situations it is not even needed. In virtual world/software, we can emulate either all possible cases or all groups of cases, and it's called tests (unit/integration). And any good software that will be maintained and/or improved have to have tests written to cover as many things in the codebase as possible to prevent as many bugs as possible (when changing/adding code). So you should expect from software in last stage of development to cover everything that should be covered.

New thing is added when there is enough demand/reason for it.

Correct. But manipulating files is the sole purpose of a file manager. If it can't do this, then what kind of file manager is it? 99.9% file manager? Never heard of that, personally. Again, you are overgeneralizing things.

And in this thread, I have failed to find any valid use case and only stating that case is possible.

Since, it's a core functionality of any file manager the question about valid use case should exist, it just should work and that's it. The use case can be as simple (not serious) as: a friend sent you a file with the name memes.zip which you them unarchive in the same directory and see a new directory memes inside which there are a lot of memes. And you want to copy all the files inside the memes dir to your own dir with memes. Oh, wait! What's that? An error? And only after checking the ls meme* you find that there is this file:

-rw-rw-r-- 1 me me 0 Nov 12 11:11 'meme6'$'\n''9'

In lf it will be shown as: image (/path/to image)

Apparently, this mf joker created a file with newline. Why? Ask the friend who send it.

In ranger it will be shown as meme6, but you can still manipulate it (copy/move/delete). In nautilus you will actually see the full name:

image

And, of course, as expected from a file manager, you can manipulate this file too.

In lf you can only delete this file (Cast it into the fire! Destroy it!), because when you try to copy/cut it, you will see number 2 in the bottom right corner, which will indicate that you now will copy/cut 2 files, which is incorrect. And after pasting, you get an error: [2] lstat 9: no such file or directory.

Do you see any valid use case for using CRLF instead of LF at the end of line in files on Windows? I don't see any, but it's just how Windows works. Same with "handling" "files with \n" in a "file manager".

Anyway, if anyone wants to cover new lines issue, you should also consider marks and tags

Good catch. If I add dir with \n via 'Y or tag command, it will instantly crash the current lf instance and new won't start (because of a crash) until marks/tags file isn't cleared of special dir path (that uses 2 lines instead of 1 in the file).

For the correct display of \n we could use something similar to $'\n' from ls, or changing the fg/bg color and just printing \n (2 chars), or use multiple rows/lines for such a file/dir (like in Nautilus), which would the most correct way, because we are treating LF char as LF (adding new line). I actually like it.

But the main problem is still backend and storage of copied/cut files.

@DusanLesan
Copy link

I understand that it would be good to cover it, but I just disagree with the notion that it is a necessity for completeness/ok only for beta version software. While I always value the pursuit of perfection, it's important to consider the balance between effort and benefit. Especially in the context of unpaid effort in FOSS.

As for the display format, while I don't anticipate ever seeing it, I prefer to have it displayed in a single line. So maybe $'\n'

@Andrew15-5
Copy link
Author

Andrew15-5 commented Nov 12, 2023

I understand that it would be good to cover it

Good.

ok only for beta version software

What do you mean by that? Support for LF is ok/great in any version, but I'm talking about necessity to support it in the end, i.e. stable release (i.e., 1.0.0 in SemVer).

While I always value the pursuit of perfection, it's important to consider the balance between effort and benefit. Especially in the context of unpaid effort in FOSS.

If you are worried that unpaid effort to add support for LF will be greater than benefit, then I can take over the implementation. But I don't know when I will be able to do this, considering that I don't even know Go and I failed to quickly setup Go LSP (a lot of import errors), although I don't think it will take more than 2 days to learn all the basics. And as most of us, I don't have enough time. The only question is which implementation @gokcehan will choose, if any (issue is still open, therefore I assume it (LF) will be supported eventually).

As for the display format

Since our UI taste (of how to display LF) is different, we can provide an option to change between different styles:

  • add a real new line;
  • print $'\n' or just provide a string to newlinefmt of how you want to see \n being displayed.

Default can be inline variant so that file will take the same number of lines as any other file would (the way users are used to see files).

BTW, LF is for Line Feed \n character and not an alias for lf file manager name if someone thought it was otherwise. (I realized that they are the same only now.)

@DusanLesan
Copy link

DusanLesan commented Nov 12, 2023

What do you mean by that? Support for LF is ok/great in any version, but I'm talking about necessity to support it in the end, i.e. stable release (i.e., 1.0.0 in SemVer).

I mean I do not think this issue is ok to have only in beta software. I think it is ok for other stable software other than lf to miss this functionality

BTW, LF is for Line Feed \n character and not an alias for lf file manager name if someone thought it was otherwise. (I realized that they are the same only now.)

lol I thought in parts like To be a complete/mature file manager it has to support LF in file names since such file names are possible. you are intentionally doing it. And that is actually the part I disagree with because neither nautilus nor ranger mentioned before handle them well. How do you create such file in nautilus? How do you rename such files in ranger?
Why can't ranger display directories with \n?
image

Directory shown as multi here has 3 lines. File starting with multi is missing second line and another file is in the middle of it. But I do not think any of these are important to consider these FMs unstable. Also this can be used as my argument against multi line display in lf

@Andrew15-5
Copy link
Author

because neither nautilus nor ranger mentioned before handle them well. How do you create such file in nautilus?

They handle them enough to be able to copy/cut/delete them. And this is probably 90%–100% of functionality anyway. Because you can't create a file, and you don't create a file with a GUI FM (like Nautilus) — you create all files in 100s of different GUI apps. This is why I said "100%" previously. But in any FM you do create a lot of folders, but it is still nothing compared to an actual handling of files/folders, so this is why I said "90%".

How do you rename such files in ranger?

Heh, you only mentioned ranger for renaming, huh? I wonder why. /s Fortunately during renaming Nautilus does show LF character which you can copy and paste it in the rename field, and it will save it just fine (or you can copy \n using any other way). Anyway, about Ranger. Ranger has a bulkrename command with which you can edit the name just like with massren + lf, but after closing it will show you an actual shell command that will be executed to rename the file where you can add a real \n and quote the whole argument, and you will be able to rename files/dirs with LF char. And bulkrename is a built-in command that does rely on a shell command. So it's not 100% "built-in" support, but it does support it out of the box, and you can rename files if you know how Shellscript works (of course you do), so we can say that it also support renaming.

Why can't ranger display directories with \n?

IDK, ask the author. I probably would've started the same discussion there (maybe it already exists), but I don't use it anymore. I already left a note about that:

In ranger it will be shown as meme6

Also this can be used as my argument against multi line display in lf

The fact that ranger doesn't support displaying \n in any way doesn't mean that lf also have to "not support" it. These are 2 completely different projects with different codebase and main PL (programming language). They share mostly the same look and the same purpose. Like, they are more rivals at this point (but they don't really hate each other or something).

@DusanLesan
Copy link

DusanLesan commented Nov 12, 2023

Heh, you only mentioned ranger for renaming, huh? I wonder why

You've seen right through me :) Nautilus file display and renaming (as long as you can copy LF from somewhere) work great.
I did not mean to compare or anything. I only wanted to point out how software you mentioned (and that is not considered unstable) is still not having 100% working implementation while to me it looks like you expect 100% of all possible cases for software to reach full maturity. This thread is too opinionated and instead of defining what is complete/mature software, it would be better to stick to requirements and use cases.

Also this can be used as my argument against multi line display in lf

The fact that ranger doesn't support displaying \n in any way doesn't mean that lf also have to "not support" it.

I did not mean lf should not support LF, just that it should support single line representation with \n or $'\n' like you mentioned before. Displaying it in multiple lines sounds like headache. If you do not have size on the right, it would not be obvious if it is single file or more of them, and of-course that could lead to issues like in the image

@Andrew15-5
Copy link
Author

Andrew15-5 commented Nov 12, 2023

I only wanted to point out how software you mentioned (and that is not considered unstable) is still not having 100% working implementation

You got me there. I threw them into discussion as examples only because I used them for some time and don't remember using anything else. I mean, I did use Dolphin, and it probably works very similar to Nautilus, but I don't have it installed, and I didn't use it as much. I could go out of my way and search for a better collection of stable FMs that do fully support LF char, but I'm too busy (and lazy).

I did not mean lf should not support LF

Wonderful!

Displaying it in multiple lines sounds like headache.

It only sounds like a headache to implement, but not to use (if implemented in a good way). I can see some solutions to some problems that can arise after setting "display LF literally" option, but also some can be more serious and will require more thinking, but overall, I don't see any UI/UX problems right now. And it's too early to discuss this anyway.

@joelim-work
Copy link
Collaborator

joelim-work commented Nov 13, 2023

To respond to some more points in this discussion:

You already forgot that we are dealing with the very rare but totally valid case (using \n). You want to fix this by using another special character (U+0001). And here is the problem: the user can use all of the special/rare characters in the absolute path, and then the only character left will be the U+0000. It's like "fighting fire with fire" (not the kind of that works). One kinda bad idea that just struck me is to use some kind of byte that will make the ~/.local/share/lf/files file an invalid UTF-8 text, meaning that... hold on, do names have to be a valid UTF-8 string? Maybe they don't have to. Scrap that.

It is already impractical enough that we are talking about dealing with filenames with newlines in them, but now we are extending this discussion to the filesystem of a hypothetical user in which every single possible byte happens to be used in filenames. As I have mentioned above, I agree with scrapping the idea of using filesep as part of the format for ~/.local/share/lf/files. However the same comments apply to filesep itself, if you are wanting to use $fx in a shell command - from the commit history it looks like filesep was added a long time ago in ad6ead5, and I think it is unlikely to change just to deal with such a far-fetched scenario.

I thought that lf has a server and many clients. Does it only have equal instances and no server? Like, is all the communication based only on the ~/.local/share/lf/files and nothing else?

You are correct that lf uses a server. Actually the server used to be responsible for storing the cut/copied files, but then it was changed to use ~/.local/share/lf/files instead, for persistence and also for access via shell scripts - see #177 for more details. In any case the communication between the client/server also used the same line-based record format, so files with newlines would still be an issue regardless.

If we are going to "hide" or make an opt-in option for handling files with \n then overall it is fine, but if it is not enabled by default, then it's not great.

I do understand that this will be a huge change, but I think it's clear that it has a lot of positives. It only has negatives in terms of implementation (and that users would have to edit their configs for a more simple API).

This is rather opinionated, and I am not convinced about changing the default behavior. You might personally be very invested in having lf be able to deal with filenames containing newlines, but I imagine that many users:

  • Don't know what Base64 encoding is, or don't want to have to deal with it (or any other scheme for encoding newlines)
  • Don't work with filenames containing newlines in the first place, and are therefore not affected by this problem
  • Don't want to spend time changing their configuration

As such, changing the default behavior may benefit you, but result in a net negative experience for other users.

Anyway, if anyone wants to cover new lines issue, you should also consider marks and tags

Thank you for bringing this up, I can also reproduce a crash when tagging a file with newlines. I think whatever is used to address ~/.local/share/lf/files should also apply to these files as well.

Since our UI taste (of how to display LF) is different, we can provide an option to change between different styles:

  • add a real new line;
  • print $'\n' or just provide a string to newlinefmt of how you want to see \n being displayed.

I am worried that adding a real newline could be confusing, intuitively you would think that each entry takes the same amount of space (i.e. one line). As for newlinefmt, I don't think it is a good idea to blindly add configuration options (they do pile up and become overwhelming for new users) just to address unlikely use cases - now we are talking about users who not only have to deal with newline characters but also want to customize how they appear in the UI. Besides, this has the same problem as what you mentioned for filesep, in that a hypothetical user could have filenames that legitimately contain all sorts of variations of \n/$'\n'/etc. and thus overlap with the visual representation of filenames that contain actual newlines.

Printing a hardcoded value of $'\n' seems reasonable enough to me. The following patch should work:

diff --git a/ui.go b/ui.go
index 9a18120..2de2f49 100644
--- a/ui.go
+++ b/ui.go
@@ -468,7 +468,7 @@ func (win *win) printDir(screen tcell.Screen, dir *dir, context *dirContext, dir
 			maxFilenameWidth -= len(info)
 		}
 
-		filename := []rune(f.Name())
+		filename := []rune(strings.ReplaceAll(f.Name(), "\n", "$'\\n'"))
 		if runeSliceWidth(filename) > maxFilenameWidth {
 			truncatePos := (maxFilenameWidth - 1) * gOpts.truncatepct / 100
 			lastPart := runeSliceWidthLastRange(filename, maxFilenameWidth-truncatePos-1)

Anyway regarding the decision on what action to take (if any), I will leave that decision up to @gokcehan, who I imagine might be tired from seeing all the comments here and in other issues like #47.

@Andrew15-5
Copy link
Author

Andrew15-5 commented Nov 13, 2023

I do understand that this will be a huge change, but I think it's clear that it has a lot of positives. It only has negatives in terms of implementation (and that users would have to edit their configs for a more simple API).

  • Don't know what Base64 encoding is, or don't want to have to deal with it (or any other scheme for encoding newlines)

You are talking about the encoding approach. I was referring to the in-memory approach, which is used instead/with ~/.local/share/lf/files and will be enabled by default or via an opt-in option. Imagine instead of doing the same:

  load=$(cat ~/.local/share/lf/files)
  mode=$(echo "$load" | head -n 1)
  list=$(echo "$load" | sed 1d)
  # or
  # mode=$(head -n 1 ~/.local/share/lf/files)
  # list=$(sed 1d ~/.local/share/lf/files)

everyone would use:

  mode=$(lf -mode)
  list=$(lf -files)

or something like:

  mode=$(lf -storage mode)
  list=$(lf -storage files)

or:

  mode=$(lf -files mode)
  list=$(lf -files list)

I think it's super pretty and very minimalistic and doesn't rely on anything but the built-in functionality (no POSIX commands, no file read). This again, will rely on Go's implementation of table/set/hashmap or on some kind of DB (SQLite, Redis). Then the ~/.local/share/lf/files file won't be needed (for users of the lf) and therefore can be removed or replaced with SQLite's file-DB or Redis's "backup/restore" file (from which the Redis DB will be restored to be used in lf). And since lf does use a server process, then the same process can be used to handle DB (same as lf -remote $id handles signals).

I want to hear an opinion about such approach and which difficulties it can introduce during implementation and execution (from someone who knows Go and lf codebase more closely).

@joelim-work
Copy link
Collaborator

joelim-work commented Nov 13, 2023

I think what you're trying to describe already exists. The server query command (available from r31 onwards) can be used to output certain state information about an lf instance. For example, to display the current mappings:

lf -remote "query $id maps"

Support for the list of cut/copied files was not added because it was deemed redundant, since ~/.local/share/lf/files already exists.

In any case, such a command just outputs a stream of bytes. How are you planning to delimit entries while allowing for newlines in filenames? Using \0 could be an idea, but would be inconsistent with the current query commands, and also wouldn't work for filenames containing \0 in them.

I want to hear an opinion about such approach and which difficulties it can introduce during implementation and execution (from someone who knows Go and lf codebase more closely).

I assume you are referring to the people mentioned in #1329. All of them have responded at some point in this issue.

@Andrew15-5
Copy link
Author

I think what you're trying to describe already exists. The server query command (available from r31 onwards) can be used to output certain state information about an lf instance.

Oh yeah, cool. I skimmed through the changes too fast (a while ago) and forgot about this.

Using \0 could be an idea

That's the idea (like find -print0) then you can use xargs -0 or anything else (Python script).

but would be inconsistent with the current query commands

This is the question of adding a new flag (or changing behavior of existing query commands) or just using a dedicated (not like for other query commands) \0 file separator. If a map has \0 or \n in the shellscript, how would it be printed lf -remove "query $id maps"?

map a $a=$(printf '\n\0') # example map

wouldn't work for filenames containing \0 in them.

Are we talking about Windows (maybe some other supported OSes)? In UNIX, you can't use \0 in file names (Linux, BSD, macOS). Does NTFS support non-printable/visible characters? Because \n is visible (kinda) and \0 is not.

I want to hear an opinion about such approach and which difficulties it can introduce during implementation and execution (from someone who knows Go and lf codebase more closely).

I assume you are referring to the people mentioned in #1329. All of them have responded at some point in this issue.

I refer to people in this (#1114) issue. Like not specifically you, but since you are very active, then you could be the first one in the group.

@joelim-work
Copy link
Collaborator

joelim-work commented Nov 13, 2023

Sorry I might be wrong on the possibility of \0 in filenames, it's not possible on Windows either. There may be some obscure system that allows it but I think it's not worth considering. Actually the idea of using \0 as the delimiter was already mentioned very early on in this issue in #1114 (comment), but I assume the discussion sort of went all over the place afterwards and it got lost.

Similar to ~/.local/share/lf/files, the query commands just use newlines as the delimiter to keep things simple. For mappings, only the first line of shell scripts are displayed (this is the same as the preview you would see if you press part of a multikey mapping, see #363). I think it is possible to include \0, which would be output literally, but I don't see why you would want to have \0 in a shell script.

From the direction of this conversation, it looks like you're requesting for some kind of zero option (name suggestions are welcome) which will use \0 as the delimiter instead of \n. In that case it can just be applied to ~/.local/share/lf/files, and all this talk about using the lf server / in-memory approach isn't relevant - if you want to access the list of files in a shell script, there isn't much difference between $(cat ~/.local/share/lf/files) and $(lf -remote "query ..."). TBH I think it might be nice to access the list of cut/copied files via query, but it's a nice to have when ~/.local/share/lf/files already exists, and probably deserves a separate discussion. I suppose the same goes for $fs/$fx in which case filesep would not be needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants