ET hangs indefinitely trying to connect to some machines #655

Open
CaptainFlint opened this issue Sep 6, 2024 · 6 comments
@CaptainFlint

I have several machines running a RHEL 9.4-derived OS. Each has ET 6.2.8 installed from EPEL. I can connect with the ET client to some of them, but not to others: the connection hangs indefinitely after requesting the remote account's password. When that happens, I can see in the process list that the client has launched the command:
ssh root@${DESTINATION_ADDR} echo '${SOME_RANDOM_SYMBOLS}_xterm-256color' | etterminal --verbose=0
This child process just sits there, and the parent et process seems to be waiting for it to finish, which never happens. If I press Ctrl+C, the connection is aborted with:

Got interrupt (perhaps ctrl+c?): 2. Exiting.

But if I kill this child ssh process instead, the ET client successfully establishes the connection, and I am logged in to the remote server.

I attached the verbose=9 logs from the client and server for both the successful connection, and the hanging one (in the first one I connected, then logged out; in the second one I waited for a while, then terminated it by Ctrl+C).

I found a similar issue, #464. Moreover, I originally discovered the problem on a machine with ET client version 6.1.8, and it behaved exactly as described in that issue: Ctrl+C did not terminate the connection, but instead let it proceed and be established successfully. After I upgraded to 6.2.8 to check whether the version made a difference, the Ctrl+C behavior changed. I suspect (but am not sure) that in the older version Ctrl+C simply killed the hanging ssh process while keeping the et client alive, whereas now it also kills et itself. That is what gave me the idea of killing the ssh process manually to test this theory.

I also verified that the explanation suggested in that issue's comments does not apply to my situation. I tested with both server and client at version 6.2.8, and libsodium is the same version on both (1.0.18). Moreover, I tried connecting directly from one of those servers to another, and the same thing happens: the connection from the "good" server to the "bad" one hangs, while the one from the "bad" server to the "good" one works fine. Both machines were installed from the same base OS and use the same repositories, but they differ somewhat in installed packages and settings. I compared the package lists and the contents of /etc on both machines, but didn't notice anything that looked relevant.

etclientLogs-fail.tar.gz
etclientLogs-success.tar.gz
etserverLogs-fail.tar.gz
etserverLogs-success.tar.gz

@MisterTea
Owner

We start an interactive session to ensure that the proper environment variables are set when running etterminal on the remote machine. Is it possible that some of your machines have a .bashrc or .bash_profile that is truly interactive and waiting for user input?

For example, oh-my-zsh by default will stop and ask whether you want to update when it detects an interactive login. The handshake may be hung up on that, which would explain why killing the handshake fixes the problem.

@CaptainFlint
Author

Ah, sorry, I forgot to write about that part. I saw it in that issue and checked for it too, to the best of my abilities. I don't think there is any interaction. At least, when I connect to the same server with the plain ssh command, it definitely does not require any interaction from me: I just start ssh root@host and immediately get the usual bash prompt, ready to accept commands. It also works fine when I run ssh non-interactively, with a command to run and then exit.

Just in case, I removed everything from /root/.bashrc except the usual . /etc/bashrc sourcing (I never modified /etc/bashrc itself), and also removed my custom script from /etc/profile.d/ on the problematic server, but it didn't change anything. I didn't really expect it to, since all those scripts did was add some variables and aliases.

However, that gave me an idea: I just started the actual ssh command generated by the ET client, directly:
ssh root@host "echo 'XXX...XXX_xterm-256color' | etterminal --verbose=0"
and, indeed, it prints the IDPASSKEY string with a couple of newlines, and then hangs indefinitely. This does not happen if I run any other non-interactive command this way. For example, if I run ssh root@host "echo 'XXX...XXX_xterm-256color'", it just prints the text and returns to my local bash prompt. And when I try the ssh command generated by the client, I can see the etterminal process remaining on the server the whole time, even after I stop the ssh command on the client machine with Ctrl+C. I actually had quite a few of them running after all the experiments and had to kill them all.

@MisterTea
Owner

Does your /etc/et.cfg have daemon set to 0? That would do it, but I don't know why that would be set that way.

Can you run etterminal locally from inside an existing ssh connection and get it to hang? A stack trace would be really useful.

@CaptainFlint
Author

> Does your /etc/et.cfg have daemon set to 0?

No, here is my et.cfg:

; et.cfg : Config file for Eternal Terminal
;

[Networking]
port = 2022
# bind_ip = 0.0.0.0

[Debug]
verbose = 0
silent = 0
logsize = 20971520
telemetry = true
logdirectory = /tmp

The only thing I changed was verbose=9 to try and find the source of the issue in the logs, and then back to 0.

> Can you run etterminal locally from inside an existing ssh connection and get it to hang? A stack trace would be really useful.

OK, that opened another line of investigation. When I start that command line directly on the server, it prints the IDPASSKEY, forks, and the parent process exits with code 0 (I checked the details using strace). Control returns to the bash prompt, and the forked copy keeps running in the background. So in this scenario everything looks normal to me.

Then I started it via ssh, but wrapped the etterminal call in strace. From the generated logs I could see that the parent etterminal also exits normally in this case, and that the forked process calls setsid and reopens the standard input/output streams onto /dev/null. So, again, from the etterminal side everything looks normal. Still, for some reason, the ssh connection (without strace) remains active and holds the input until I either manually kill that background etterminal fork on the server or Ctrl+C the ssh itself (in which case the forked etterminal keeps running on the server).

That gave me the idea to see whether allocating a terminal in ssh would change things, and it did. When I started ssh -t root@host "echo 'XXX...XXX_xterm-256color' | etterminal --verbose=0" (note the added -t option), it printed the IDPASSKEY, then printed "Connection to ${hostname} closed.", and returned control to the bash prompt. As an experiment, I added "RequestTTY force" to my ~/.ssh/config for that host, and et root@host connected immediately.
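For reference, a minimal sketch of what that workaround could look like in ~/.ssh/config. The `Host` pattern is a placeholder (the thread only ever calls the machine "host"); `RequestTTY force` is the setting named above.

```
# ~/.ssh/config
# "host" is a placeholder for the affected server's name or alias.
Host host
    RequestTTY force
```

Note that this applies to every ssh invocation matching that Host block, not only the handshake started by et, so it is better treated as a diagnostic workaround than as a fix.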

Therefore, it all seems to be somehow connected to the presence or absence of a TTY, although I still have no idea why it happens only on some machines and not on others.
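One general mechanism that produces this kind of TTY-dependent hang, offered as a sketch and not as a confirmed diagnosis of this case: without a TTY, the remote command's stdout and stderr are pipes, and the reading side only sees end-of-file once every process holding a write end has closed it. A background child that inherits such a descriptor keeps the reader waiting even after its parent has exited. The strace output described above suggests that etterminal reopens its standard streams, so which descriptor (if any) stays open here is not established in this thread. The demo below uses a local command substitution as a stand-in for the ssh session; `measure`, `held`, `released` and `captured` are names made up for illustration.

```shell
#!/bin/sh
# measure CMD: run CMD under `sh -c` with stdout captured through a pipe
# (command substitution). Sets `elapsed` to the whole seconds that passed
# before the reader saw end-of-file, and `captured` to the text it read.
measure() {
    m_start=$(date +%s)
    captured=$(sh -c "$1")
    m_end=$(date +%s)
    elapsed=$((m_end - m_start))
}

# Case 1: the background child inherits the pipe as its stdout, so the
# reader has to wait until the child exits, although `sh -c` is long gone.
measure 'echo started; sleep 3 &'
held=$elapsed

# Case 2: the background child's stdout and stderr point at /dev/null, so
# the pipe closes as soon as `sh -c` itself exits.
measure 'echo started; sleep 3 >/dev/null 2>&1 &'
released=$elapsed

echo "child holds the pipe:       ${held}s"
echo "child redirected elsewhere: ${released}s"
```

The first measurement should come out at about three seconds and the second close to zero, even though both commands print the same text and both parent shells exit right away.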

@MisterTea
Owner

Is there any harm in adding the -t option when doing the ssh handshake?

@CaptainFlint
Author

That's a very good question, which, unfortunately, I don't have an answer to. I already tried adding "-t" to the ssh call in the ET source code and rebuilt it, and it seems to be working with both the machines that had issues, and those that didn't. But I have not performed any serious testing.

Well, actually, there is one more problem I noticed, but it was happening even before I added that option, so it is most probably unrelated. Every time I log out of the ET session, etterminal on the server seems to crash. At least, I get a report about it from abrt the next time I visit the server. It happens with both the "good" and the "bad" servers. I will try to gather more info and report it separately later.
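On the question of whether `-t` does any harm: one thing it certainly changes is what the remote command sees on its standard streams. With a pseudo-terminal, the remote stdout and stderr both go through the terminal, and the terminal's default output processing usually turns each newline into CR LF, so anything that parses the handshake output (such as the IDPASSKEY line) may need to tolerate a trailing carriage return. Whether ET's parsing is affected is not established in this thread. Below is a minimal sketch of how a script can tell the two cases apart using the standard `[ -t FD ]` test; `fd_kind` is a hypothetical helper name.

```shell
#!/bin/sh
# fd_kind FD: report whether the given file descriptor is attached to a
# terminal.
fd_kind() {
    if [ -t "$1" ]; then
        echo "tty"
    else
        echo "not-a-tty"
    fi
}

# Inside a command substitution, stdout (fd 1) is a pipe, much like the
# stdout of a remote command started by ssh without -t.
piped_stdout=$(fd_kind 1)
echo "stdout inside a command substitution: $piped_stdout"

# stdin redirected from /dev/null is not a terminal either.
null_stdin=$(fd_kind 0 </dev/null)
echo "stdin redirected from /dev/null: $null_stdin"
```

The same check can be run against a server with `ssh root@host '[ -t 1 ] && echo tty || echo not-a-tty'`, once with `-t` and once without, to see what the remote side of the handshake is given in each case.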
