RQD assumes its hostname is reachable by Cuebot #1510

Open
KernAttila opened this issue Sep 12, 2024 · 4 comments
Labels
feature request New feature


@KernAttila
Contributor

KernAttila commented Sep 12, 2024

Describe the bug

Hi all!
When launched, RQD sends a host report to Cuebot stating its hostname.
However, Cuebot cannot necessarily reach the RQD machine at that hostname, and it has no way of knowing this.
In most setups this works, but in mine, machines can communicate only via a host nickname; they cannot see each other via their local hostnames.
(I'm using the NordVPN Meshnet feature to emulate a local network; I guess other VPN solutions behave the same way.)

To Reproduce
(with all machines on NordVPN Meshnet, or presumably any VPN)

  1. Clear host list on cuebot
  2. Launch RQD -> the host appears in CueCommander
  3. Lock the machine via CueCommander -> error: Cuebot cannot communicate with the host
  4. The host cannot receive jobs.

Under the hood

  1. RQD sends a host report with its hostname.
  2. Cuebot saves/updates the host and its stats in the database.
  3. RQD keeps sending reports saying it's alive.
  4. Cuebot considers the machine available but never tests whether it can actually reach it.
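Step 4 is the crux: nothing checks that the reported name actually leads back to the RQD machine. As a minimal sketch of what such a check could look like (in Python, which RQD is written in), the hypothetical `is_reachable` helper below resolves a name and tries to open a TCP connection to it. The port is a parameter because this sketch makes no assumption about how RQD's port is configured, and a successful connect only shows that something is listening there, not that a gRPC call would succeed.

```python
import socket


def is_reachable(hostname: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to hostname:port can be opened.

    Hypothetical helper: it only proves that something accepts connections
    on that port, not that it is RQD or that a gRPC call would succeed.
    """
    try:
        # create_connection resolves the name (DNS, hosts file, VPN resolver...)
        # and tries each resolved address until one of them connects.
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers resolution failures (socket.gaierror), refusals and timeouts.
        return False
```

On a setup like the one described, a check of this kind against the plain local hostname would fail while the VPN nickname would succeed.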

Expected behavior
Send a hostname that Cuebot can reach.
Suggestion: do a handshake

  1. On launch, RQD sends all known hostnames to Cuebot.
  2. Cuebot pings back each one and uses the first that responds.
  3. Cuebot sends a "gotcha, here's your reachable hostname" to RQD
  4. RQD saves that value internally and uses it for its next reports.
  5. Cuebot can now reach the RQD host.
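The handshake steps above can be sketched in Python as follows. Everything here is hypothetical: `negotiate_hostname` plays Cuebot's part (steps 2 and 3), `RqdIdentity` plays RQD's part (steps 1 and 4), and the `probe` callable stands in for whatever reachability test Cuebot would use (a TCP connect, a gRPC ping, ...). The candidate names would come from sources such as `socket.gethostname()`, `socket.getfqdn()` or a configured VPN nickname; the names in the test are made up.

```python
from typing import Callable, Iterable, Optional

# A probe answers "can Cuebot reach RQD at this name?".
Probe = Callable[[str], bool]


def negotiate_hostname(candidates: Iterable[str], probe: Probe) -> Optional[str]:
    """Cuebot side (steps 2-3): return the first candidate that responds."""
    for name in candidates:
        if probe(name):
            return name
    return None


class RqdIdentity:
    """RQD side (steps 1 and 4): offers its names and remembers the confirmed one."""

    def __init__(self, candidates: list[str]):
        # Deduplicate while keeping order, so the preferred name is probed first.
        self.candidates = list(dict.fromkeys(candidates))
        self.confirmed: Optional[str] = None

    def handshake(
        self, cuebot_negotiate: Callable[[list[str]], Optional[str]]
    ) -> Optional[str]:
        # Step 1: send every known name; step 4: store Cuebot's answer.
        self.confirmed = cuebot_negotiate(self.candidates)
        return self.confirmed

    def report_name(self) -> str:
        # Fall back to the first candidate (today's behaviour) if nothing answered.
        return self.confirmed or self.candidates[0]
```

Keeping the probe injectable separates "which name wins" from "how Cuebot tests reachability", so the same selection logic would survive a change of transport.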

Version Number
Dev

@KernAttila KernAttila added the bug Something isn't working label Sep 12, 2024
@lithorus
Contributor

lithorus commented Sep 17, 2024

Have you tried setting RQD_USE_IP_AS_HOSTNAME in rqd.conf?
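Conceptually, that option makes RQD report an address instead of a name, so Cuebot no longer has to resolve anything. A rough sketch of the choice, assuming the IP comes from resolving the local hostname (RQD's actual lookup may differ); `choose_report_name` is a hypothetical helper, and the resolver is passed in so the logic can be tested without a network.

```python
import socket
from typing import Callable


def choose_report_name(
    use_ip_as_hostname: bool,
    hostname: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Hypothetical helper: the identifier RQD would put in its host report.

    Sketch only; the real RQD code may determine the IP differently.
    """
    if use_ip_as_hostname:
        # Report an address, so Cuebot does not need to resolve a name.
        return resolve(hostname)
    # Default behaviour: report the local hostname as-is.
    return hostname
```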

@KernAttila
Contributor Author

Thanks @lithorus, this worked. Cuebot can now identify the proper host running RQD using its IP address.
It would be nice, though, to keep the identifier tied to the hostname in such scenarios.
An IP address works, but hostnames often convey meaning within an infrastructure.
Do you think this feature would be worth the small compute overhead?

Maybe the handshake could run only when there are no deliberate overrides in rqd.conf, i.e. when RQD_USE_IP_AS_HOSTNAME=False and OVERRIDE_HOSTNAME is not set.
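That gate could look roughly like this. It assumes, for illustration, that both options live in an `[Override]` section of an INI-style rqd.conf; the section name and the `should_negotiate_hostname` helper are assumptions, not RQD's actual code.

```python
import configparser
from typing import Optional

# Assumption: the overrides live in an "[Override]" section of rqd.conf.
OVERRIDE_SECTION = "Override"


def should_negotiate_hostname(conf_text: str) -> bool:
    """Hypothetical gate: run the handshake only if the user set no override."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section(OVERRIDE_SECTION):
        return True
    # A missing option counts as "not overridden".
    use_ip = parser.getboolean(
        OVERRIDE_SECTION, "RQD_USE_IP_AS_HOSTNAME", fallback=False
    )
    override: Optional[str] = parser.get(
        OVERRIDE_SECTION, "OVERRIDE_HOSTNAME", fallback=None
    )
    return not use_ip and not override
```

An explicit user choice always wins; the handshake only fills in when nothing was configured.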

The full story is that I stumbled on this issue while working on a tray icon for OpenCue. I needed to make sure I could reach the machine properly through the server, so I had to implement this kind of logic, and I thought it could be a useful addition to the RQD codebase.

@DiegoTavares DiegoTavares added feature request New feature and removed bug Something isn't working labels Sep 19, 2024
@DiegoTavares
Collaborator

I think using the RQD_USE_IP_AS_HOSTNAME feature is a workaround for this issue, but I'm not against a new feature implementing the proposed handshake mechanism. That being said, I'm changing the status of this issue to feature request.

When implementing this, please keep in mind a possible future design in which RQD no longer talks to Cuebot directly over gRPC but uses a queueing mechanism (e.g. Kafka) instead, while Cuebot continues to call RQD directly over gRPC.
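One way a handshake could fit that design is to put the candidate names in the queued host report and deliver the confirmation through the direct call Cuebot already makes to RQD, so that a successful confirmation is itself the reachability test. A minimal in-process sketch, with `queue.Queue` standing in for Kafka and a plain callable standing in for the Cuebot-to-RQD gRPC call; all names are hypothetical.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HostReport:
    # Candidate names RQD believes it can be reached at, preferred first.
    candidates: list[str]
    confirmed: Optional[str] = None


@dataclass
class Rqd:
    candidates: list[str]
    confirmed: Optional[str] = None

    def publish_report(self, bus: "queue.Queue[HostReport]") -> None:
        # RQD -> Cuebot traffic goes through the queue; RQD never dials Cuebot.
        bus.put(HostReport(list(self.candidates), self.confirmed))

    def on_confirm(self, name: str) -> None:
        # Invoked by Cuebot (over gRPC in a real deployment).
        self.confirmed = name


def cuebot_consume(
    bus: "queue.Queue[HostReport]", confirm: Callable[[str], bool]
) -> Optional[str]:
    """Take one report and confirm the first name at which RQD answers.

    confirm(name) stands in for a direct Cuebot -> RQD call to `name`
    carrying `name` as the confirmed identifier; False means unreachable.
    """
    report = bus.get_nowait()
    if report.confirmed:
        return report.confirmed
    for name in report.candidates:
        # Delivering the confirmation doubles as the reachability probe.
        if confirm(name):
            return name
    return None
```

Because the only Cuebot-to-RQD path is the one that stays on gRPC, the handshake would not depend on how reports travel in the other direction.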

@lithorus
Contributor

Are there any plans to use a single-direction connection instead of bidirectional communication between Cuebot and RQD? That would also solve this (and make network admins happy).
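With a single-direction design, only RQD opens connections: it pushes reports and pulls pending commands (by polling, or over a long-lived stream it initiated), so Cuebot never has to resolve RQD's name at all. A minimal in-process sketch of the pull side, with a plain method call standing in for the RPC; `CuebotCommandBox` and the command strings are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class CuebotCommandBox:
    """Hypothetical Cuebot-side mailbox: commands wait until RQD asks for them."""

    def __init__(self) -> None:
        self._pending: Dict[str, Deque[str]] = defaultdict(deque)

    def enqueue(self, host_id: str, command: str) -> None:
        # Instead of dialing the host, Cuebot parks the command.
        self._pending[host_id].append(command)

    def poll(self, host_id: str) -> List[str]:
        # Called by RQD over its own outbound connection; drains its queue.
        commands = list(self._pending[host_id])
        self._pending[host_id].clear()
        return commands
```

The host ID here is only a key, so it no longer matters whether it resolves from Cuebot's side. The trade-off is latency: a command such as a lock waits for the next poll unless RQD keeps a streaming connection open.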

Projects
None yet
Development

No branches or pull requests

3 participants