Skip to content

Conversation

@zhijun42
Copy link
Contributor

By default, servers in a cluster don't have a human-readable nodename unless explicitly set by user. This can be annoying when reading the server logs during local development, since we would need to manually lookup the unique node ID (40 chars long) to figure out its port number. Most logging statements in the cluster are in the form of serverLog(LL_NOTICE, "Something.. %.40s (%s) something..", node->name, node->human_nodename); , and thus this would always result in an empty () like the following:

2832:M 26 Oct 2025 18:00:14.033 * Node dd4f43d9e4a1a2875a514733c08d4870c21db682 () is now a replica of node 1b5d2a4865c04aaa1ccf06db66f22a043411ded7 () in shard d3fc557d12ab3c5a8850c7b0c9afb252b473cad1
2832:M 26 Oct 2025 18:00:14.033 * Clear FAIL state for node dd4f43d9e4a1a2875a514733c08d4870c21db682 (): replica is reachable again.
2832:M 26 Oct 2025 19:21:07.191 * NODE 524978a9924b5841c4c93f83581d202cfbce1443 () possibly failing.
2832:M 26 Oct 2025 19:21:07.838 * FAIL message received from 747e0f97a58cd5a7ac4c2130417fc47a807802ed () about 524978a9924b5841c4c93f83581d202cfbce1443 ()
2832:M 26 Oct 2025 19:22:40.434 * Clear FAIL state for node 524978a9924b5841c4c93f83581d202cfbce1443 (): replica is reachable again.
2832:M 26 Oct 2025 19:46:43.539 * FAIL message received from 747e0f97a58cd5a7ac4c2130417fc47a807802ed () about 1b5d2a4865c04aaa1ccf06db66f22a043411ded7 ()
2832:M 26 Oct 2025 19:46:43.539 # Cluster state changed: fail

Assigning a default human-readable nodename solves the annoyance:

7614:S 26 Oct 2025 19:52:21.642 * NODE 03b3811da7b33451f6e4988295b0158f410d2c00 (127.0.0.1:7001) possibly failing.
7614:S 26 Oct 2025 19:52:22.550 * Marking node 03b3811da7b33451f6e4988295b0158f410d2c00 (127.0.0.1:7001) as failing (quorum reached).
7614:S 26 Oct 2025 19:54:15.915 * NODE b2f0fc93632d12c06039335b025365d484ff9049 (127.0.0.1:7002) possibly failing.
7614:S 26 Oct 2025 19:54:16.117 * Marking node b2f0fc93632d12c06039335b025365d484ff9049 (127.0.0.1:7002) as failing (quorum reached).
7614:S 26 Oct 2025 19:54:16.117 # Cluster state changed: fail
7614:S 26 Oct 2025 19:54:16.117 # Cluster is currently down: At least one hash slot is not served by any available node. Please check the 'cluster-require-full-coverage' configuration.
7614:S 26 Oct 2025 19:54:16.574 * Reconfiguring node 524978a9924b5841c4c93f83581d202cfbce1443 (127.0.0.1:7005) as primary for shard 8488c1c467066206db60b136e9a38e1f583eb1f8

For now I use the node's IP and port combination as the nodename. If users have personal preference on the format, they can always update nodename via CONFIG SET cluster-announce-human-nodename MYNAME command or edit the config file.

Signed-off-by: Zhijun <[email protected]>
@zhijun42 zhijun42 marked this pull request as draft October 26, 2025 14:14
@codecov
Copy link

codecov bot commented Oct 27, 2025

Codecov Report

❌ Patch coverage is 76.92308% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.48%. Comparing base (baf2d57) to head (503e560).
⚠️ Report is 4 commits behind head on unstable.

Files with missing lines Patch % Lines
src/cluster_legacy.c 76.92% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           unstable    #2777     +/-   ##
===========================================
  Coverage     72.47%   72.48%             
===========================================
  Files           128      128             
  Lines         71624    70156   -1468     
===========================================
- Hits          51908    50851   -1057     
+ Misses        19716    19305    -411     
Files with missing lines Coverage Δ
src/cluster_legacy.c 87.45% <76.92%> (+0.36%) ⬆️

... and 103 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@zhijun42 zhijun42 marked this pull request as ready for review October 27, 2025 05:40
Copy link
Collaborator

@hpatro hpatro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally use the first/last few characters of node-id while searching and I've never seen it conflicting with other node's id.

@zhijun42 Wouldn't you be able to set this externally on the engine by yourself?

@hpatro
Copy link
Collaborator

hpatro commented Oct 28, 2025

On this note, Docker's approach of generating random name for resource has always been interesting, easier to spot/remember.

https://github.com/moby/moby/blob/master/pkg/namesgenerator/names-generator.go

@zhijun42
Copy link
Contributor Author

@hpatro Thanks for reviewing!

I generally use the first/last few characters of node-id while searching and I've never seen it conflicting with other node's id.

Yeah the first/last few characters of node ID are unique enough. I assume your use case is perhaps like this: You have a node that did something interesting and/or you're interested in it, then you search through the logs using that node ID to see how this node interacted with others.

My use case is a bit different though. When developing locally (non Docker/K8s), I have a mini cluster running where each node is running in a terminal tab. I'd like to fully observe all behaviors of the cluster, so inside each terminal tab I'm reading through this node’s log stream and would like to instantly recognize peers it’s communicating with. A 40-hex node ID is cognitively heavy; 127.0.0.1:7002 is instant.

Note that I'm not searching for any particular node, I'm simply reading the entire log stream. And having logs like these would be very useful for me:

* FAIL message received from 26492268f63f184e0221402c4f6cb08ed7e941ea (127.0.0.1:7008) about 747e0f97a58cd5a7ac4c2130417fc47a807802ed (127.0.0.1:7004)
* FAIL message received from 26492268f63f184e0221402c4f6cb08ed7e941ea (127.0.0.1:7008) about b2f0fc93632d12c06039335b025365d484ff9049 (127.0.0.1:7002)
* FAIL message received from 26492268f63f184e0221402c4f6cb08ed7e941ea (127.0.0.1:7008) about 524978a9924b5841c4c93f83581d202cfbce1443 (127.0.0.1:7005)
* FAIL message received from 26492268f63f184e0221402c4f6cb08ed7e941ea (127.0.0.1:7008) about 03b3811da7b33451f6e4988295b0158f410d2c00 (127.0.0.1:7001)

Wouldn't you be able to set this externally on the engine by yourself?

Yeah we can always manually assign names to nodes, but I don't wanna bother doing this every time I start up a cluster. The default combination of IP + port proposed here is pretty recognizable for me.

On this note, Docker's approach of generating random name for resource has always been interesting, easier to spot/remember.

TIL! Browsed through the list and just found a lot of great guys in our history. We can probably do the same thing here in Valkey. My approach of using IP + port is functionally the same as using scientists' names - just helping developers associate a server node with some string they can immediately recognize. Since I can easily recognize IP + port, that is what I propose here.

The thing is Valkey cluster has most logs print out the node ID and nodename, but the nodename is always empty unless explicitly set, which is not great user experience for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants