ttaaefs_peerip causes silent node failure when set with an FQDN #12

TM553432 · 2025-02-18T14:05:33Z

Discovered while setting up both realtime and fullsync NextGenRepl between two fully functional clusters with identical node counts, with every node running KV 3.2.4. As part of setting up fullsync, the ttaaefs_peerip setting in the source cluster has to point to the fullsync peer in the sink cluster. Because an FQDN had been set up for each node, we used the FQDN instead of the IP address for ttaaefs_peerip on the source cluster. This caused Riak to fall over silently, with nothing logged to indicate the cause. Changing the setting to a hard-coded IPv4 address fixed it.
In an environment where IP addresses can change, this seems inconsistent with Riak's ability to use an FQDN in the nodename.
To replicate:

1. Create two clusters.
2. Set up NextGenRepl as one would normally.
3. On a source node, set ttaaefs_peerip to an FQDN that points to a sink node.
4. Restart Riak on the source node you updated.
5. Check whether Riak started.
6. On the same source node, change ttaaefs_peerip to the IPv4 address of the same sink node.
7. Restart Riak on the source node you updated.
8. Check whether Riak started.
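
As a stopgap where addresses can change, the FQDN could be resolved at deploy time and the resulting IPv4 address written into riak.conf. The following is only a minimal sketch of that idea, not part of Riak: the helper names `resolve_ipv4` and `peerip_line` are hypothetical, and it assumes the first A record returned for the sink node is the one to use.

```python
import socket


def resolve_ipv4(host, resolver=socket.getaddrinfo):
    """Return the first IPv4 address that `host` resolves to.

    `resolver` is injectable so the lookup can be stubbed out;
    by default it is the standard library's socket.getaddrinfo.
    """
    infos = resolver(host, None, socket.AF_INET, socket.SOCK_STREAM)
    if not infos:
        raise LookupError(f"no IPv4 address found for {host!r}")
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (address, port).
    return infos[0][4][0]


def peerip_line(host, resolver=socket.getaddrinfo):
    """Build a riak.conf line that sets ttaaefs_peerip to an IPv4 literal."""
    return f"ttaaefs_peerip = {resolve_ipv4(host, resolver)}"
```

Calling `peerip_line("fqdn.example.net")` on the source node would return a line to place in riak.conf. The address is fixed at the moment of resolution, so the line would have to be regenerated and Riak restarted whenever the sink node's address changes.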


martinsumner · 2025-02-18T14:34:15Z

The configuration schema specifically requires it to be an IP address, and uses a validation function to confirm:

https://github.com/OpenRiak/riak_kv/blob/openriak-3.2/priv/riak_kv.schema#L1110-L1129

So there should be some sort of Cuttlefish error at startup, but this operator response may have been lost in the upgrade of relx.
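
To check a value before it goes into riak.conf, the same kind of test can be approximated outside Riak. This is a rough sketch using Python's standard `ipaddress` module. It is not the schema's actual validator, and `is_ip_literal` is a hypothetical helper name; it only assumes that the schema accepts IP address literals and rejects hostnames.

```python
import ipaddress


def is_ip_literal(value: str) -> bool:
    """Return True if `value` parses as an IPv4 or IPv6 address literal."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


# An IP literal passes; an FQDN does not.
for candidate in ("10.0.0.12", "fqdn.example.net"):
    print(candidate, is_ip_literal(candidate))
```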

It might be that an FQDN would work, and this is just a schema issue. Looking at the code, the parsed string is simply passed to the Riak Erlang client's start_link function, which can take an FQDN. You could test setting the FQDN via advanced.config, which bypasses the IP address validator in the riak.conf schema.

{riak_kv, [
    {ttaaefs_peerip, "fqdn.example.net"}
  ]
}

martinsumner · 2025-02-18T14:38:46Z

You may find riak chkconfig useful to confirm riak.conf is correct before trying to run riak - https://docs.riak.com/riak/kv/latest/using/admin/riak-cli/index.html#chkconfig.

TM553432 changed the title ~~Riak and Listener are correct but the node is not reachable~~ ttaaefs_peerip causes silent node failure when set with an FQDN Feb 18, 2025
