Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: 4.x can't connect to the cluster when first node is non responsive #357

Open
wants to merge 2 commits into
base: scylla-4.x
Choose a base branch
from

Conversation

dkropachev
Copy link
Collaborator

@dkropachev dkropachev commented Oct 7, 2024

Closes #356

Since netty bootstrap.connect uses only first address of unresolved InetSocketAddress 4.x does not even try to connect to other when it fails.
This PR makes driver resolve unresolved endpoint itself and only then handing it over bootstrap.connect

@dkropachev dkropachev force-pushed the dk/356-4x-cant-connect-to-the-cluster-when-first-node-is-non-responsive branch 5 times, most recently from 7905ae0 to 23ce185 Compare October 7, 2024 20:31
@dkropachev dkropachev added the bug Something isn't working label Oct 7, 2024
@dkropachev dkropachev force-pushed the dk/356-4x-cant-connect-to-the-cluster-when-first-node-is-non-responsive branch 2 times, most recently from 79f95ee to e0cfb07 Compare October 7, 2024 23:10
@dkropachev dkropachev marked this pull request as ready for review October 8, 2024 08:37
@dkropachev dkropachev force-pushed the dk/356-4x-cant-connect-to-the-cluster-when-first-node-is-non-responsive branch 3 times, most recently from e7155f9 to b6edf18 Compare October 8, 2024 10:34
Copy link
Collaborator

@Bouncheck Bouncheck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test seems to be working and resolving the issue. Current version replaces the unresolved InetSocketAddress with DefaultEndPoints with resolved addresses - this is not in line with current description of "advanced.resolve-contact-points" option.
I am not sure if modifying EndPoint interface which is a part of public API is a good idea or necessary.

*
* <p>This will be called each time the driver opens a new connection to the node. The returned
* address cannot be null.
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is not intended documentation for this method. It's the same as for resolve()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, fixed.

return resolvedContactPoints;
}

@Synchronized
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it do? Is it only an indicator it should be synchronized or does it provide the synchronization too?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does the same, but on the field Lock, instead of this, but what a trip, in my head all the code uses @Synchronized, while it is not. switched to method modifier instead.

@@ -237,4 +235,40 @@ public void run_replace_test_20_times() {
replace_cluster_test();
}
}

@Test
public void cannot_connect_if_first_node_is_unavailable() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran it locally and it passes. The name suggests the driver would be unable to connect though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something along the lines of should_connect_despite_incorrect_first_A_record() would be fine.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@dkropachev dkropachev force-pushed the dk/356-4x-cant-connect-to-the-cluster-when-first-node-is-non-responsive branch 4 times, most recently from d3b7fb7 to 9ac58ad Compare October 15, 2024 11:50
netty bootstrap.connect uses only first address of unresolved InetSocketAddress.
That causes 4.x to not even try to connect to other when it first one fails.

This PR makes driver to resolve unresolved endpoint, instead of leaving
to to netty.
Making it possible to connect to any ip address from DNS contact endpoint.
@dkropachev dkropachev force-pushed the dk/356-4x-cant-connect-to-the-cluster-when-first-node-is-non-responsive branch from 9ac58ad to 0dfd84c Compare October 15, 2024 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants