Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client side routing implementation #205

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

madchicken
Copy link
Contributor

@madchicken madchicken commented Dec 16, 2024

This PR is a draft implementation of the client-side routing for new4j clusters.
The PR implements the ROUTE message for both the stable bolt implementation and the unstable one. Tests are being performed on the stable version of the protocol since the new one is still giving me problems in deserialization.
What the PR is still missing:

  • Testing: I am currently testing the driver on a real cluster but I have no guarantee of the performance and stability of the routing algorithm
    - [ ] Bookmarks are not supported at the moment (the array is always sent empty)
  • The maximum supported version of the protocol is now 4.4 but we miss some of the features the protocol should expose from this version (the imp_user should be added to the transaction begin message)
    - [ ] Only the round-robin algorithm is currently implemented as a routing strategy. Maybe we want to implement a LeastUsed algo and make it configurable.
    - [ ] There is no handling of the initial routing context: it is always empty.
  • The retry mechanism should probably be reviewed with routing: it does not make sense to retry an operation on a failing connection, we should select another server from the cluster instead.

Additional Notes:

  • The lock mechanism for refreshing the routing table is sub-optimal (to use an euphemism). We should try to be smarter and use a different method to check for a new routing table.
    - Extra fields in the route message are not implemented yet (part of v4.4) Protocol version is now 4.4

@knutwalker
Copy link
Collaborator

Hey, thanks for this PR, this helps a lot!

I went over the changes and don't have anything to add, but I also did look in detail at the actual routing protocol and algorithms.

I'm ok with merging something before dealing with all limitations. In particular, I think it's fine to not support the routing context, bookmarks, only have partial 4.3 support, only having one routing strategy at the beginning. We don't need to have everything figured out upfront and can do this in steps/multiple PRs.
I would also be ok with only adding it to the new protocol implementation.

I've been thinking about the testing a bit (or a lot) lately; I don't like the amount of integration tests compared to what the unit tests do. Especially when testing bolt protocol details of a non-happy path, it can be either impossible or very difficult to setup.
The current setup is fine to actually run something against a one-and-done database, but it doesn't scale as well as I would like it to be.

I have half an idea to rewrite the core parts with the connection IO and pooling being abstracted out. I would like to be able to do more simulation testing where exact IO operations can be simulated in the tests without having to have an actual db to setup. I'd also like to move async out from the core to some outer layer. Maybe this goes in the sans-io direction, maybe it'll look a bit different.

@madchicken
Copy link
Contributor Author

Thanks for the reply @knutwalker, I will remove the implementation of the client-side routing for the stable protocol and leave it only when the unstable-bolt-protocol-impl-v2 feature is active. Now that I have the workaround for the dates deserialization bug I can do some more testing. One thing I need to understand is this: when selecting the server in the routing table, should I use the bolt protocol or stick with neo4j for the final connection? I think this is very important for the server, isn't it?

@knutwalker
Copy link
Collaborator

I will remove the implementation of the client-side routing for the stable protocol and leave it only when the unstable-bolt-protocol-impl-v2 feature is active.

great!

One thing I need to understand is this: when selecting the server in the routing table, should I use the bolt protocol or stick with neo4j for the final connection? I think this is very important for the server, isn't it?

There shouldn't be a need for any scheme. The first connection uses the scheme to determine the initial routing and the encryption to use. When connecting to a server from the routing table, you use the same encryption settings (and, when supported, routing context) and directly connect to that socket address, without going through the bolt/neo4j scheme parsing again. That is what the product drivers do, at least. That looks like this requires some changes to the pool creation, essentially using the same ConnectionInfo, but with only the host/port replaced.

@madchicken
Copy link
Contributor Author

There shouldn't be a need for any scheme. The first connection uses the scheme to determine the initial routing and the encryption to use. When connecting to a server from the routing table, you use the same encryption settings (and, when supported, routing context) and directly connect to that socket address, without going through the bolt/neo4j scheme parsing again. That is what the product drivers do, at least. That looks like this requires some changes to the pool creation, essentially using the same ConnectionInfo, but with only the host/port replaced.

Yes, sorry...I realized that too.

@madchicken
Copy link
Contributor Author

@knutwalker please take a look at my last commit where I had to modify the main Graph interface. I had essentially to expose the Operation::Read/Write parameter only for the new protocol implementation. I hope it makes sense.
I also added a note in the description of the PR about the retry mechanism.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants