Skip to content

Conversation

tombentley
Copy link
Member

No description provided.

Signed-off-by: Tom Bentley <[email protected]>
Copy link
Collaborator

@gunnarmorling gunnarmorling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great overall! A few comments and suggestions inline.


A route object has a `name`, an optional list of `filters` (being the names of the filters to be applied to requests/responses that traverse this route) and either a
`cluster` or a `router` property, which names the receiver which will handle requests after any filters have been applied.
Exactly one of `cluster` or `router` must be specified.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this also be receiver instead? This conveys the role of that entity in this context more clearly, IMO. On the downside, it would require names to be unique across cluster and router objects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's two points there:

  1. Is the property better named for the role or the type which fulfills the role? I agree that role is better in general. But is receiver, specifically, a good name? I'm not sure an abstract noun makes the best choice, especially when the things which fulfill that role won't be associated with that word in the config file (how is a newbie to know that a router is a receiver? We're forcing them to read the docs and grok a bunch of concepts). Perhaps connectsTo would be a bit more obvious?

  2. Should the names of things that fulfill the role of a receiver be in a single namespace, or a per-type one?

    • We already have filterDefinitions and virtualClusters as collections of named things, each in their own namespace. So it would be least surprising to stick with a per-type pattern.
    • OTOH, filters are not "receivers" as I've defined them here, so we could probably wrangle a compromise.

    However, I think that compromise is a bit grubby. The problem is that a list of items of mixed type (either a router or a cluster) either makes the type of the item or explicit, or not.

    • Hiding the type of an item is not desirable because it makes it much harder to build a model picture of the overall graph. E.g. when I'm looking an an item that could be a router or a cluster then it's important to know that something is a router, because I know a router should have routes, and those routes define the next node in the DAG. Likewise I know cluster should not have routes, and that what I'm looking is is conceptually a leaf node of the DAG.

    • But making the type of an item explicit isn't great either. Here's a fully worked example (note the indirection properties like virtualCluster, cluster and router, which make the type of each node explicit):

      nodeDefinitions:
        - name: my-simple-vc
          virtualCluster:
            portIdentifiesNode: ...
            filters: 
              ...
          connectsTo: my-backing-cluster
        - name: my-clever-vc
          virtualCluster:
            portIdentifiesNode: ...
            filters: 
              ...
            connectsTo: my-router
        - name: my-backing-cluster
          backingCluster:
            bootstrapServer: ...
            tls: ...
        - name: my-router
          router:
            type: MyRouter
            config: 
              ...
            routes:
            - name: to-backing-cluster
              filters: # a list of filter names
                ...
              connectsTo: my-backing-cluster

    Personally I find that quite hard to read because:

    1. It could be in any order
    2. The referring properties (connectsTo) can appear either a either a child of a virtualCluster, or a grandchild of a router (via a route).

    A different kind of compromise might work better:

    clusterDefinitions:
      - name: my-cluster
         ... # as defined in this proposal currently
    virtualClusterDefinitions:
      - name: my-vc
        filters: ...
        connectsTo:
          cluster: my-cluster
      - name: my-vc
        filters: ...
        connectsTo:
          router: my-router
    routerDefinions:
      - name: my-router
        type: ...
        config: ...
        routes:
          - name: to-backing-cluster
             filters: # a list of filter names
             connectsTo: 
               cluster: my-backing-cluster

    I find this easier to read:

    1. There's a single place to look for all the backing clusters that can be connected to (clusterDefinitions)
    2. There's a single place to look for all virtual cluster presented to clients (virtualClusterDefinitions)
    3. All the intermediate nodes a jumbled together, but they're necessarily all Routers, so the list is easier to read because of the repeating pattern to the items.
    4. The connectsTo role is used for both routers and virtual clusters.
    5. The nature of the thing being connected-to is explicit (it's either a cluster or a router), so I know which top level property to look in to find it.

Wow, that's gone quite deep! But wdyt?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the property better named for the role or the type which fulfills the role? I agree that role is better in general. But is receiver, specifically, a good name?

Yes, also not a huge fan of receiver, only went with it because it was the originally used term. connectsTo is ok, but it's also kinda irregular, as all the other terms seem to be nouns, and now suddenly we switch to an imperative verb. Perhaps destination, output, sink could work?

As for the type vs. role-based naming, I only meant this in the context of this particular polymorphic reference, not for the entity definitions themselves (totally agreed that the type-based naming is the right thing to do there). To me, it's comparable to naming variables in code: I generally dislike things like customersList, i.e. encoding the type, rather the focus should be on the semantics of what it is: customers.

If you feel that something like destination: my-backing-cluster is not ideal in terms of discoverability when authoring the config (which is a good point, I'd hope schema-based tooling would offer the right completions, though?), then I like that last proposal of connectsTo: cluster: my-backing-cluster, which encodes the role and then makes the available types very explicit 👍. Just perhaps with a different name, as suggested above.

This will prevent the possibility of a request getting stuck in a router loop.

In order for non-trivial router graphs to be useful, `Router` authors will need to follow the same principle of composition as `Filter` authors.
That is, a `Router` implementation should only talk to its receivers, and not, for example, make direct connections of its own to a backing cluster.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that routers should not make "out of bands" requests, but this doesn't follow from the previous "in order to be useful" to me. I mean, out of bands requests can be useful, but we're discouraging them for the sake of validation, error handling, etc., I suppose.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that all communication with the cluster should be exclusively through the router/filter graph. Routers and filters should not make their own TCP connections directly to brokers.
Routers and filters may originate their own requests through the FilterContext/RouterContext API.

Will reword.

filters: # a list of filter names
...
cluster: my-backing-cluster
virtualClusters:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, is the notion of virtual clusters even needed at this point? IIUC, it's equivalent to having a router definition with a no-op router and one single defined route with the filter chain. There probably is value in having this more concise way of modeling that common case, but it also goes a bit against the principle of providing exactly one way to accomplish a given task. Some food for thought, not sure what's best.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's some value in modelling the source and sink nodes of the graph as first-class entities, rather than as special cases of Router:

  • Such special casing is not easily discoverable and so can be surprising to users.
  • the proxy runtime itself needs to know about them because it has to bind server sockets or open client connections etc. All the other types of Routers are treated as black boxes, without the runtime interpreting their configuration.
  • for many users the "point of entry" is a different kind of thing, and for similar reasons. Having one place to look for all the networking-relevant configuration is better than it being obscured amonst a lot of non-networking stuff.

Obviously either way of modelling it works, and I don't think either is more or less powerful than the other. So I think this choice boils down to criteria/opinions about what makes a tasteful API.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agreed. +1 on keeping virtual clusters a separate first-class concept.

tombentley and others added 2 commits June 16, 2025 22:56
Co-authored-by: Gunnar Morling <[email protected]>
Signed-off-by: Tom Bentley <[email protected]>
Signed-off-by: Tom Bentley <[email protected]>
@tombentley tombentley mentioned this pull request Jul 4, 2025
Only one backing topic is writable at any given logical time.

* **Principal-aware routing**.
A natural variation on basic SASL termination is to use the identity of the authenticated client to drive the decision about which backing cluster to route requests to.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Routing could happen on other different parameters too like topic name, attribute value in the payload or header. But, has some constraints in each like handling duplicates.

Copy link

@prabhamanepalli prabhamanepalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this proposal and ability to present multiple backing Kafka clusters as one single cluster to clients.

Can you please look into different types of routing other than identity/principal based routing.

Routing based on topic name, some embedded attribute in the header or payload itself.

I would like to know if the API being proposed is extensible for other types of routing other than principal based routing.

@tombentley
Copy link
Member Author

@prabhamanepalli thanks for taking a look!

Can you please look into different types of routing other than identity/principal based routing.
Routing based on topic name, some embedded attribute in the header or payload itself.

Those should absolutely be possible, and I consider them in-scope for this proposal. The "Union cluster" idea alluded to in the text is really a special case of this, where the mapping between topics (or more generally named broker-side entities like consumer groups) is defined statically in the configuration file. More generally, that mapping could be supplied over the network (e.g. from a database, metadata store or whatever).

It's up to the router how each request gets routed. Most common would be routing a request to a single target cluster or next router. But I don't think the API should enforce that. If a router implementation wanted to route a produce request, for example, to multiple clusters then that should be possible at the API level, even if dual writes are not usually a good idea.

What's not in scope for this proposal currently is making the set of routers, or the set of target clusters, dynamic during the lifetime of the proxy process. Dynamic configuration update is a separate item on the backlog.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants