Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Peers and Streams #30

Open
mfelsche opened this issue Nov 2, 2020 · 0 comments
Open

Peers and Streams #30

mfelsche opened this issue Nov 2, 2020 · 0 comments
Labels
enhancement New feature or request open-review This RFC is undergoing open / public review with intent to proceed

Comments

@mfelsche
Copy link
Member

mfelsche commented Nov 2, 2020

Peer

A peer is an entity/artefact that maintains connections to the outside world, to other computer systems. It also defines, how data from the outside is deserialized into an event and how events are serialized back to be sent out.

An example would be a websocket server, who maintains a listening socket for clients to connect to and maintains client connections.

We already have this kind of thing in every source and sink that is backed by a connection-oriented protocol like TCP. The source maintains client connections and handles deserializing incoming data to events. The sink acts as client towards other systems and manages those connections.

With the addition of linked transports, onramps gained the ability to have response-event send back to them, so they could handle request-response-style protocols like HTTP. Its implementation got a little bit messy.
The abstraction of an entity maintaining connections to outside systems and allowing incoming and outgoing events should be called a peer (or any other name we come up with).

We introduce a new artefact called peer. It has a type, some type specific config, codec and interceptor (pre- or post-processors) properties. In mappings it can behave both like sink and source.

Example config (suggestion):

peer:
  - id: ws-client
    type: ws
    codec: json
    interceptors:
      - lines:
        delimiter: "\n"
        buffer: 1024
    config:
      endpoint:
        host: 123.45.67.89
        port: 8081

Metrics Collection

Metrics counters for peers:

  • in: pipeline -> peer
  • out: peer -> pipeline
  • err: errors
  • send: peer -> external
  • recv: external -> peer

Naming

Naming suggestions:

  • Connection
  • Peer

Streams

Motivation

For linked transports, that is sending events back to the same context they originated from as a response, we need to have a way of routing those "response" events to the correct context where they belong (e.g. HTTP session, Websocket connection, ...). For this a peer needs to put the information of the originating context into the event and this information needs to be maintained inside all events that this originating events might cause (e.g. across request/response handling of another peer).

This identification can be assembled from: peer id + stream id + stream-specific peer event id.
Here stream can refer to any logical grouping of incoming data (e.g. an HTTP session, a TCP connection), that is necessary for routing response events to their right (peer specific) context.

Also, we already have streams in sources, where the creation of a new stream creates a new chain of preprocessors, as they might keep some state around that should not be shared between streams/connections. Unfortunately we do not make use of streams anywhere else e.g. in sinks.

The actual thing

  • Event ids get extended to be composed of peer id, stream id and (stream-specific) event id.
  • We introduce new stream-lifecycle signals whenever a new stream is created or closed within a peer or elsewhere. E.g. upon stream creation a StartStream(stream_id) signal is sent to all pipelines connected via the source nature of the peer.
    • We will define and document how peers react on those messages. (e.g. pre-create connections when a StartStream message is received)
  • a separate set of interceptor states is maintained for each stream inside a peer
  • We extend circuit breaker events to be either scoped to a single stream or towards the whole flow of events.
    • Whenever a new stream is created and ready we send out a CB Open Stream(stream_id contraflow event, similar to how we do now for whole sinks.
                   server                           client
                     |                                 |
client connects ---> |                                 |
                     | ---------start stream---------> |
                     |                              connect ----> server
                     | <--------CB open stream-------- |
                     |                                 |
client event ------> | -----------event--------------> | ---------> server
                     |                                 |
                     | <----------ack/fail------------ |
client response <--- |                                 |
  • To not break the circuit breaker mechanism we ban dynamic endpoint selection via event metadata. endpoints in sinks must be declared ahead of time. Breaking Change
  • stream id, peer id and event id of an event can be accessed via tremor script
  • We remove the batch operator as it ignores streams, or we introduce a version that creates batches per stream
    • trickle batch examples:

      define tumbling window `15secs1kmsg`
      with
         interval = datetime::with_seconds(15),
         count = 1000
      end;
      
      # batch by stream
      select win::collect_flattened()
      from in[`15secs1kmsg`] # We apply the window nominally to streams
      group by [tremor::source_id(), tremor::stream_id()]
      into out;
      
      # batch across all streams
      select win::collect_flattened()
      from in[`15secs1kmsg`] # We apply the window nominally to streams
      into out;
      
  • We document a stream as a central concept of event flow from source through pipelines to sink, encapsulating logical subsections of an event-flow determined by
@darach darach added enhancement New feature or request open-review This RFC is undergoing open / public review with intent to proceed labels Nov 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request open-review This RFC is undergoing open / public review with intent to proceed
Projects
None yet
Development

No branches or pull requests

2 participants