Peers and Streams #30

mfelsche · 2020-11-02T20:40:05Z

Peer

A peer is an entity/artefact that maintains connections to the outside world, i.e. to other computer systems. It also defines how data from the outside is deserialized into an event and how events are serialized back to be sent out.

An example would be a websocket server, which maintains a listening socket for clients to connect to and manages the resulting client connections.

We already have this kind of thing in every source and sink that is backed by a connection-oriented protocol like TCP. The source maintains client connections and handles deserializing incoming data to events. The sink acts as a client towards other systems and manages those connections.

With the addition of linked transports, onramps gained the ability to have response events sent back to them, so they could handle request-response-style protocols like HTTP. The implementation of this got a little bit messy.
The abstraction of an entity maintaining connections to outside systems and allowing incoming and outgoing events should be called a peer (or any other name we come up with).

We introduce a new artefact called peer. It has a type, some type-specific config, and codec and interceptor (pre- or postprocessor) properties. In mappings it can act as both a sink and a source.

Example config (suggestion):

peer:
  - id: ws-client
    type: ws
    codec: json
    interceptors:
      - lines:
          delimiter: "\n"
          buffer: 1024
    config:
      endpoint:
        host: 123.45.67.89
        port: 8081
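
To make the dual source/sink nature more concrete, here is a minimal Rust sketch of what such an abstraction could look like. The `Peer` trait, its method names, the `Event` type and the toy `LinePeer` are all hypothetical and not part of any existing tremor API; codec and interceptors are reduced to a lossy UTF-8 decode and a line split.

```rust
/// Hypothetical event type: a payload plus the stream it arrived on.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub stream_id: u64,
    pub data: String,
}

/// Hypothetical peer abstraction: one artefact with both natures.
pub trait Peer {
    /// Source nature: bytes received from the outside become events.
    fn recv(&mut self, stream_id: u64, raw: &[u8]) -> Vec<Event>;
    /// Sink nature: an event from a pipeline becomes bytes to send out.
    fn send(&mut self, event: &Event) -> Vec<u8>;
}

/// Toy peer that splits incoming data into lines (standing in for a
/// `lines` interceptor) and treats every non-empty line as one event.
pub struct LinePeer;

impl Peer for LinePeer {
    fn recv(&mut self, stream_id: u64, raw: &[u8]) -> Vec<Event> {
        String::from_utf8_lossy(raw)
            .lines()
            .filter(|line| !line.is_empty())
            .map(|line| Event { stream_id, data: line.to_string() })
            .collect()
    }

    fn send(&mut self, event: &Event) -> Vec<u8> {
        // Serialize the payload and re-append the line delimiter.
        let mut out = event.data.clone().into_bytes();
        out.push(b'\n');
        out
    }
}
```

A mapping could then connect the same `LinePeer` instance on both ends of a pipeline, which is the property the current split into onramps and offramps lacks.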

Metrics Collection

Metrics counters for peers:

in: pipeline -> peer
out: peer -> pipeline
err: errors
send: peer -> external
recv: external -> peer
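
As a rough illustration of how these five counters relate to each other, the following Rust sketch keeps them in one struct. The struct, field and method names are assumptions made for this example (`in` is a Rust keyword, hence `events_in`/`events_out`), and how errors are counted is only one possible interpretation.

```rust
/// Hypothetical per-peer metrics, one counter per direction listed above.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PeerMetrics {
    /// `in`: events received from a pipeline.
    pub events_in: u64,
    /// `out`: events handed to a pipeline.
    pub events_out: u64,
    /// `err`: errors.
    pub err: u64,
    /// `send`: messages sent to the external system.
    pub send: u64,
    /// `recv`: messages received from the external system.
    pub recv: u64,
}

impl PeerMetrics {
    /// External data arrived and produced `events` events for pipelines.
    pub fn on_recv(&mut self, events: u64) {
        self.recv += 1;
        self.events_out += events;
    }

    /// A pipeline event arrived; `sent` says whether sending it out worked.
    pub fn on_event(&mut self, sent: bool) {
        self.events_in += 1;
        if sent {
            self.send += 1;
        } else {
            self.err += 1;
        }
    }
}
```

Keeping `recv` and `out` separate matters because one external message can yield several events (e.g. after a `lines` interceptor), so the two counters are not expected to match.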

Naming

Naming suggestions:

Connection
Peer

Streams

Motivation

For linked transports, that is, sending events back as a response to the same context they originated from, we need a way of routing those "response" events to the correct context where they belong (e.g. HTTP session, Websocket connection, ...). For this, a peer needs to put the information about the originating context into the event, and this information needs to be maintained inside all events that the originating event might cause (e.g. across request/response handling of another peer).

This identification can be assembled from: peer id + stream id + stream-specific peer event id.
Here, stream can refer to any logical grouping of incoming data (e.g. an HTTP session, a TCP connection) that is necessary for routing response events to their correct (peer-specific) context.
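
The composite identification could be sketched in Rust as follows. All names here (`EventId`, `IdGenerator`, `response_context`) are hypothetical, and numeric ids are an assumption made for brevity; the config example uses a string peer id.

```rust
use std::collections::HashMap;

/// Hypothetical composite event id: peer id + stream id + per-stream event id.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct EventId {
    pub peer_id: u64,
    pub stream_id: u64,
    pub event_id: u64,
}

/// Hands out event ids for one peer, counting separately per stream.
pub struct IdGenerator {
    peer_id: u64,
    next_per_stream: HashMap<u64, u64>,
}

impl IdGenerator {
    pub fn new(peer_id: u64) -> Self {
        Self { peer_id, next_per_stream: HashMap::new() }
    }

    /// Returns the next id for `stream_id`; each stream starts at 0.
    pub fn next_id(&mut self, stream_id: u64) -> EventId {
        let counter = self.next_per_stream.entry(stream_id).or_insert(0);
        let id = EventId { peer_id: self.peer_id, stream_id, event_id: *counter };
        *counter += 1;
        id
    }
}

/// A response carries the id of the event that caused it, so the
/// originating peer and stream (HTTP session, websocket connection, ...)
/// can be looked up from it.
pub fn response_context(origin: &EventId) -> (u64, u64) {
    (origin.peer_id, origin.stream_id)
}
```

The important property is that the origin id survives every hop: whatever a pipeline or another peer derives from an event has to keep that `EventId` around for the response to find its way back.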

Also, we already have streams in sources, where the creation of a new stream creates a new chain of preprocessors, as they might keep some state around that should not be shared between streams/connections. Unfortunately, we do not make use of streams anywhere else, e.g. in sinks.

The actual thing

Event ids are extended to be composed of peer id, stream id and (stream-specific) event id.
We introduce new stream-lifecycle signals that are sent whenever a stream is created or closed within a peer or elsewhere. E.g. upon stream creation a StartStream(stream_id) signal is sent to all pipelines connected via the source nature of the peer.
- We will define and document how peers react to those signals (e.g. pre-create connections when a StartStream signal is received).
A separate set of interceptor states is maintained for each stream inside a peer.
We extend circuit breaker events to be scoped either to a single stream or to the whole flow of events.
- Whenever a new stream is created and ready we send out a CB Open Stream(stream_id) contraflow event, similar to how we do now for whole sinks.

                   server                           client
                     |                                 |
client connects ---> |                                 |
                     | ---------start stream---------> |
                     |                              connect ----> server
                     | <--------CB open stream-------- |
                     |                                 |
client event ------> | -----------event--------------> | ---------> server
                     |                                 |
                     | <----------ack/fail------------ |
client response <--- |                                 |
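
The client side of the diagram above could be sketched in Rust like this. Only `StartStream(stream_id)` and a stream-scoped CB open event are named in this proposal; `EndStream`, `CloseStream`, `OpenAll`, `CloseAll`, `Connection` and `ClientPeer` are hypothetical names chosen for the example, and a real connection is replaced by an in-memory buffer.

```rust
use std::collections::HashMap;

/// Hypothetical stream-lifecycle signals travelling with the event flow.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Signal {
    StartStream(u64),
    EndStream(u64),
}

/// Hypothetical circuit breaker contraflow, scoped to one stream or to all.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CbEvent {
    OpenStream(u64),
    CloseStream(u64),
    OpenAll,
    CloseAll,
}

/// Stand-in for a real client connection plus its interceptor states.
#[derive(Debug, Default)]
pub struct Connection {
    pub sent: Vec<String>,
}

/// Sink nature of a peer that keeps one connection per stream.
#[derive(Debug, Default)]
pub struct ClientPeer {
    streams: HashMap<u64, Connection>,
}

impl ClientPeer {
    /// React to a lifecycle signal; returns the contraflow event, if any.
    pub fn on_signal(&mut self, signal: Signal) -> Option<CbEvent> {
        match signal {
            Signal::StartStream(id) => {
                // Pre-create the connection, then report the stream as open.
                self.streams.entry(id).or_default();
                Some(CbEvent::OpenStream(id))
            }
            Signal::EndStream(id) => {
                // Only report a close for streams that were actually open.
                self.streams.remove(&id).map(|_| CbEvent::CloseStream(id))
            }
        }
    }

    /// Send on the connection of the event's stream.
    /// `true` corresponds to an ack, `false` to a fail.
    pub fn on_event(&mut self, stream_id: u64, data: &str) -> bool {
        match self.streams.get_mut(&stream_id) {
            Some(conn) => {
                conn.sent.push(data.to_string());
                true
            }
            None => false,
        }
    }
}
```

Because each stream owns its `Connection`, any interceptor state stored alongside it is created with the stream and dropped with it, and is never shared between streams.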

To not break the circuit breaker mechanism we ban dynamic endpoint selection via event metadata. Endpoints in sinks must be declared ahead of time. This is a breaking change.
The stream id, peer id and event id of an event can be accessed via tremor script.

We remove the batch operator as it ignores streams, or we introduce a version that creates batches per stream.

trickle batch examples:

define tumbling window `15secs1kmsg`
with
   interval = datetime::with_seconds(15),
   count = 1000
end;

# batch by stream
select win::collect_flattened()
from in[`15secs1kmsg`] # We apply the window nominally to streams
group by [tremor::source_id(), tremor::stream_id()]
into out;

# batch across all streams
select win::collect_flattened()
from in[`15secs1kmsg`] # No group by, so one window spans all streams
into out;
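
For comparison, a stream-aware batch operator could behave roughly like the Rust sketch below. `StreamBatcher` is a hypothetical name, and the sketch only covers the count-based limit; the 15 second interval from the window definition is left out.

```rust
use std::collections::HashMap;

/// Hypothetical count-based batcher with one buffer per (source, stream).
pub struct StreamBatcher {
    size: usize,
    buffers: HashMap<(u64, u64), Vec<String>>,
}

impl StreamBatcher {
    pub fn new(size: usize) -> Self {
        Self { size, buffers: HashMap::new() }
    }

    /// Buffer an event; returns the full batch once `size` events of the
    /// same (source, stream) pair have been collected.
    pub fn push(
        &mut self,
        source_id: u64,
        stream_id: u64,
        event: String,
    ) -> Option<Vec<String>> {
        let key = (source_id, stream_id);
        let buffer = self.buffers.entry(key).or_default();
        buffer.push(event);
        if buffer.len() >= self.size {
            // Hand the batch out and start over for this stream.
            self.buffers.remove(&key)
        } else {
            None
        }
    }
}
```

Keying the buffers by `(source_id, stream_id)` mirrors the `group by` clause of the first query; using a single shared buffer instead would correspond to the second query, which batches across all streams.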

We document a stream as a central concept of event flow from source through pipelines to sink, encapsulating logical subsections of an event-flow determined by

darach added enhancement New feature or request open-review This RFC is undergoing open / public review with intent to proceed labels Nov 4, 2020
