Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connectors Implementation #1077

Merged
merged 524 commits into from
Apr 12, 2022
Merged

Connectors Implementation #1077

merged 524 commits into from
Apr 12, 2022

Conversation

mfelsche
Copy link
Member

@mfelsche mfelsche commented Jun 15, 2021

Pull request

Description

Connectors are successors to onramps/offramps, sinks/sources. A connector can be a source of events and events can be sent to a connector. Not all connectors necessarily support all both the sink and source nature/trait.

This PR introduces a new config artefact
and tries to remodel the current onramp/offramp handling to move as much boilerplate and common logic from the connector implementation to the runtime (or where not possible to reasonable utilities), so that writing new connectors is getting easier, while maintaining all invariants enforced by the runtime.

Additional connector features:

  • Pause / Resume
  • Connect / Reconnect / connectivity probes
  • Configurable Codecs/Processors

TODOs

  • Create predefined reconnect strategies (e.g. randomized exponential backoff, ...)
  • correlation metadata in RPC style connectors (moved to connectors)
  • broadcast system metrics to all metrics connector instances
  • look into stdin connector if we want broadcasting to all instances as well
  • integration tests for verifying proper startup
  • integration tests for verifying pause/resume behaviour
  • rust in-process integration test suite for connectors (in tests folder)
    • getting a stream for events from out, err
    • getting a sender for sending events to in
    • verifying events send to and received from a connector
    • e.g. sending TCP data to it, asserting on received events
  • only register debug connector types (exit, debug, cb, benchmark) when a certain server flag is given (--register-debug-connectors or similar)
  • Binding Quiescence - assuming no overlaps or shared resources
  • Connectors API - status, change status (pause, resume, ...)
  • Flow API - add GET status, PUT status (pause, resume, ...) endpoints
  • Ensure that metrics are emitted correctly to the metrics connector, same for pipelines
  • Connector ports (tier 1; known to be in active use, will be prioritised for porting)
    • discord (h)
    • DNS (h)
      • correlation
    • elastic (confirmed) (m)
      • correlation
    • exit (m)
    • file (please not line-based this time, both sink and src) (m)
    • kafka-consumer (m)
    • kafka-producer (m)
    • kv (h)
      • correlation
    • metronome (h)
    • otel (src + sink, confirmed)
      • add echo client/server integration test
    • http (rest server + client, should this include SSE?, confirmed) (d)
      • correlation
      • basic auth
      • concurrency=1 in client issue with blocking
      • batch punctuation for better/correct batch event response handling
    • stdio (stdin + stdout + stderr)
    • tcp-client (m)
    • tcp-server (m)
    • udp-client (h)
    • udp-server (h)
    • wal (h)
    • unix-socket (confirmed)
  • Connector ports (tier 2; utility, testing & debugging, this will not be prioritized on porting form sink/src to connector )
    • benchmark connector (blaster/blackhole previously) (h)
    • cb (m)
    • debug
    • sse (could this be part of http?)
    • gpubsub (gsub + gpub, use cases known and expanding )
    • gcs (confirmed not used at the moment but future usecase exists)
  • Connector ports (tier 3; not known to be in use - low priority, but comment this issue if you are using any of these, dear readers )
    • websocket-client (near term use cases exist) (d)
    • websocket-server (near term use cases exist) (d)
    • postgres (src + sink, do we still need this?, not known to be in production use at this time )
    • nats (src + sink, not known to be in production use at this time )
    • newrelic ( community contributed by grover, not known if it is in production use at this time )
    • crononome (ported, not in production use that we can confirm)
    • amqp (confirmed not used)

Test expectations and conditions

Tier 1 and Tier 2 connectors should have container tests in /tests contributed with the new connector test
harness ( this helps greatly with code coverage ) and integration tests in /tremor-cli/tests/integration that
demonstrate canonical and simple use and validation of the connector in all its configuration modes ( client or server, secure and insecure etc.. ). For Tier 3, container and integration tests are preferable.

Related

Checklist

  • The RFC, if required, has been submitted and approved
  • Any user-facing impact of the changes is reflected in docs.tremor.rs
  • The code is tested
  • Use of unsafe code is reasoned about in a comment
  • Update CHANGELOG.md appropriately, recording any changes, bug fixes, or other observable changes in behavior
  • The performance impact of the change is measured (see below)

Performance

src/config.rs Outdated Show resolved Hide resolved
src/connectors/sink.rs Outdated Show resolved Hide resolved
src/connectors/source.rs Outdated Show resolved Hide resolved
tremor-common/src/ids.rs Outdated Show resolved Hide resolved
@mfelsche mfelsche force-pushed the connectors branch 2 times, most recently from 89b779c to 093ff41 Compare July 15, 2021 14:05
src/version.rs Outdated Show resolved Hide resolved
@mfelsche mfelsche force-pushed the connectors branch 2 times, most recently from 51d7a6e to 2885548 Compare August 18, 2021 21:22
src/lib.rs Outdated Show resolved Hide resolved
@Licenser
Copy link
Member

Licenser commented Sep 3, 2021

A question to the std_stream connector, I noticed that the user needs to specify what type of std_stream it is which in my mind breaks the connector concept a bit since instead of a bidirectional connection it is now a unidirectional connection that the user specifies the direction to.

Would it make sense to turn the std_stream into a connector that allows reading/writing to? behaving more like stdio?

This would require connectors to be aware when their input (sink) and output (source) ports are connected to dynamically shut down / start the corresponding subsystems but it would feel a bit more intuitive than having to write:

connector:
  - id: stdin
    type: std_stream
    config:
      stream: stdin
  - id: stdout
    type: std_stream
    config:
      stream: stdout
  - id: stderr
    type: std_stream
    config:
      stream: stderr

Same goes for Kafka, so kafka should it be a consumer/producer connector? that works bidirectional depending on how things are connected? Or at least follow the same model: Either each direction has their own connector, or there is one connector with multiple directions based on configuration?

For other connectors like http-server and http-client (TCP, UDP too) the separation makes sense ( since both are bidirectional) but I think we got to be a bit careful on naming them to keep it consistent. Perhaps we should retrofit the RFC to spell out the notions of where we split out client /server connectors, where we combine http-client, where we specify by configuration (currently stdio) or by connector (currently kafka) that we get a more generalized and pattern driven approach?

@Licenser Licenser force-pushed the connectors branch 2 times, most recently from 8bc5a2b to 0ec38c4 Compare September 3, 2021 09:28
@darach
Copy link
Member

darach commented Sep 3, 2021

Once this branch goes green it would be worth merging the troy branch once both have rebased against main. Progressing the deployment language work now depends on having connectors available in the runtime.

src/connectors/metrics.rs Outdated Show resolved Hide resolved
Licenser
Licenser previously approved these changes Oct 5, 2021
Copy link
Member

@Licenser Licenser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 also fixes #587

Licenser and others added 19 commits April 12, 2022 17:36
Signed-off-by: Heinz N. Gies <[email protected]>
Signed-off-by: Heinz N. Gies <[email protected]>
To wait a bit so the outfile can be written.

Signed-off-by: Matthias Wahl <[email protected]>
Signed-off-by: Matthias Wahl <[email protected]>
Signed-off-by: Heinz N. Gies <[email protected]>
Signed-off-by: Heinz N. Gies <[email protected]>
Signed-off-by: Matthias Wahl <[email protected]>
Signed-off-by: Heinz N. Gies <[email protected]>
Signed-off-by: Heinz N. Gies <[email protected]>
@Licenser Licenser dismissed stale reviews from darach and themself via 4b50d0f April 12, 2022 15:36
Licenser
Licenser previously approved these changes Apr 12, 2022
Signed-off-by: Heinz N. Gies <[email protected]>
@Licenser Licenser merged commit 8361e2f into main Apr 12, 2022
@Licenser Licenser deleted the connectors branch April 12, 2022 16:09
@marioortizmanero marioortizmanero mentioned this pull request May 2, 2022
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

stats::dds does not include sum
5 participants