Problem

We see data loss in Eventing components from time to time, unfortunately. Here are some notable occurrences (knative-extensions/eventing-kafka#649, knative-extensions/eventing-kafka#549, #2357).

Currently, we see CI failures in the KafkaSource upgrade tests for the upcoming OpenShift 4.10. Each time we encounter data loss, we struggle to pinpoint where the failure happened, because Eventing lacks dedicated tooling that would help debug such situations.

Persona:
Developer

Exit Criteria
Dedicated tooling should provide a clear signal about why the expected events haven't been delivered.
We expect a report of the events that were dropped, including where and when each drop happened - essentially distributed tracing spans for the missed events.
It should be easy to set up - "Track this namespace, for events of type XXX and YYY" - so that it can be embedded in test suites and in external tools like kn-trace.
It needs to support filtering by fields, especially the source field; when running tests in parallel it is crucial to debug only specific event streams. A rough sketch of what such a setup could look like follows this list.
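To make the exit criteria above concrete, here is a minimal sketch of how a test suite could embed such tooling, assuming a Go API roughly along these lines. EventTracker, TrackerConfig, DroppedEvent, and AssertNoDataLoss are hypothetical names used only for illustration; this is not an existing Knative Eventing API.

```go
// Hypothetical API sketch - illustrative only, not an existing package.
package tracking

import (
	"context"
	"fmt"
	"time"
)

// TrackerConfig narrows the watched event stream ("Track this namespace,
// for events of type XXX and YYY") so parallel test runs only debug
// their own events.
type TrackerConfig struct {
	Namespace  string        // namespace to track
	EventTypes []string      // CloudEvents type filter
	Source     string        // CloudEvents source filter, crucial for parallel tests
	Timeout    time.Duration // how long to wait before declaring an event dropped
}

// DroppedEvent is one entry of the expected report: which event was lost,
// where in the delivery chain it was last seen, and when.
type DroppedEvent struct {
	ID        string
	Type      string
	Source    string
	LastSeen  string    // component that last handled the event
	Timestamp time.Time // timing of the drop, akin to a tracing span
}

// EventTracker is the interface the debugging tooling could expose.
type EventTracker interface {
	Start(ctx context.Context, cfg TrackerConfig) error
	// Report returns the events that were expected but never delivered.
	Report(ctx context.Context) ([]DroppedEvent, error)
}

// AssertNoDataLoss shows how an upgrade test could embed the tracker.
func AssertNoDataLoss(ctx context.Context, tracker EventTracker) error {
	cfg := TrackerConfig{
		Namespace:  "eventing-upgrade-tests", // example namespace
		EventTypes: []string{"type.XXX", "type.YYY"},
		Source:     "kafkasource-upgrade", // filter to this test's event stream
		Timeout:    2 * time.Minute,
	}
	if err := tracker.Start(ctx, cfg); err != nil {
		return fmt.Errorf("starting event tracker: %w", err)
	}
	dropped, err := tracker.Report(ctx)
	if err != nil {
		return fmt.Errorf("collecting drop report: %w", err)
	}
	for _, d := range dropped {
		fmt.Printf("event %s (%s, source=%s) dropped at %s, last seen in %s\n",
			d.ID, d.Type, d.Source, d.Timestamp, d.LastSeen)
	}
	if len(dropped) > 0 {
		return fmt.Errorf("%d events were dropped", len(dropped))
	}
	return nil
}
```

In a parallel CI run, each test would set a unique Source filter so that the drop report only covers its own event stream.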
Time Estimate (optional):
10d
Additional context (optional)
End users who'd like to extend Eventing with their own implementations could also run into this issue. It would be good to have such tooling available in that case as well.
Running more tests with continual traffic, as the wathola tooling does, could uncover additional data-loss bugs. In the future, it would be possible to run chaos/soak/failover scenarios with such assertions. The debugging tooling could be helpful in those cases as well.