What is the Problem Being Solved?
We have an otel-trace slog sender which creates OpenTelemetry traces by inferring relations between slog events. However, it is fragile (#10405), and the heuristics require storing out-of-band information in a SQLite DB. The shape of the traces is also not very conducive to investigations when trying to debug executions.
We also have a separate causeway tool that does similar processing on a slog file.
Description of the Design
I believe that OpenTelemetry traces are still a decent format to represent causal execution, especially if we can properly use multiple links between traces.
The best option would be to stop relying on the result promise kpid to match calls to deliveries, but that requires kernel changes tracked in #6501. At the very least, we should try to make the association stateless so we don't need a large DB of pending calls (or, if we do, we need to make sure we clean up these associations when possible).
The most important change is to switch from a trace relation where each delivery is linked to the temporally previous one to a causal one, where a notify delivery is primarily linked to a subscribe syscall (and secondarily linked to a resolve syscall), and a send delivery is linked to a send syscall.
I started down that path while working on #5724 and documented some ideas in mhofman/5724-otel-refactor. I think I may still have some related attempts somewhere.
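To make the causal linking described above more concrete, here is a minimal sketch. It is not the otel-trace implementation: the event shapes, the `linkCauses` function, and the `CausalLink` type are all hypothetical simplifications, and real slog entries carry many more fields. It still matches sends by result promise kpid (the approach #6501 would replace), but it drops each pending association as soon as it is consumed. In an OpenTelemetry exporter, each resulting entry would presumably become a span link.

```typescript
// Hypothetical, simplified event shapes (assumptions, not the real slog schema).
type Syscall =
  | { type: 'syscall'; id: string; vatID: string; op: 'send'; resultKpid: string }
  | { type: 'syscall'; id: string; vatID: string; op: 'subscribe' | 'resolve'; kpid: string };

type Delivery =
  | { type: 'deliver'; id: string; vatID: string; op: 'message'; resultKpid: string }
  | { type: 'deliver'; id: string; vatID: string; op: 'notify'; kpid: string };

type SlogEvent = Syscall | Delivery;

interface CausalLink {
  cause: string; // id of the syscall that caused the delivery
  role: 'primary' | 'secondary';
}

// Map each delivery id to the syscalls that caused it.
function linkCauses(events: SlogEvent[]): Map<string, CausalLink[]> {
  const sends = new Map<string, string>(); // result kpid -> send syscall id
  const subscribes = new Map<string, string>(); // `${vatID}:${kpid}` -> subscribe syscall id
  const resolves = new Map<string, string>(); // kpid -> resolve syscall id
  const links = new Map<string, CausalLink[]>();

  for (const ev of events) {
    if (ev.type === 'syscall') {
      if (ev.op === 'send') {
        sends.set(ev.resultKpid, ev.id);
      } else if (ev.op === 'subscribe') {
        subscribes.set(`${ev.vatID}:${ev.kpid}`, ev.id);
      } else {
        resolves.set(ev.kpid, ev.id);
      }
      continue;
    }

    const found: CausalLink[] = [];
    if (ev.op === 'message') {
      // A send delivery is linked to the send syscall that queued it.
      const send = sends.get(ev.resultKpid);
      if (send !== undefined) {
        found.push({ cause: send, role: 'primary' });
        sends.delete(ev.resultKpid); // clean up the pending association
      }
    } else {
      // A notify delivery is primarily linked to the subscribe syscall...
      const key = `${ev.vatID}:${ev.kpid}`;
      const subscribe = subscribes.get(key);
      if (subscribe !== undefined) {
        found.push({ cause: subscribe, role: 'primary' });
        subscribes.delete(key);
      }
      // ...and secondarily to the resolve syscall. The resolve entry is kept
      // here because other subscribers may still be notified; a real
      // implementation would need a policy for dropping it.
      const resolve = resolves.get(ev.kpid);
      if (resolve !== undefined) {
        found.push({ cause: resolve, role: 'secondary' });
      }
    }
    links.set(ev.id, found);
  }
  return links;
}

// Usage with made-up ids: v1 sends to v2, subscribes to the result, v2 resolves it.
const events: SlogEvent[] = [
  { type: 'syscall', id: 's1', vatID: 'v1', op: 'send', resultKpid: 'kp40' },
  { type: 'deliver', id: 'd1', vatID: 'v2', op: 'message', resultKpid: 'kp40' },
  { type: 'syscall', id: 's2', vatID: 'v1', op: 'subscribe', kpid: 'kp40' },
  { type: 'syscall', id: 's3', vatID: 'v2', op: 'resolve', kpid: 'kp40' },
  { type: 'deliver', id: 'd2', vatID: 'v1', op: 'notify', kpid: 'kp40' },
];
const links = linkCauses(events);
```

In this example, `d1` ends up with a single primary link to `s1`, and `d2` with a primary link to `s2` and a secondary link to `s3`. The lookup tables only hold associations that have not been consumed yet, which is one way to keep correlation state small.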
Security Considerations
None, telemetry
Scaling Considerations
Avoid large tables or complex lookups for correlation
Test Plan
Use mainnet slogs and ingest script.