feat(traces): OTeL Traces implementation(duty flow) #1980

Open
wants to merge 10 commits into
base: stage

oleg-ssvlabs
Contributor

@oleg-ssvlabs oleg-ssvlabs commented Jan 14, 2025

Description/Questions/Suggestions

  • Are both logs and events needed to record the same type of event, or should it be either-or? Example.

  • Is Duty ID as a Trace Attribute necessary? Currently both Slot and Epoch attributes are added separately. Would it be better to just add a Committee attribute instead?

  • Duty flows are separated into Committee versus everything else (method/function level). This is reflected in different span names. Should we keep separating them at the span level, or should we use the same names with attributes that help differentiate duties (like ssv.runner.role, etc.)?
    Span name examples:

    • ssv.validator.execute_committee_duty
    • ssv.validator.execute_duty
    • ssv.validator.start_committee_duty
    • ssv.validator.start_duty
  • There are three statuses for spans: Ok, Error, and Unset. Ideally, every span should have its status explicitly set to either Ok or Error; Unset is not expected. Please use the Grafana UI to verify whether we receive any spans in Unset status.

  • Look into namespaces for metrics, traces, and attributes. Namespaces should be consistent across all observability "primitives". There is a chance we have some inconsistencies.
    Example: ssv.validator.duty vs. ssv.duty. If Duty belongs to Validator, use ssv.validator.duty everywhere.
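
One way to keep names from drifting is to derive every span and attribute name from a single namespace constant and flag anything that falls outside it. The following is only a sketch: validatorNamespace, spanExecuteDuty, attrDutyID, and both helper functions are hypothetical names, not taken from this PR.

```go
package main

import "strings"

// validatorNamespace is a hypothetical root shared by metrics, traces,
// and attributes.
const validatorNamespace = "ssv.validator"

// Names are built from one constant so they cannot drift apart.
// Both names are illustrative assumptions.
const (
	spanExecuteDuty = validatorNamespace + ".execute_duty"
	attrDutyID      = validatorNamespace + ".duty.id"
)

// inNamespace reports whether name sits under ns: "ssv.validator.duty"
// is inside "ssv.validator", while "ssv.duty" is not.
func inNamespace(name, ns string) bool {
	return strings.HasPrefix(name, ns+".")
}

// outliers returns the names that fall outside ns, preserving input order.
func outliers(names []string, ns string) []string {
	var out []string
	for _, n := range names {
		if !inNamespace(n, ns) {
			out = append(out, n)
		}
	}
	return out
}
```

A check like outliers could run in a unit test over the list of registered names, so an ssv.duty style name fails fast instead of showing up in Grafana.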

  • The OpenTelemetry specification explains why the Ok status is used without a message ("Description MUST be IGNORED for StatusCode Ok & Unset values."). Even if a message is set for an Ok status, it will be ignored by OTeL and not displayed in Grafana (yeah, SDKs could have been better here).
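
The two status points above (always set Ok or Error explicitly, and leave the description empty for Ok) can be sketched with a small helper. StatusCode here is a local stand-in, not the OpenTelemetry SDK type, and spanStatus and errDutyFailed are hypothetical names; with the Go SDK the result would presumably be passed to the span's SetStatus call.

```go
package main

import "errors"

// StatusCode is a local stand-in for the three span statuses;
// it is NOT the OpenTelemetry SDK type.
type StatusCode int

const (
	Unset StatusCode = iota // default when nothing sets a status
	Ok
	Error
)

// errDutyFailed is a hypothetical error used only for illustration.
var errDutyFailed = errors.New("duty failed")

// spanStatus picks an explicit status for a finished operation, so a span
// never ends as Unset. Ok carries no description because the specification
// says the description is ignored for Ok.
func spanStatus(err error) (StatusCode, string) {
	if err != nil {
		return Error, err.Error()
	}
	return Ok, ""
}
```

Routing every span end through one helper like this makes an Unset span a sign of a missed code path rather than a judgment call at each call site.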

  • Some libraries need to be updated for proper context propagation, especially for methods that perform I/O (e.g., HTTP calls). Example: p2p.Broadcast().

  • Some enums in the libraries used by SSV Node lack a String() method, which complicates logging and tracing.
    Example: types.PartialSigMsgType (ssv-spec lib). This is something these libraries should potentially implement (we own the source code).

  • Should these enums be moved to the spec types package instead?

@oleg-ssvlabs oleg-ssvlabs changed the title from "feat(traces): OTeL Traces implementation" to "feat(traces): OTeL Traces implementation(duty flow)" Jan 14, 2025

codecov bot commented Jan 14, 2025

Codecov Report

Attention: Patch coverage is 30.59701% with 186 lines in your changes missing coverage. Please review.

Project coverage is 46.8%. Comparing base (1d90a85) to head (34dc244).
Report is 2 commits behind head on stage.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| observability/attributes.go | 0.0% | 80 Missing ⚠️ |
| operator/validator/controller.go | 0.0% | 49 Missing ⚠️ |
| observability/observability.go | 0.0% | 35 Missing ⚠️ |
| operator/duties/scheduler.go | 58.6% | 12 Missing ⚠️ |
| observability/option.go | 0.0% | 7 Missing ⚠️ |
| cli/operator/node.go | 0.0% | 3 Missing ⚠️ |

@oleg-ssvlabs oleg-ssvlabs force-pushed the traces branch 6 times, most recently from 34dc244 to 819f619 Compare January 16, 2025 10:25
@oleg-ssvlabs oleg-ssvlabs marked this pull request as ready for review January 16, 2025 10:35