[docs/component-stability.md] Add criteria for graduating between stability levels #11864

Open
wants to merge 6 commits into
base: main

Conversation

mx-psi
Member

@mx-psi mx-psi commented Dec 12, 2024

Description

Code ownership and maintenance of components continue to be an issue, with varying levels of support across contrib. As we approach 1.0 and the ability to mark components as stable, we want to make sure that components we deem 'stable' have a healthy community around them. We have three data points that we can leverage here: how many codeowners a component has, how diverse they are in terms of employers, and how actively the codeowners have been responding to issues/PRs in the recent past.

We need criteria that

  1. Are reasonable predictors of the component health over the short/medium term
  2. Are not too onerous on the code owners

Some notes:

  1. Some beta components do not meet the criteria listed in this PR. This will remain the case for some components even after the transition. This PR makes no claim as to what should happen to these components' stability (so, de facto, they will stay as is).
  2. The OTLP receiver and exporters do not meet these criteria today because they don't have listed code owners. We can solve this either by carving out an exception or by listing code owners.
  3. We need automation and templates to enforce this.
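
As a rough illustration of what such automation might check, here is a minimal sketch built on the three data points from the description (number of code owners, employer diversity, activity). The `CodeOwner` type, the `check_beta_to_stable` function, and the `min_employers=2` threshold are all hypothetical; the PR only states "at least three active code owners" and does not define how activity is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeOwner:
    handle: str
    employer: str
    active: bool  # how "active" is measured is an open question in this PR


def check_beta_to_stable(owners, min_active=3, min_employers=2):
    """Return (eligible, reasons). min_employers=2 is an assumed threshold."""
    # Only active code owners count toward either criterion.
    active = [o for o in owners if o.active]
    employers = {o.employer for o in active}
    reasons = []
    if len(active) < min_active:
        reasons.append(f"{len(active)} active code owners, need {min_active}")
    if len(employers) < min_employers:
        reasons.append(f"{len(employers)} employers, need {min_employers}")
    return (not reasons, reasons)


# Hypothetical component: three listed owners, one of them inactive.
owners = [
    CodeOwner("alice", "VendorA", True),
    CodeOwner("bob", "VendorA", True),
    CodeOwner("carol", "VendorB", False),
]
eligible, reasons = check_beta_to_stable(owners)
print(eligible, reasons)
```

Returning the list of unmet criteria rather than a bare boolean would let a bot or issue template tell code owners exactly what is missing.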

Link to tracking issue

Fixes #11850

@mx-psi mx-psi added the Skip Changelog PRs that do not require a CHANGELOG.md entry label Dec 12, 2024

codecov bot commented Dec 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.62%. Comparing base (50104db) to head (d551fa7).

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #11864   +/-   ##
=======================================
  Coverage   91.62%   91.62%           
=======================================
  Files         447      447           
  Lines       23731    23731           
=======================================
  Hits        21743    21743           
  Misses       1613     1613           
  Partials      375      375           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

## Beta to stable

To graduate any signal from beta to stable on a component:
1. The component MUST have at least three active code owners.
Contributor

I think codeowners should commit to an SLA for first response on bug issues.
The commitment should be measured in days.

Member Author

This raises some questions for me including:

  • What happens when people go on vacation/have a kid/[insert activity here that leads to a prolonged period of absence]?
  • What happens if people don't follow this SLA? Typically an SLA means that you pay if you don't meet a certain standard, how do you "pay" here?

Member

@julianocosta89 julianocosta89 left a comment

I think we shouldn't make it too hard to have community components.
I think vendor components and widely used components will have no trouble following the guidelines.

When we think about components that add value to the overall project, but may not be interesting/priority to vendors, they may struggle to get folks involved in them, and that would disqualify them from being moved to stable.

I do understand that we need to provide a way to ensure maintainability of stable components, but maybe we could build something around ideas such as:

  • Being active
  • Replying to issues related to their components in a timely manner
  • Fixing reported bugs in a timely manner
  • ...

If we have a component that is not vendor related, but maintained by 2 folks from a single employer, these criteria wouldn't allow it to move to stable.

Also, let's imagine the following scenario:

  • 3 folks are codeowners of a component, 2 from one company and 1 from another.
  • They graduate to stable.
  • A couple of months later the person from the other company moves to the same company as the other 2. Would the component be demoted?

I don't think it should be, if they are active and responsive in issues related to that component.
I know it is a corner case and may never happen, but it could.

My main point here is that we shouldn't make it too hard to have community components.
I know a couple of companies that develop internal components to solve their customers' issues. It would be awesome to have a couple of those contributed back to upstream, and let the community grow together.

@mx-psi
Member Author

mx-psi commented Dec 17, 2024

@julianocosta89

> When we think about components that add value to the overall project, but may not be interesting/priority to vendors, they may struggle to get folks involved in them, and that would disqualify them from being moved to stable.

There is a trade-off between having more components and having fewer components that are more actively maintained. We need to be mindful of where we draw the line, but my feeling is that right now we have too many components that are not well maintained.

> I do understand that we need to provide a way to ensure maintainability of stable components, but maybe we could build something around ideas such as:
>
>   • Being active
>   • Replying to issues related to their components in a timely manner
>   • Fixing reported bugs in a timely manner
>   • ...

There are two questions to consider here:

  1. When do we make a decision to move a component to stable?
  2. When do we make a decision to move a component to unmaintained?

In this PR I am focusing on (1). For doing (1), we need to focus on things that we can measure/check at the time of marking as stable. Some of the things you mention are, I feel, important criteria for deciding whether a component should be moved to unmaintained, but not whether it should move to stable.

> If we have a component that is not vendor related, but maintained by 2 folks from a single employer, these criteria wouldn't allow it to move to stable.

The point with the 'vendor diversity' is that I think it is a good predictor of component quality (more than one vendor means more focus on a wide number of use cases) and maintainability (we don't depend on a single company). Maybe it would help to do an analysis of existing components to see how difficult this is to achieve?

> Also, let's imagine the following scenario:
>
>   • 3 folks are codeowners of a component, 2 from one company and 1 from another.
>   • They graduate to stable.
>   • A couple of months later the person from the other company moves to the same company as the other 2. Would the component be demoted?

This PR makes no claims about when a component should be 'demoted'. Currently the only way to be demoted is to be moved to unmaintained, and we have some rules about when that can happen. I personally don't think we should move components from stable to beta; I think that would be confusing for end users.

The way I see it, this is a bit like the CNCF project status: there is no moving from graduated to incubating, only from graduated to deprecated.

@mx-psi
Member Author

mx-psi commented Dec 17, 2024

I split off part of this PR in #11937. PTAL at that one first

@julianocosta89
Member

> There is a trade-off between having more components and having fewer components that are more actively maintained. We need to be mindful of where we draw the line, but my feeling is that right now we have too many components that are not well maintained.

Agree.

> For doing (1), we need to focus on things that we can measure/check at the time of marking as stable. Some of the things you mention are, I feel, important criteria for deciding whether a component should be moved to unmaintained, but not whether it should move to stable.

Makes sense.

> Maybe it would help to do an analysis of existing components to see how difficult this is to achieve?

Let's take the connectors first: none of them has 3 ACTIVE codeowners. sumconnector is the only component with 3 codeowners, but the 3 of them do not seem very active.

  • Connectors:
    • countconnector: 2 codeowners from different vendors
    • exceptionsconnector: 1 codeowner
    • failoverconnector: 2 codeowners (from different vendors?)
    • otlpjsonconnector: 2 codeowners from different vendors
    • roundrobinconnector: 1 codeowner
    • routingconnector: 2 codeowners from different vendors
    • servicegraphconnector: 2 codeowners from different vendors
    • signaltometricsconnector: 2 codeowners (from different vendors?)
    • sumconnector: 3 codeowners (from different vendors?)

On the other hand, I think this "rule" would bring more awareness to the components, and maybe it would also bring more codeowners to "important" components.

Still not sure about the number of codeowners.
Most of the connectors (and, from a quick look, the processors) have just 1 or 2.
I agree 1 is not enough, but wouldn't 2 (from different companies) be enough?
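
To make the comparison between a threshold of 2 and a threshold of 3 concrete, the connector list above can be tallied with a short sketch. The counts are copied from the list in this comment; the `meeting_threshold` helper is hypothetical, and the tally counts listed codeowners only, ignoring both activity and the unresolved "(from different vendors?)" entries.

```python
# Codeowner counts per connector, as listed in the comment above.
CONNECTOR_OWNERS = {
    "countconnector": 2,
    "exceptionsconnector": 1,
    "failoverconnector": 2,
    "otlpjsonconnector": 2,
    "roundrobinconnector": 1,
    "routingconnector": 2,
    "servicegraphconnector": 2,
    "signaltometricsconnector": 2,
    "sumconnector": 3,
}


def meeting_threshold(counts, minimum):
    """Return the sorted names of components with at least `minimum` codeowners."""
    return sorted(name for name, n in counts.items() if n >= minimum)


for minimum in (2, 3):
    names = meeting_threshold(CONNECTOR_OWNERS, minimum)
    print(f"at least {minimum} codeowners: {len(names)} of {len(CONNECTOR_OWNERS)}")
```

With these numbers, 7 of the 9 connectors reach a threshold of 2 listed codeowners, while only sumconnector reaches 3.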

@julianocosta89
Member

> Still not sure about the number of codeowners. Most of the connectors (and, from a quick look, the processors) have just 1 or 2. I agree 1 is not enough, but wouldn't 2 (from different companies) be enough?

After thinking further and discussing with @mx-psi, I believe these criteria are going to be the foundation for users to further engage and take on codeowner responsibility for the components they would like to move to stable.
I'd say that at least 3 codeowners is a good number to keep the Collector components maintainable and not just pile up responsibilities on Collector maintainers.

github-merge-queue bot pushed a commit that referenced this pull request Dec 19, 2024
…levels' section (#11937)

#### Description

Split off from #11864, describes how the graduation would work without
any additional criteria.

Rendered diagram:


```mermaid
stateDiagram-v2
    state Maintained {
    InDevelopment --> Alpha
    Alpha --> Beta
    Beta --> Stable
    }
    InDevelopment: In Development
    Maintained --> Unmaintained
    Unmaintained --> Maintained
    Maintained --> Deprecated
    Deprecated --> Maintained: (should be rare)
```

---------

Co-authored-by: Christos Markou <[email protected]>
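
One way to read the diagram above is as a small transition table. The sketch below is only an interpretation: it models a component as a `(level, status)` pair and assumes that a status change (maintained, unmaintained, deprecated) keeps the level unchanged, which the diagram does not state. The names `LEVELS`, `STATUS_MOVES`, and `can_transition` are hypothetical.

```python
# Levels inside the "Maintained" composite state, in graduation order.
LEVELS = ["in development", "alpha", "beta", "stable"]

# Status arrows drawn in the diagram.
STATUS_MOVES = {
    ("maintained", "unmaintained"),
    ("unmaintained", "maintained"),
    ("maintained", "deprecated"),
    ("deprecated", "maintained"),  # the diagram notes this should be rare
}


def can_transition(current, target):
    """current and target are (level, status) pairs."""
    (level, status), (new_level, new_status) = current, target
    if status == new_status == "maintained":
        # Only single-step graduation, matching the arrows inside "Maintained".
        return LEVELS.index(new_level) - LEVELS.index(level) == 1
    if level == new_level:
        # Assumption: a status change does not alter the level.
        return (status, new_status) in STATUS_MOVES
    return False


print(can_transition(("beta", "maintained"), ("stable", "maintained")))
print(can_transition(("stable", "maintained"), ("beta", "maintained")))
```

Under this reading there is no arrow from stable back to beta, which matches the position taken earlier in the thread that a stable component is not demoted to beta.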
@mx-psi mx-psi marked this pull request as ready for review December 19, 2024 12:14
@mx-psi mx-psi requested a review from a team as a code owner December 19, 2024 12:14
Labels
Skip Changelog PRs that do not require a CHANGELOG.md entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Establish codeowners minimum criteria for moving up through the stability ladder
4 participants