update to latest template
dashpole committed Feb 7, 2025
1 parent f2dbbce commit 810ca4b
Showing 2 changed files with 34 additions and 29 deletions.
61 changes: 34 additions & 27 deletions keps/sig-instrumentation/2831-kubelet-tracing/README.md
@@ -22,6 +22,8 @@
- [Integration tests](#integration-tests)
- [e2e tests](#e2e-tests)
- [Graduation Requirements](#graduation-requirements)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
- [Does enabling the feature change any default behavior?](#does-enabling-the-feature-change-any-default-behavior)
@@ -47,14 +49,23 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [X] (R) KEP approvers have approved the KEP status as `implementable`
- [X] (R) Design details are appropriately documented
- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- [X] e2e Tests for all Beta API Operations (endpoints)
- [X] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [X] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [X] (R) Graduation criteria is in place
- [X] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [X] (R) Production readiness review completed
- [X] Production readiness review approved
- [X] (R) Production readiness review approved
- [X] "Implementation History" section is up-to-date for milestone
- [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [X] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

This Kubernetes Enhancement Proposal (KEP) is to enhance the kubelet to allow tracing gRPC and HTTP API requests.
@@ -254,6 +265,14 @@ GA

- [X] Feedback from users collected and incorporated over multiple releases

### Upgrade / Downgrade Strategy

Tracing will work if the kubelet version supports the feature, and will not export spans if it doesn't. It does not impact the ability to upgrade or roll back kubelet versions.

### Version Skew Strategy

Version skew isn't applicable because this feature only involves the kubelet.

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback
@@ -328,13 +347,13 @@ _This section must be completed when targeting beta graduation to a release._

### Monitoring Requirements

_This section must be completed when targeting beta graduation to a release._

###### How can an operator determine if the feature is in use by workloads?

Operators are expected to have access to and/or control of the OpenTelemetry agent deployment and trace storage backend.
KubeletConfiguration will show the FeatureGate and TracingConfiguration.

Workloads do not directly use this feature.
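
As an illustration of what an operator would look for, a KubeletConfiguration with tracing enabled might resemble the sketch below. The `endpoint` and `samplingRatePerMillion` values are assumptions chosen for the example and are not taken from this KEP; the feature gate name is the `KubeletTracing` gate referenced in the Troubleshooting section.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Gate name as referenced elsewhere in this KEP.
  KubeletTracing: true
tracing:
  # Assumed OTLP gRPC endpoint of a collector on the same node.
  endpoint: localhost:4317
  # Assumed sampling rate: 100 of every 1,000,000 spans.
  samplingRatePerMillion: 100
```

If the `tracing` stanza is absent from the effective configuration, the kubelet is not configured to export spans.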

###### How can someone using this feature know that it is working for their instance?

The tracing backend will display the traces with a service "kubelet".
@@ -346,34 +365,20 @@ _This section must be completed when targeting beta graduation to a release._

##### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

- [] Metrics
- Metric name: tbd [opentelemetry-go issue #2547](https://github.com/open-telemetry/opentelemetry-go/issues/2547)
- Components exposing the metric: kubelet
None. Operators can use the absence of traces, which are an observability signal in their own right.

##### Are there any missing metrics that would be useful to have to improve observability
To be determined.

It would be helpful to have metrics about span generation and export: [opentelemetry-go issue #2547](https://github.com/open-telemetry/opentelemetry-go/issues/2547)

### Dependencies

_This section must be completed when targeting beta graduation to a release._

###### Does this feature depend on any specific services running in the cluster?

Yes. In the current version of the proposal, users must run the [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector)
as a daemonset and configure a backend trace visualization tool (jaeger, zipkin, etc).

as a daemonset and configure a backend trace visualization tool (jaeger, zipkin, etc). There are also a wide variety of vendors and cloud providers which support OTLP.
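
A minimal sketch of a collector configuration that accepts OTLP traces from the kubelet is shown below. It assumes the collector's `otlp` receiver and `debug` exporter; the `debug` exporter only prints spans to the collector's own output and is a stand-in for whichever backend exporter is actually used.

```yaml
receivers:
  otlp:
    protocols:
      # Accept OTLP over gRPC (default port 4317).
      grpc:
exporters:
  # Placeholder exporter; replace with the exporter for your backend.
  debug:
    verbosity: detailed
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
```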

### Scalability

_For alpha, this section is encouraged: reviewers should consider these questions
and attempt to answer them._

_For beta, this section is required: reviewers must answer these questions._

_For GA, this section is required: approvers should be able to confirm the
previous answers based on experience in the field._

###### Will enabling / using this feature result in any new API calls?

This will not add any additional API calls.
@@ -403,28 +408,30 @@ previous answers based on experience in the field._

The tracing client library has a small, in-memory cache for outgoing spans.

###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No.

### Troubleshooting

The Troubleshooting section currently serves the `Playbook` role. We may consider
splitting it into a dedicated `Playbook` document (potentially with some monitoring
details). For now, we leave it here.

_This section must be completed when targeting beta graduation to a release._

###### How does this feature react if the API server and/or etcd is unavailable?

No reaction specific to this feature if API server and/or etcd is unavailable.

###### What are other known failure modes?

- [The controller is misconfigured and cannot talk to the collector or the collector cannot send traces to the backend]
- [The kubelet is misconfigured and cannot talk to the collector or the kubelet cannot send traces to the backend]
- Detection: How can it be detected via metrics? Stated another way:
how can an operator troubleshoot without logging into a master or worker node?
**kubelet logs, component logs, collector logs**
- Mitigations: **Disable KubeletTracing, update collector, backend configuration**
- Mitigations: **Fix the kubelet configuration, update collector, backend configuration**
- Diagnostics: What are the useful log messages and their required logging
levels that could help debug the issue? **go-opentelemetry sdk provides logs indicating failure**
- Testing: To be added.
- Testing: It isn't particularly useful to test misconfigurations.

## Implementation History

@@ -433,7 +440,7 @@ _This section must be completed when targeting beta graduation to a release._
- 2022-03-29: KEP deemed not ready for Alpha in 1.24
- 2022-06-09: KEP targeted at Alpha in 1.25
- 2023-01-09: KEP targeted at Beta in 1.27
- 2023-01-09: KEP targeted at Stable in 1.33
- 2025-02-07: KEP targeted at Stable in 1.33

## Drawbacks

2 changes: 0 additions & 2 deletions keps/sig-instrumentation/2831-kubelet-tracing/kep.yaml
@@ -34,5 +34,3 @@ feature-gates:
components:
- kubelet
disable-supported: true
metrics:
- "tbd"
