Fix issue with metrics exporter in otelcol.exporter.loadbalancing #5684

Merged
rfratto merged 3 commits into main from ptodev/fix-lb-metrics on Nov 15, 2023

Conversation

@ptodev ptodev (Contributor) commented Nov 1, 2023

PR Description

In #5529, we upgraded Agent from OTel 0.85 to 0.87. One of the recent changes in OTel is that the LB exporter can now also export metrics. Unfortunately, this also means that its metrics exporter rejects configs which use the "traceID" routing key. Due to the way otelcol components are initialised, we always create all three exporters (logs, metrics, traces) for each otelcol exporter.
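
To illustrate that pattern, here is a simplified sketch only; the function name and error handling below are illustrative and not the actual agent code. Every signal exporter is built from the upstream OTel factory regardless of which signals the Flow pipeline actually routes to the component:

import (
    "context"

    "go.opentelemetry.io/collector/component"
    "go.opentelemetry.io/collector/exporter"
)

// buildExporters sketches the pattern described above: every signal exporter
// is created up front, whether or not anything sends that signal.
func buildExporters(ctx context.Context, f exporter.Factory, set exporter.CreateSettings, cfg component.Config) error {
    if _, err := f.CreateTracesExporter(ctx, set, cfg); err != nil {
        return err
    }
    // With routing_key = "traceID", this is the call that rejects the config,
    // even though no metrics are ever routed to the component.
    if _, err := f.CreateMetricsExporter(ctx, set, cfg); err != nil {
        return err
    }
    if _, err := f.CreateLogsExporter(ctx, set, cfg); err != nil {
        return err
    }
    return nil
}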

Agent does not need to support the metrics exporter in the LB exporter. That's because metrics exporting is experimental, and otelcol.exporter.loadbalancing is labeled in Flow as beta. For this reason, metrics routing keys were explicitly disabled in #5529, and the ability to export metrics was not documented in the Flow docs for otelcol.exporter.loadbalancing.

Notes to the Reviewer

This issue was reported on the community Slack.

This PR is a temporary workaround so that otelcol.exporter.loadbalancing will work again with a routing key of "traceID". We will need to find a permanent solution to this problem soon, and then remove this workaround.

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

@ptodev ptodev requested a review from a team as a code owner November 1, 2023 18:40
@rfratto rfratto (Member) commented Nov 1, 2023

I understand that this is to stop the bleeding, so I think this is fine.

However, I don't understand the underlying problem here, and why it's not an issue for OpenTelemetry Collector.

Are they using different default configs for different telemetry types? Do we need to copy the same pattern?

@ptodev ptodev (Contributor, Author) commented Nov 1, 2023

FYI, I tested this locally using the k6 traces creator and this config:

otelcol.receiver.otlp "from_k6" {
    grpc {
        endpoint = "0.0.0.0:4320"
    }

    output {
    traces  = [otelcol.exporter.loadbalancing.default.input]
    }
}

otelcol.exporter.loadbalancing "default" {
  routing_key = "traceID"
    resolver {
        static {
            hostnames = ["localhost:55690"]
        }
    }
    protocol {
        otlp {
            client {
              tls {
                insecure = true
              }
            }
        }
    }
}

otelcol.receiver.otlp "from_lb" {
    grpc {
        endpoint = "0.0.0.0:55690"
    }

    output {
    traces  = [otelcol.exporter.otlp.grafana_cloud_tempo.input]
    }
}


otelcol.exporter.otlp "grafana_cloud_tempo" {
    client {
        endpoint = "tempo-prod-06-prod-gb-south-0.grafana.net:443"
        auth     = otelcol.auth.basic.grafana_cloud_tempo.handler
    }
}

otelcol.auth.basic "grafana_cloud_tempo" {
    username = ""
    password = ""
}

@@ -69,7 +71,7 @@ var (
 //
 // The registered component must be registered to export the
 // otelcol.ConsumerExports type, otherwise New will panic.
-func New(opts component.Options, f otelexporter.Factory, args Arguments) (*Exporter, error) {
+func New(opts component.Options, f otelexporter.Factory, args Arguments, disableMetricsExporter bool) (*Exporter, error) {

Member:

I think adding a disableMetricsExporter parameter feels a little too targeted at this specific bug. Since the exporter package is not where the bug originates, adding this parameter there could make the API a bit confusing and vague.

I'd like to see this fix turn into a more general improvement to the exporter package's API: some kind of bitfield saying which telemetry types to create exporters for, for example calling:

  • exporter.New(opts, fact, args.(Arguments), exporter.TypesAll), or
  • exporter.New(opts, fact, args.(Arguments), exporter.TypeLogs|exporter.TypeTraces).

This will be more work, but I think it leaves the API design feeling intentional, rather than exposing the ability to disable metrics specifically.
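
For illustration only, such a bit set could look like the sketch below; the names mirror the bullets above and are a suggestion rather than a settled API:

// TypeSignal is a bit set describing which telemetry signals an otelcol
// exporter component should build underlying OTel exporters for.
type TypeSignal int

const (
    TypeLogs TypeSignal = 1 << iota
    TypeMetrics
    TypeTraces

    // TypesAll requests exporters for every signal type.
    TypesAll = TypeLogs | TypeMetrics | TypeTraces
)

// SupportsLogs reports whether a logs exporter should be created.
func (s TypeSignal) SupportsLogs() bool { return s&TypeLogs != 0 }

// SupportsMetrics reports whether a metrics exporter should be created.
func (s TypeSignal) SupportsMetrics() bool { return s&TypeMetrics != 0 }

// SupportsTraces reports whether a traces exporter should be created.
func (s TypeSignal) SupportsTraces() bool { return s&TypeTraces != 0 }

New could then skip creating the metrics exporter whenever SupportsMetrics() returns false, and otelcol.exporter.loadbalancing would register itself with exporter.TypeLogs|exporter.TypeTraces while most other exporters pass exporter.TypesAll.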

Contributor Author:

TBH I'm hoping we just get rid of this workaround soon altogether. We shouldn't be initialising a metrics exporter if other components are not configured to send metrics to the LB exporter. I think that's the real problem.

Contributor Author:

I added a bitfield just now, and also tested it locally by running the agent with an LB exporter and an otlp one. PTAL.

@ptodev ptodev (Contributor, Author) commented Nov 1, 2023

> However, I don't understand the underlying problem here, and why it's not an issue for OpenTelemetry Collector.
>
> Are they using different default configs for different telemetry types? Do we need to copy the same pattern?

I think the Collector only creates a metrics exporter if it's explicitly requested; a metrics exporter for an LB would only be created if it's part of a metrics pipeline. There's an example of a traces pipeline in the otel docs.

We had to make changes to component/otelcol/processor/processor.go to do checks like if len(next.Traces) > 0 for a similar reason as well.
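
As an illustration, that kind of guard looks roughly like the sketch below (simplified: the function name and wiring are assumptions, and the real processor.go fans out to a list of downstream consumers rather than taking a single one):

import (
    "context"

    "go.opentelemetry.io/collector/component"
    "go.opentelemetry.io/collector/consumer"
    "go.opentelemetry.io/collector/processor"
)

// startTracesProcessor only creates the underlying traces processor when
// something downstream actually consumes traces; otherwise it is skipped
// entirely, so its config is never validated for that signal.
func startTracesProcessor(ctx context.Context, f processor.Factory, set processor.CreateSettings, cfg component.Config, nextTraces []consumer.Traces) (processor.Traces, error) {
    if len(nextTraces) == 0 {
        return nil, nil
    }
    return f.CreateTracesProcessor(ctx, set, cfg, nextTraces[0])
}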

@tpaschalis tpaschalis requested a review from a team as a code owner November 6, 2023 11:45
@ptodev ptodev force-pushed the ptodev/fix-lb-metrics branch from 329ff93 to 56ad725 on November 15, 2023 11:14
@ptodev ptodev force-pushed the ptodev/fix-lb-metrics branch from 56ad725 to 4146f63 on November 15, 2023 11:17
@ptodev ptodev requested a review from rfratto November 15, 2023 11:18
@rfratto rfratto (Member) left a comment

Thanks!

@rfratto rfratto merged commit a2817f5 into main Nov 15, 2023
10 checks passed
@rfratto rfratto deleted the ptodev/fix-lb-metrics branch November 15, 2023 19:38
@github-actions github-actions bot added the frozen-due-to-age label ("Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed.") on Feb 21, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 21, 2024