Add livedebugging support for prometheus.scrape #2298

Open
wants to merge 1 commit into
base: main
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -49,6 +49,7 @@ Main (unreleased)
- Add perf_schema quantile columns to collector

- Live Debugging button should appear in UI only for supported components (@ravishankar15)
- Add livedebugging support for `prometheus.scrape` (@ravishankar15)
- Add three new stdlib functions to_base64, from_URLbase64 and to_URLbase64 (@ravishankar15)
- Add `ignore_older_than` option for local.file_match (@ravishankar15)
- Add livedebugging support for `discover.relabel` (@ravishankar15)
1 change: 1 addition & 0 deletions docs/sources/troubleshoot/debug.md
@@ -112,6 +112,7 @@ Supported components:
* `prometheus.relabel`
{{< /admonition >}}
* `discovery.relabel`
* `prometheus.scrape`

## Debug using the UI

41 changes: 40 additions & 1 deletion internal/component/prometheus/scrape/scrape.go
@@ -28,6 +28,7 @@ import (
"github.com/grafana/alloy/internal/service/cluster"
"github.com/grafana/alloy/internal/service/http"
"github.com/grafana/alloy/internal/service/labelstore"
"github.com/grafana/alloy/internal/service/livedebugging"
"github.com/grafana/alloy/internal/useragent"
"github.com/grafana/alloy/internal/util"
)
@@ -188,10 +189,13 @@ type Component struct {

dtMutex sync.Mutex
distributedTargets *discovery.DistributedTargets

debugDataPublisher livedebugging.DebugDataPublisher
}

var (
_ component.Component = (*Component)(nil)
_ component.Component = (*Component)(nil)
_ component.LiveDebugging = (*Component)(nil)
)

// New creates a new prometheus.scrape component.
@@ -244,6 +248,11 @@ func New(o component.Options, args Arguments) (*Component, error) {
return nil, err
}

debugDataPublisher, err := o.GetServiceData(livedebugging.ServiceName)
if err != nil {
return nil, err
}

c := &Component{
opts: o,
cluster: clusterData,
@@ -253,6 +262,7 @@ func New(o component.Options, args Arguments) (*Component, error) {
targetsGauge: targetsGauge,
movedTargetsCounter: movedTargetsCounter,
unregisterer: unregisterer,
debugDataPublisher: debugDataPublisher.(livedebugging.DebugDataPublisher),
}

// Call to Update() to set the receivers and targets once at the start.
@@ -324,6 +334,7 @@ func (c *Component) distributeTargets(
var (
newDistTargets = discovery.NewDistributedTargets(args.Clustering.Enabled, c.cluster, targets)
oldDistributedTargets *discovery.DistributedTargets
componentID = livedebugging.ComponentID(c.opts.ID)
)

c.dtMutex.Lock()
@@ -341,6 +352,32 @@ func (c *Component) distributeTargets(
// by the scrape loop itself during the sync.
promMovedTargets := c.populatePromLabels(movedTargets, jobName, args)

if c.debugDataPublisher.IsActive(componentID) {
var (
oldTargetLabels labels.Labels
newTargetLabels labels.Labels
movedTargetLabels labels.Labels
)
for _, t := range oldDistributedTargets.LocalTargets() {
oldTargetLabels = append(oldTargetLabels, t.Labels().Copy()...)
}

for _, t := range newLocalTargets {
newTargetLabels = append(newTargetLabels, t.Labels().Copy()...)
}

for _, t := range movedTargets {
movedTargetLabels = append(movedTargetLabels, t.Labels().Copy()...)
}

data := fmt.Sprintf("oldTargetLabels: %s => newTargetLabels: %s => movedTargetLabels: %s",
oldTargetLabels.String(),
newTargetLabels.String(),
movedTargetLabels.String(),
)
c.debugDataPublisher.Publish(componentID, data)
}
Contributor

With the scrape component, it would be more interesting to get the metrics than the targets. I solved this on an experimental branch by using a Prometheus interceptor: https://github.com/grafana/alloy/blob/hackathon-alloy-live-graph/internal/component/prometheus/scrape/scrape.go#L266
I don't know whether that's the best approach. It's crucial that it does not impact performance when the debugDataPublisher is not active.
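To make the interceptor idea concrete, here is a minimal sketch of a wrapper that forwards every sample and publishes debug data only while a session is active. It uses only the standard library: `Sample`, `Appender`, `Publisher`, and `debugInterceptor` are hypothetical stand-ins, not the Alloy or Prometheus types, and only the `IsActive`/`Publish` call shape is taken from the diff above.

```go
package main

import "fmt"

// Sample is a hypothetical stand-in for one scraped metric sample.
type Sample struct {
	Labels string
	Value  float64
}

// Appender is a hypothetical, simplified metrics sink (not the Prometheus interface).
type Appender interface {
	Append(s Sample) error
}

// Publisher mirrors the two calls used in the diff: IsActive and Publish.
// The real livedebugging interface may differ; this is an assumption.
type Publisher interface {
	IsActive(componentID string) bool
	Publish(componentID string, data string)
}

// debugInterceptor forwards every sample to next and publishes a debug
// line only while live debugging is active for the component.
type debugInterceptor struct {
	next        Appender
	publisher   Publisher
	componentID string
}

func (d *debugInterceptor) Append(s Sample) error {
	if err := d.next.Append(s); err != nil {
		return err
	}
	// Check IsActive first so the formatting cost is paid only when needed.
	if d.publisher.IsActive(d.componentID) {
		d.publisher.Publish(d.componentID, fmt.Sprintf("%s => %g", s.Labels, s.Value))
	}
	return nil
}

// sliceAppender and fakePublisher are in-memory doubles for the demo.
type sliceAppender struct{ samples []Sample }

func (a *sliceAppender) Append(s Sample) error {
	a.samples = append(a.samples, s)
	return nil
}

type fakePublisher struct {
	active bool
	lines  []string
}

func (p *fakePublisher) IsActive(string) bool          { return p.active }
func (p *fakePublisher) Publish(_ string, data string) { p.lines = append(p.lines, data) }

func main() {
	sink := &sliceAppender{}
	pub := &fakePublisher{}
	ic := &debugInterceptor{next: sink, publisher: pub, componentID: "prometheus.scrape.demo"}

	_ = ic.Append(Sample{Labels: `{job="a"}`, Value: 1}) // inactive: forwarded only
	pub.active = true
	_ = ic.Append(Sample{Labels: `{job="a"}`, Value: 2}) // active: forwarded and published

	fmt.Println("forwarded:", len(sink.samples), "published:", len(pub.lines))
}
```

In this shape the inactive path costs one extra method call per sample, so how cheap it is depends on how cheap `IsActive` is in the real service; that would still need measuring.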

Contributor Author

The interceptor sounds good. It acts as a proxy around the function call, and when debugging is not enabled the publish does not happen. I also see that the interceptor idea is already used in remote_write.

The other way is to hook up the interceptor only while debugging is active, but that would complicate the code: the Update function only updates the config, and the scrape manager runs independently of Update.

I think the interceptor is the better option. Do you have other thoughts?
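For comparison, the alternative mentioned above (attaching the hook only while debugging is active) could look roughly like the sketch below. It is one possible shape, not code from this PR: `scraper`, `hook`, and `SetDebug` are hypothetical names, and the hook is swapped in and out with `sync/atomic` so the append path never takes a lock.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// hook is a hypothetical callback that receives one debug line.
type hook func(line string)

// scraper is a hypothetical stand-in for the independently running scrape path.
type scraper struct {
	debug     atomic.Pointer[hook] // nil while debugging is inactive
	forwarded atomic.Int64         // counts lines that went down the normal path
}

// SetDebug attaches a hook, or detaches it when h is nil.
// Something outside the scrape loop would have to call this whenever
// debugging starts or stops, which is the extra wiring discussed above.
func (s *scraper) SetDebug(h hook) {
	if h == nil {
		s.debug.Store(nil)
		return
	}
	s.debug.Store(&h)
}

// Append does the normal work, then calls the hook only if one is attached.
func (s *scraper) Append(line string) {
	s.forwarded.Add(1)
	if h := s.debug.Load(); h != nil {
		(*h)(line)
	}
}

func main() {
	s := &scraper{}
	var seen []string

	s.Append("a") // no hook attached
	s.SetDebug(func(l string) { seen = append(seen, l) })
	s.Append("b") // hook attached
	s.SetDebug(nil)
	s.Append("c") // hook detached again

	fmt.Println("forwarded:", s.forwarded.Load(), "seen:", seen)
}
```

The inactive path here is a single atomic load, but the component then needs a reliable signal for when debugging turns on or off, which the gated interceptor avoids.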

Contributor

I'm going to run the interceptor version in a dev infra that uses prometheus scrape heavily to see if there are any performance problems.

Contributor Author

Sure! If it helps, I can make the interceptor changes on this branch for convenience. Shall I add that commit?

Contributor

Thanks, but you can leave it as it is for now. I will keep you updated on the tests.


return promNewTargets, promMovedTargets
}

@@ -387,6 +424,8 @@ func (c *Component) NotifyClusterChange() {
}
}

func (c *Component) LiveDebugging(_ int) {}

// Helper function to bridge the in-house configuration with the Prometheus
// scrape_config.
// As explained in the Config struct, the following fields are purposefully
@@ -21,6 +21,7 @@ import (
"github.com/grafana/alloy/internal/service/cluster"
"github.com/grafana/alloy/internal/service/http"
"github.com/grafana/alloy/internal/service/labelstore"
"github.com/grafana/alloy/internal/service/livedebugging"
"github.com/grafana/alloy/internal/util"
"github.com/grafana/alloy/internal/util/assertmetrics"
"github.com/grafana/alloy/internal/util/testappender"
@@ -213,6 +214,8 @@ func testOptions(t *testing.T, alloyMetricsReg *client.Registry, fakeCluster *fa
return fakeCluster, nil
case labelstore.ServiceName:
return labelstore.New(nil, alloyMetricsReg), nil
case livedebugging.ServiceName:
return livedebugging.NewLiveDebugging(), nil
default:
return nil, fmt.Errorf("service %q does not exist", name)
}
5 changes: 5 additions & 0 deletions internal/component/prometheus/scrape/scrape_test.go
@@ -21,6 +21,7 @@ import (
"github.com/grafana/alloy/internal/service/cluster"
http_service "github.com/grafana/alloy/internal/service/http"
"github.com/grafana/alloy/internal/service/labelstore"
"github.com/grafana/alloy/internal/service/livedebugging"
"github.com/grafana/alloy/internal/util"
"github.com/grafana/alloy/syntax"
)
@@ -133,6 +134,8 @@ func TestForwardingToAppendable(t *testing.T) {
return cluster.Mock(), nil
case labelstore.ServiceName:
return labelstore.New(nil, prometheus_client.DefaultRegisterer), nil
case livedebugging.ServiceName:
return livedebugging.NewLiveDebugging(), nil
default:
return nil, fmt.Errorf("service %q does not exist", name)
}
@@ -239,6 +242,8 @@ func TestCustomDialer(t *testing.T) {
return cluster.Mock(), nil
case labelstore.ServiceName:
return labelstore.New(nil, prometheus_client.DefaultRegisterer), nil
case livedebugging.ServiceName:
return livedebugging.NewLiveDebugging(), nil

default:
return nil, fmt.Errorf("service %q does not exist", name)