Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Add Support for Historical Data Points #986

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

luismy
Copy link
Contributor

@luismy luismy commented Jun 11, 2023

Initial implementation to support historical data points by having a new flag addHistoricalMetrics at the job config level

Related to #985

Pending tasks

  • Build docker image and test to see if the Prometheus graphs are as accurate as the ones provided by AWS CloudWatch
  • Go over the code to change the implementation and use a hashmap to store the "template" metric
  • Make the new flag configurable at the metric level to keep it consistent with the other flags!? Check how easy/difficult is going to be
  • Add tests
  • Linting

Initial implementation to support historical data points by having a new flag `addHistoricalMetrics` at the job config level
Comment on lines +236 to +238
// Include the timestamp to avoid genuine duplicates!? At this point we have all the metrics to be exposed under the `/metrics` endpoint so
// we aren't able to tell if some of the metrics are present because the `addHistoricalMetrics` is set to `true`!?
metricKey := fmt.Sprintf("%s-%d-%d", *metric.Name, prom_model.LabelsToSignature(metric.Labels), metric.Timestamp.Unix())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to include the timestamp to remove "real" duplicates?

Comment on lines 106 to 133
previousIdx := -1
previousID := ""
for _, metricDataResult := range data {
idx := findGetMetricDataByID(getMetricDatas, *metricDataResult.ID)
// TODO: This logic needs to be guarded by a feature flag! Also, remember to add compatibility in the client v2
if idx == -1 {
logger.Warn("GetMetricData returned unknown metric ID", "metric_id", *metricDataResult.ID)
if addHistoricalMetrics {
// Use the previousIdx to make a copy
if previousIdx != -1 && previousID == *metricDataResult.ID {
// Create a new CloudwatchData object
newData := *getMetricDatas[previousIdx]
newData.GetMetricDataPoint = metricDataResult.Datapoint
newData.GetMetricDataTimestamps = metricDataResult.Timestamp

getMetricDatas = append(getMetricDatas, &newData)
} else {
logger.Warn("GetMetricData returned unknown metric ID", "metric_id", *metricDataResult.ID)
}
} else {
logger.Warn("GetMetricData returned unknown metric ID", "metric_id", *metricDataResult.ID)
}
continue
}
getMetricDatas[idx].GetMetricDataPoint = metricDataResult.Datapoint
getMetricDatas[idx].GetMetricDataTimestamps = metricDataResult.Timestamp
getMetricDatas[idx].MetricID = nil // mark as processed
previousIdx = idx
previousID = *metricDataResult.ID
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first attempt but I'm not keen on using the "order" in which the data is retrieved to achieve this. I'm planning to go over this section again and use a hashmap/dictionary to make this logic independent from the order the getMetricDatas has (I believe timestamp descending if I recall correctly)

@@ -33,6 +33,7 @@ type JobLevelMetricFields struct {
Delay int64 `yaml:"delay"`
NilToZero *bool `yaml:"nilToZero"`
AddCloudwatchTimestamp *bool `yaml:"addCloudwatchTimestamp"`
AddHistoricalMetrics *bool `yaml:"addHistoricalMetrics"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new flag to allow this behaviour is only supported at the job level. I'll do some testing first but it might be a good idea to support it even at the metric level.

@luismy luismy marked this pull request as ready for review June 11, 2023 21:01
@luismy
Copy link
Contributor Author

luismy commented Jun 11, 2023

I'm planning to make more changes but marking this PR as ready for review to get early feedback

@thomaspeitz
Copy link
Contributor

Love the idea of it :)

@gavin-jeong
Copy link
Contributor

I'm using this from our environment. Works nice

@ecktom
Copy link

ecktom commented Nov 3, 2023

This would also drastically reduce costs for people who are currently looking to get metrics in a 1 minute resolution (scrape-interval=60s / period=60s / length=60s), right? @gavin-jeong you've been using this for a while without issues?

@gavin-jeong
Copy link
Contributor

@ecktom no any issue yet. seems it works well.

@luismy
Copy link
Contributor Author

luismy commented Mar 3, 2024

Apologies for the delays on this one. I'm planning to spend some time next weekend refreshing this with the latest from master.

I hope the rebase is not too painful 😢

…oints_issue_985

# Conflicts:
#	pkg/clients/cloudwatch/client.go
#	pkg/clients/cloudwatch/v1/client.go
#	pkg/clients/cloudwatch/v2/client.go
#	pkg/clients/v2/cache_test.go
#	pkg/job/discovery.go
#	pkg/model/model.go
@luismy
Copy link
Contributor Author

luismy commented Apr 14, 2024

I completed the rebase locally a while back but I can see that there are more changes lol. Let's see if can get to a "safe" place this week 😭

@ecktom
Copy link

ecktom commented Apr 15, 2024

@luismy would be highly appreciated if you can give it another shot. We also tried to get it rebased but were stuck in some segfaults which we couldn't get right

@kgeckhart
Copy link
Contributor

kgeckhart commented Apr 21, 2024

👋 as the main person who's been driving a lot of the conflicts here I'm sorry @luismy. I honestly didn't realize this existed for so long but I'm "done" enough with the getmetricdata refactoring work I set out to do to enable #1094 that I think further conflicts should be rather minor. I will leave a comment with a suggestion that I think could cleanup the implementation I think with the latest refactors there's a chance to have addHistoricalDatapoints be handled completely by https://github.com/nerdswords/yet-another-cloudwatch-exporter/blob/master/pkg/promutil/migrate.go#L79 simplifying the change and making it much more testable.

The latest refactors split a lot of CloudwatchData in to more independent structs specifically these three, https://github.com/nerdswords/yet-another-cloudwatch-exporter/blob/master/pkg/model/model.go#L165-L172.

  1. The addHistoricalDatapoints config could be added to MetricMigrationParams.
  2. Allow ScrapeAwsData to be unaware of addHistoricalDatapoints

Let me know what you think and if you could use a hand with conflict resolution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants