-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-38584][metrics] Support checkpoint path as Prometheus info-style metric #27165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…etheus info-style metric
This commit enhances the Prometheus reporter to export lastCheckpointExternalPath
as an info-style metric, making it compatible with Prometheus and VictoriaMetrics.
Changes:
- Add CheckpointPathInfoCollector to handle checkpoint path metrics
- Export checkpoint path as label value with metric value 1.0
- Add _info suffix to follow Prometheus naming convention
- Add comprehensive unit tests (4 test cases)
The checkpoint path is now exported as:
flink_jobmanager_job_lastCheckpointExternalPath_info{...,path=\"hdfs://...\"} 1.0
This allows users to:
1. Store checkpoint paths in Prometheus/VictoriaMetrics
2. Join checkpoint path with other metrics via PromQL
3. Create dashboards and alerts based on checkpoint p
| if (scopedMetricName.endsWith(CHECKPOINT_PATH_METRIC_NAME)) { | ||
| // Add _info suffix to follow Prometheus naming convention for info metrics | ||
| String infoMetricName = scopedMetricName + "_info"; | ||
| @SuppressWarnings("unchecked") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the reason to suppress unchecked warnings here. would we not want to log out these warnings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review! In my opinion, the @SuppressWarnings("unchecked") is necessary here because:
-
Type Safety at Runtime: The Flink metrics system uses
Gauge<?>as the generic type, but we need to cast it toGauge<String>to access the checkpoint path value. This is a safe cast in this specific context because we've already verified that this metric islastCheckpointExternalPath, which is always a string-valued gauge. -
Why Not Log: We don't want to log this as a warning because:
- This is an expected and intentional cast, not an error condition
- The metric name check (
scopedMetricName.endsWith(CHECKPOINT_PATH_METRIC_NAME)) ensures we only perform this cast for the correct metric type - If the cast fails (which shouldn't happen in practice), it will throw a
ClassCastExceptionthat will be caught and logged by the caller
-
Alternative Considered: We could use
instanceofcheck, but it would be redundant since the metric name already uniquely identifies this as a string gauge.
If you prefer, I can add a comment explaining this, or we could add an instanceof check for extra safety:
if (scopedMetricName.endsWith(CHECKPOINT_PATH_METRIC_NAME)) {
if (metric instanceof Gauge) {
@SuppressWarnings("unchecked")
Gauge<String> pathGauge = (Gauge<String>) metric;
// ... rest of the code
} else {
log.warn("Expected Gauge for checkpoint path metric, but got: {}",
metric.getClass().getName());
return null;
}
}…etheus info-style metric
This commit enhances the Prometheus reporter to export lastCheckpointExternalPath
as an info-style metric, making it compatible with Prometheus and VictoriaMetrics.
Changes:
- Add CheckpointPathInfoCollector to handle checkpoint path metrics
- Export checkpoint path as label value with metric value 1.0
- Add _info suffix to follow Prometheus naming convention
- Add comprehensive unit tests (4 test cases)
The checkpoint path is now exported as:
flink_jobmanager_job_lastCheckpointExternalPath_info{...,path=\"hdfs://...\"} 1.0
This allows users to:
1. Store checkpoint paths in Prometheus/VictoriaMetrics
2. Join checkpoint path with other metrics via PromQL
3. Create dashboards and alerts based on checkpoint p
What is the purpose of the change
This pull request enhances the Prometheus reporter to export the
lastCheckpointExternalPathmetric as an info-style metric, making it compatible with Prometheus and VictoriaMetrics.Current Problem:
lastCheckpointExternalPathmetric is currently exported as a string-valued GaugeSolution:
lastCheckpointExternalPathas a Prometheus info-style metric with_infosuffixpathlabel instead of as a metric valueThis approach follows Prometheus best practices (similar to
node_uname_infofrom node_exporter) and enables users to:Brief change log
CHECKPOINT_PATH_METRIC_NAMEconstant to identify the checkpoint path metric_infosuffix to the metric namepathlabelVerifying this change
This change added tests and can be verified as follows:
Unit Tests:
Integration Verification:
All existing Prometheus reporter tests pass (27/27 tests):
PrometheusReporterTaskScopeTest: 5 testsPrometheusPushGatewayReporterTest: 4 testsManual Verification:
The change can be manually verified by:
curl http://localhost:9249/metrics)Does this pull request potentially affect one of the following parts:
@Public(Evolving): noDocumentation
Documentation Details:
Additional Documentation (if requested):
If the community requires, I can add documentation to
docs/content/docs/deployment/metric_reporters.mdexplaining: