
Datasource Metadata queries do not use the specified bulk time range #1454

Open
chandrams opened this issue Jan 9, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@chandrams
Contributor

Describe the bug
Datasource metadata queries use a fixed measurement duration of 15 days and ignore the time range specified in the bulk request.
With parallel requests, the test failed with HTTP 503 and the Kruize pod was restarted.

How to reproduce it
Invoke the bulk API with a Prometheus datasource and a time range whose start and end are 1 hour apart. The metadata queries fetch 15 days of data instead of using the bulk time range.

curl -X POST http://kruize-openshift-tuning.apps.<server>/bulk -H 'Content-Type: application/json' -d '{"filter": {"exclude": {"namespace": [], "workload": [], "containers": [], "labels": {}}, "include": {"namespace": [], "workload": [], "containers": [], "labels": {}}}, "datasource": "prometheus-1", "time_range": {"start": "2025-01-08T12:30:00.000Z", "end": "2025-01-08T13:30:00.000Z"}}'

Expected behavior
Metadata queries should consider the bulk time range.
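
To make the expected behavior concrete, here is a minimal sketch of deriving the range selector from the bulk time_range instead of a fixed 15d. This is not Kruize's implementation: the function names (parse_utc, promql_window, namespace_query) are hypothetical, and only the query shape is taken from the logs in this issue.

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse a timestamp such as 2025-01-08T12:30:00.000Z as UTC."""
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def promql_window(start: str, end: str) -> str:
    """Return the start-to-end span as a PromQL duration in seconds."""
    seconds = int((parse_utc(end) - parse_utc(start)).total_seconds())
    if seconds <= 0:
        raise ValueError("end must be later than start")
    return f"{seconds}s"


def namespace_query(window: str) -> str:
    """Hypothetical helper: namespaceQuery with the window as a parameter."""
    return (
        "sum by (namespace) (avg_over_time("
        f"kube_namespace_status_phase{{namespace!=''}}[{window}]))"
    )


window = promql_window("2025-01-08T12:30:00.000Z", "2025-01-08T13:30:00.000Z")
print(namespace_query(window))
```

For the 1-hour range in the curl command above, the window comes out as 3600s rather than 15d.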

Relevant logs

2025-01-09 04:02:20.375 INFO [pool-9-thread-1][DataSourceMetadataOperator.java(215)]-namespaceQuery: sum by (namespace) ( avg_over_time(kube_namespace_status_phase{namespace!='' }[15d]))
2025-01-09 04:02:20.375 INFO [pool-9-thread-3][DataSourceMetadataOperator.java(301)]-filterBuilder: workload!=''
2025-01-09 04:02:20.375 INFO [pool-9-thread-4][DataSourceMetadataOperator.java(215)]-namespaceQuery: sum by (namespace) ( avg_over_time(kube_namespace_status_phase{namespace!='' }[15d]))
2025-01-09 04:02:20.375 INFO [pool-9-thread-3][DataSourceMetadataOperator.java(301)]-filterBuilder: container!=''
2025-01-09 04:02:20.375 INFO [pool-9-thread-4][DataSourceMetadataOperator.java(216)]-workloadQuery: sum by (namespace, workload, workload_type) ( avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{workload!='' }[15d]))
2025-01-09 04:02:20.375 INFO [pool-9-thread-3][DataSourceMetadataOperator.java(215)]-namespaceQuery: sum by (namespace) ( avg_over_time(kube_namespace_status_phase{namespace!='' }[15d]))
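
When checking a run, it can help to pull the range selector out of each logged query rather than reading the lines by eye. The following is a rough sketch that assumes the log format shown above; the helper name range_selectors is made up for illustration.

```python
import re

# Matches a PromQL range selector such as [15d] or [15m].
RANGE_SELECTOR = re.compile(r"\[(\d+)(ms|s|m|h|d|w|y)\]")


def range_selectors(log_line: str) -> list[str]:
    """Return every range selector duration found in one log line."""
    return [f"{n}{unit}" for n, unit in RANGE_SELECTOR.findall(log_line)]


sample = (
    "namespaceQuery: sum by (namespace) ( avg_over_time("
    "kube_namespace_status_phase{namespace!='' }[15d]))"
)
print(range_selectors(sample))
```

A line that reports 15d for a 1-hour bulk request is showing the behavior described in this issue.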

Environment:

  • Kubernetes Cluster : openshift
@chandrams chandrams added the bug Something isn't working label Jan 9, 2025
@chandrams chandrams added this to the Kruize 0.4 Release milestone Jan 9, 2025
@dinogun dinogun moved this to Todo in Monitoring Jan 9, 2025
@shreyabiradar07 shreyabiradar07 moved this from Todo to In Progress in Monitoring Feb 6, 2025
@shreyabiradar07
Contributor

shreyabiradar07 commented Mar 12, 2025

@chandrams I tried to reproduce this issue with a thanos datasource and a bulk time range of 1 hour between start and end. I observe that the metadata is imported for the time range specified in the bulk input.

List of steps followed:

  • Add thanos datasource details in kruize-crc-openshift.yaml, and increase the resources to 2Gi and 2 cores (to avoid pod restarts)
"datasource": [
        {
          "name": "thanos",
          "provider": "prometheus",
          "serviceName": "",
          "namespace": "",
          "url": "http://thanos-query-frontend-example-query-thanos-operator-system.apps.kruize-scalelab.h0b5.p1.openshiftapps.com",
          "authentication": {
              "type": "bearer",
              "credentials": {
                "tokenFilePath": "/var/run/secrets/kubernetes.io/serviceaccount/token"
              }
          }
        }
      ]
  • Deploy Kruize in kruize-scalelab AWS cluster
./deploy.sh -c openshift -m crc -i quay.io/shbirada/integrate_metadata_api:v1
oc expose svc/kruize -n openshift-tuning
  • Create MetricProfile
curl -X POST http://kruize-openshift-tuning.apps.kruize-scalelab.h0b5.p1.openshiftapps.com/createMetricProfile -d  @./manifests/autotune/performance-profiles/resource_optimization_local_monitoring.json
  • Create MetadataProfile
 curl -X POST http://kruize-openshift-tuning.apps.kruize-scalelab.h0b5.p1.openshiftapps.com/createMetadataProfile -d  @./manifests/autotune/metadata-profiles/bulk_cluster_metadata_local_monitoring.json
  • Invoke /bulk API - adding org_id, cluster_id, metadata_profile (mandatory) and measurement_duration (optional) fields
curl -X POST http://kruize-openshift-tuning.apps.kruize-scalelab.h0b5.p1.openshiftapps.com/bulk -d '
{
  "filter": {
    "exclude": {
      "namespace": [],
      "workload": [],
      "containers": [],
      "labels": {}
    },
    "include": {
      "namespace": [],
      "workload": [],
      "containers": [],
      "labels": {
        "org_id": "org-1",
        "cluster_id": "eu-1-1"
      }
    }
  },
  "datasource": "thanos",
  "metadata_profile": "cluster-metadata-local-monitoring",
  "measurement_duration": "15min",
  "time_range": {
    "start": "2025-01-08T12:30:00.000Z",
    "end": "2025-01-08T13:30:00.000Z"
  }
}'

In the pod logs, the start time 2025-01-08 12:30:00 (1736339400) and the end time 2025-01-08 13:30:00 (1736343000) are observed, with step=900:

2025-03-12 12:40:21.121 INFO [qtp110053477-61][MetricProfileCollection.java(66)]-Trying to add the metric profile to collection: resource-optimization-local-monitoring
2025-03-12 12:40:21.121 INFO [qtp110053477-61][MetricProfileCollection.java(71)]-MetricProfile added to the collection successfully: resource-optimization-local-monitoring
2025-03-12 12:40:28.130 INFO [qtp110053477-58][MetadataProfileCollection.java(90)]-Trying to add the metadata profile to collection: cluster-metadata-local-monitoring
2025-03-12 12:40:28.131 INFO [qtp110053477-58][MetadataProfileCollection.java(95)]-MetadataProfile added to the collection successfully: cluster-metadata-local-monitoring
2025-03-12 12:40:32.702 INFO [pool-10-thread-1][DataSourceMetadataOperator.java(324)]-filterBuilder: namespace!=''
2025-03-12 12:40:32.702 INFO [pool-10-thread-1][DataSourceMetadataOperator.java(324)]-filterBuilder: workload!=''
2025-03-12 12:40:32.702 INFO [pool-10-thread-1][DataSourceMetadataOperator.java(324)]-filterBuilder: container!=''
2025-03-12 12:40:32.703 INFO [pool-10-thread-1][DataSourceMetadataOperator.java(232)]-namespaceQuery: sum by (namespace) (avg_over_time(kube_namespace_status_phase{namespace!="" ,org_id="org-1",cluster_id="eu-1-1"}[15m]))
2025-03-12 12:40:32.703 INFO [pool-10-thread-1][DataSourceMetadataOperator.java(233)]-workloadQuery: sum by (namespace, workload, workload_type) (avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{workload!="" ,org_id="org-1",cluster_id="eu-1-1"}[15m]))
2025-03-12 12:40:32.703 INFO [pool-10-thread-1][DataSourceMetadataOperator.java(234)]-containerQuery: sum by (container, image, workload, workload_type, namespace) (avg_over_time(kube_pod_container_info{container!="" ,org_id="org-1",cluster_id="eu-1-1"}[15m]) * on (pod, namespace) group_left(workload, workload_type) avg_over_time(namespace_workload_pod:kube_pod_owner:relabel{workload!="" ,org_id="org-1",cluster_id="eu-1-1"}[15m]))
2025-03-12 12:40:32.703 INFO [pool-10-thread-1][DataSourceMetadataOperator.java(235)]-startTime: 1736339400
2025-03-12 12:40:32.703 INFO [pool-10-thread-1][DataSourceMetadataOperator.java(236)]-endTime: 1736343000
2025-03-12 12:40:32.703 INFO [pool-10-thread-1][DataSourceMetadataOperator.java(237)]-steps: 900
2025-03-12 12:40:45.869 INFO [qtp110053477-58][BulkService.java(93)]-Job ID: 7816462f-d7c1-4e71-ab96-59f2b2b594ff
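
The startTime, endTime and steps values in the log can be cross-checked against the time_range and measurement_duration from the request. A small sketch of that arithmetic follows; the to_epoch helper is only for illustration, and treating "15min" as a 900-second step is an assumption based on the logged steps value.

```python
from datetime import datetime, timezone


def to_epoch(ts: str) -> int:
    """Convert a UTC timestamp such as 2025-01-08T12:30:00.000Z to epoch seconds."""
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


start = to_epoch("2025-01-08T12:30:00.000Z")
end = to_epoch("2025-01-08T13:30:00.000Z")
step = 15 * 60  # assumed: measurement_duration "15min" expressed in seconds

# Number of 15-minute intervals that fit in the requested range.
intervals = (end - start) // step
print(start, end, step, intervals)  # prints: 1736339400 1736343000 900 4
```

The computed start and end match the logged startTime and endTime, so the 1-hour range from the request is what reaches the metadata queries in this run.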

@shreyabiradar07
Contributor

http://kruize-openshift-tuning.apps.kruize-scalelab.h0b5.p1.openshiftapps.com/bulk?job_id=7eff23e0-0805-4945-8ac7-dd9e062b55ef&include=metadata
http://kruize-openshift-tuning.apps.kruize-scalelab.h0b5.p1.openshiftapps.com/bulk?job_id=7eff23e0-0805-4945-8ac7-dd9e062b55ef&include=summary,experiments 

However, no recommendations are generated; the reported reason is:
There is not enough data available to generate a recommendation.
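
For scripting these status checks, the two job status URLs above can be assembled from a job ID and the list of sections to include. This is a sketch only: bulk_status_url is a hypothetical helper, the base URL is a placeholder, and the query parameters are simply the ones visible in the URLs above.

```python
from urllib.parse import urlencode


def bulk_status_url(base_url: str, job_id: str, include: list[str]) -> str:
    """Build a /bulk status URL with job_id and a comma-separated include list."""
    # safe="," keeps the commas in the include value unescaped, as in the URLs above.
    query = urlencode({"job_id": job_id, "include": ",".join(include)}, safe=",")
    return f"{base_url}/bulk?{query}"


BASE_URL = "http://kruize.example.com"  # placeholder host, not the real route
JOB_ID = "7eff23e0-0805-4945-8ac7-dd9e062b55ef"

metadata_url = bulk_status_url(BASE_URL, JOB_ID, ["metadata"])
summary_url = bulk_status_url(BASE_URL, JOB_ID, ["summary", "experiments"])
print(metadata_url)
print(summary_url)
```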

Projects
Status: In Progress
Development

No branches or pull requests

3 participants