
Time taken for recommendations generation using Bulk API is not consistent #1530

Open
chandrams opened this issue Mar 10, 2025 · 0 comments
Labels: bug (Something isn't working)

@chandrams (Contributor)

Describe the bug
Recommendation generation using the Bulk API with the thanos datasource took significantly longer for one config than for another, even though the total number of experiments (500) was the same for both.

Job completed in approximately 5 hrs 36 mins

"summary": {
        "status": "COMPLETED",
        "total_experiments": 500,
        "processed_experiments": 500,
        "notifications": {},
        "input": {
            "filter": {
                "exclude": {
                    "namespace": [],
                    "workload": [],
                    "containers": [],
                    "labels": {}
                },
                "include": {
                    "namespace": [],
                    "workload": [],
                    "containers": [],
                    "labels": {
                        "org_id": "org-1",
                        "cluster_id": "eu-1-1"
                    }
                }
            },
            "time_range": {
                "start": "2025-03-09T20:00:00.000Z",
                "end": "2025-03-10T02:00:00.000Z"
            },
            "datasource": "thanos",
            "webhook": null
        },
        "job_id": "7ecbe061-5c10-4c92-9a83-f93314db54e1",
        "job_start_time": "2025-03-06T11:09:40.529Z",
        "job_end_time": "2025-03-06T16:46:03.328Z"
    },

Job completed in approximately 27 mins

"summary": {
        "status": "COMPLETED",
        "total_experiments": 500,
        "processed_experiments": 500,
        "notifications": {},
        "input": {
            "filter": {
                "exclude": {
                    "namespace": [],
                    "workload": [],
                    "containers": [],
                    "labels": {}
                },
                "include": {
                    "namespace": [],
                    "workload": [],
                    "containers": [],
                    "labels": {
                        "org_id": "org-1",
                        "cluster_id": "eu-1-2"
                    }
                }
            },
            "time_range": {
                "start": "2025-03-09T20:00:00.000Z",
                "end": "2025-03-10T02:00:00.000Z"
            },
            "datasource": "thanos",
            "webhook": null
        },
        "job_id": "93549e20-7fab-4899-ba38-b26b4a70285c",
        "job_start_time": "2025-03-06T16:46:27.631Z",
        "job_end_time": "2025-03-06T17:13:35.805Z"
    },
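
For reference, the elapsed time of each run can be derived directly from the job_start_time and job_end_time fields in the summaries above. A minimal Python sketch (timestamps copied from the two summaries):

    # Minimal sketch: derive the wall-clock duration of each bulk job from the
    # job_start_time / job_end_time fields reported in the summaries above.
    from datetime import datetime

    def job_duration(start: str, end: str):
        fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
        return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)

    # eu-1-1 run (job_id 7ecbe061-...)
    print(job_duration("2025-03-06T11:09:40.529Z", "2025-03-06T16:46:03.328Z"))  # 5:36:22.799000
    # eu-1-2 run (job_id 93549e20-...)
    print(job_duration("2025-03-06T16:46:27.631Z", "2025-03-06T17:13:35.805Z"))  # 0:27:08.174000

So the two jobs processed the same 500 experiments over the same time_range, yet one took roughly 12x longer than the other.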

How to reproduce it

  • Clone this PR and run the scalability test after replacing <server> (where the thanos setup is present) in the command below. A rough sketch for replaying the Bulk API requests directly is shown after the command.
./bulk_scale_test.sh -i quay.io/vinakuma/autotune_operator:jobsave3  -w 1 -d 1 -r /tmp/results/bulk_scale_test_jobsave3 -a 1 -o 1 -c 2 -s "2025-03-10T02:00:00.000Z" --url="http://thanos-query-frontend-thanos-bench.apps.<server>/"
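
Alternatively, the two runs can be compared by submitting the same Bulk API payload (the input object shown in the summaries above) for each cluster_id and timing the jobs end to end. This is a rough sketch only: the POST /bulk endpoint path, the job_id field in the POST response, the job_id query parameter used for polling, and the exact nesting of the status field in the GET response are assumptions, not confirmed in this issue; the Kruize service URL is a placeholder.

    # Hedged sketch: replay the same Bulk API input for both cluster_id labels
    # and time the jobs. ASSUMPTIONS (not confirmed in this issue): /bulk path,
    # "job_id" in the POST response, "job_id" query parameter, status nesting.
    import time
    import requests

    KRUIZE_URL = "http://kruize.example.com"  # placeholder for the Kruize service route

    def run_bulk_job(cluster_id: str) -> float:
        payload = {
            "filter": {
                "exclude": {"namespace": [], "workload": [], "containers": [], "labels": {}},
                "include": {"namespace": [], "workload": [], "containers": [],
                            "labels": {"org_id": "org-1", "cluster_id": cluster_id}},
            },
            "time_range": {"start": "2025-03-09T20:00:00.000Z",
                           "end": "2025-03-10T02:00:00.000Z"},
            "datasource": "thanos",
        }
        started = time.time()
        job_id = requests.post(f"{KRUIZE_URL}/bulk", json=payload).json()["job_id"]
        while True:
            resp = requests.get(f"{KRUIZE_URL}/bulk", params={"job_id": job_id}).json()
            status = resp.get("status") or resp.get("summary", {}).get("status")
            if status in ("COMPLETED", "FAILED"):
                break
            time.sleep(60)  # poll once a minute
        return time.time() - started

    for cluster in ("eu-1-1", "eu-1-2"):
        print(cluster, "took", round(run_bulk_job(cluster)), "seconds")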

Expected behavior
The time taken to generate recommendations for the same number of experiments over the same time range should be roughly the same.

Relevant logs
Will attach the Kruize pod logs if required.

Environment:

  • Kubernetes cluster: OpenShift
@chandrams chandrams added the bug Something isn't working label Mar 10, 2025
@chandrams chandrams added this to the Kruize 0.5 Release milestone Mar 10, 2025
@dinogun dinogun moved this to In Progress in Monitoring Mar 10, 2025