
Time taken for recommendations generation using Bulk API is not consistent #1530

Open
chandrams opened this issue Mar 10, 2025 · 0 comments
Labels: bug (Something isn't working)

@chandrams (Contributor)

Describe the bug
Recommendation generation using the Bulk API with the thanos datasource took significantly longer for one config than for another, even though the total number of experiments (500) was the same for both.

Job completed in approximately 5 hrs 36 mins

"summary": {
        "status": "COMPLETED",
        "total_experiments": 500,
        "processed_experiments": 500,
        "notifications": {},
        "input": {
            "filter": {
                "exclude": {
                    "namespace": [],
                    "workload": [],
                    "containers": [],
                    "labels": {}
                },
                "include": {
                    "namespace": [],
                    "workload": [],
                    "containers": [],
                    "labels": {
                        "org_id": "org-1",
                        "cluster_id": "eu-1-1"
                    }
                }
            },
            "time_range": {
                "start": "2025-03-09T20:00:00.000Z",
                "end": "2025-03-10T02:00:00.000Z"
            },
            "datasource": "thanos",
            "webhook": null
        },
        "job_id": "7ecbe061-5c10-4c92-9a83-f93314db54e1",
        "job_start_time": "2025-03-06T11:09:40.529Z",
        "job_end_time": "2025-03-06T16:46:03.328Z"
    },

Job completed in approximately 27 mins

"summary": {
        "status": "COMPLETED",
        "total_experiments": 500,
        "processed_experiments": 500,
        "notifications": {},
        "input": {
            "filter": {
                "exclude": {
                    "namespace": [],
                    "workload": [],
                    "containers": [],
                    "labels": {}
                },
                "include": {
                    "namespace": [],
                    "workload": [],
                    "containers": [],
                    "labels": {
                        "org_id": "org-1",
                        "cluster_id": "eu-1-2"
                    }
                }
            },
            "time_range": {
                "start": "2025-03-09T20:00:00.000Z",
                "end": "2025-03-10T02:00:00.000Z"
            },
            "datasource": "thanos",
            "webhook": null
        },
        "job_id": "93549e20-7fab-4899-ba38-b26b4a70285c",
        "job_start_time": "2025-03-06T16:46:27.631Z",
        "job_end_time": "2025-03-06T17:13:35.805Z"
    },
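
For reference, the elapsed time of each run can be derived directly from the job_start_time and job_end_time fields in the summaries above. A minimal Python sketch (timestamps copied from the two summaries):

    # Minimal sketch: derive the wall-clock duration of each bulk job from the
    # job_start_time / job_end_time fields reported in the summaries above.
    from datetime import datetime

    def job_duration(start: str, end: str):
        fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
        return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)

    # eu-1-1 run (job_id 7ecbe061-...)
    print(job_duration("2025-03-06T11:09:40.529Z", "2025-03-06T16:46:03.328Z"))  # 5:36:22.799000
    # eu-1-2 run (job_id 93549e20-...)
    print(job_duration("2025-03-06T16:46:27.631Z", "2025-03-06T17:13:35.805Z"))  # 0:27:08.174000

So the two jobs processed the same 500 experiments over the same time_range, yet one took roughly 12x longer than the other.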

How to reproduce it

  • Clone this PR and run the scalability test after replacing <server> (where the thanos setup is present) in the command below. A rough sketch for replaying the Bulk API requests directly is shown after the command.
./bulk_scale_test.sh -i quay.io/vinakuma/autotune_operator:jobsave3  -w 1 -d 1 -r /tmp/results/bulk_scale_test_jobsave3 -a 1 -o 1 -c 2 -s "2025-03-10T02:00:00.000Z" --url="http://thanos-query-frontend-thanos-bench.apps.<server>/"
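
Alternatively, the two runs can be compared by submitting the same Bulk API payload (the input object shown in the summaries above) for each cluster_id and timing the jobs end to end. This is a rough sketch only: the POST /bulk endpoint path, the job_id field in the POST response, the job_id query parameter used for polling, and the exact nesting of the status field in the GET response are assumptions, not confirmed in this issue; the Kruize service URL is a placeholder.

    # Hedged sketch: replay the same Bulk API input for both cluster_id labels
    # and time the jobs. ASSUMPTIONS (not confirmed in this issue): /bulk path,
    # "job_id" in the POST response, "job_id" query parameter, status nesting.
    import time
    import requests

    KRUIZE_URL = "http://kruize.example.com"  # placeholder for the Kruize service route

    def run_bulk_job(cluster_id: str) -> float:
        payload = {
            "filter": {
                "exclude": {"namespace": [], "workload": [], "containers": [], "labels": {}},
                "include": {"namespace": [], "workload": [], "containers": [],
                            "labels": {"org_id": "org-1", "cluster_id": cluster_id}},
            },
            "time_range": {"start": "2025-03-09T20:00:00.000Z",
                           "end": "2025-03-10T02:00:00.000Z"},
            "datasource": "thanos",
        }
        started = time.time()
        job_id = requests.post(f"{KRUIZE_URL}/bulk", json=payload).json()["job_id"]
        while True:
            resp = requests.get(f"{KRUIZE_URL}/bulk", params={"job_id": job_id}).json()
            status = resp.get("status") or resp.get("summary", {}).get("status")
            if status in ("COMPLETED", "FAILED"):
                break
            time.sleep(60)  # poll once a minute
        return time.time() - started

    for cluster in ("eu-1-1", "eu-1-2"):
        print(cluster, "took", round(run_bulk_job(cluster)), "seconds")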

Expected behavior
The time taken to generate recommendations for the same number of experiments over the same time range should be roughly the same.

Relevant logs
Will attach the Kruize pod logs if required.

Environment:

  • Kubernetes cluster: OpenShift
@chandrams chandrams added the bug Something isn't working label Mar 10, 2025
@chandrams chandrams added this to the Kruize 0.5 Release milestone Mar 10, 2025
@dinogun dinogun moved this to In Progress in Monitoring Mar 10, 2025