Not able to achieve streaming of Keda scaled jobs #5881

Closed
vinayak-shanawad opened this issue Jun 12, 2024 · 6 comments
Labels
bug (Something isn't working), stale (All issues that are marked as stale due to inactivity)

Comments


vinayak-shanawad commented Jun 12, 2024

Report

We are running Generative AI workloads (on GPU resources) using KEDA scaled jobs, and we are not able to achieve streaming behavior, i.e., processing newly queued messages while existing jobs are still running.

Expected Behavior

Scenarios:

KEDA ScaledJob settings → pollingInterval = 30, maxReplicaCount = 10, parallelism = 1

(Assuming the SQS queue is empty before messages are placed in it for the scenarios below.)
1 message in the queue → KEDA triggers 1 job/pod and processes it. If the producer places another message in the queue while the 1st job is still running, KEDA does not trigger another job until the 1st job completes. We would expect KEDA to process subsequent messages even while existing jobs are in progress.

For comparison, we did achieve streaming of batch jobs on AWS SageMaker, where we can create N jobs in parallel even while existing jobs are in progress.
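
Below is a minimal sketch of the ScaledJob manifest these settings correspond to; the queue URL, AWS region, container image, and the queueLength target are placeholders/assumptions rather than values from our actual setup.

```yaml
# Minimal sketch of the ScaledJob described above.
# Queue URL, region, image, and queueLength are placeholders, not real values.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: genai-gpu-worker
spec:
  pollingInterval: 30            # seconds between SQS queue checks
  maxReplicaCount: 10            # upper bound on the number of jobs KEDA creates
  jobTargetRef:
    parallelism: 1               # pods per job (1 here, 5 in the workaround below)
    completions: 1
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: worker
            image: <gpu-worker-image>      # placeholder
            resources:
              limits:
                nvidia.com/gpu: 1
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>   # placeholder
        queueLength: "1"         # assumed target: one message per job
        awsRegion: <region>      # placeholder
```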

Actual Behavior

Scenarios:

  1. KEDA ScaledJob settings → pollingInterval = 30, maxReplicaCount = 10, parallelism = 1
    (Assuming the SQS queue is empty before messages are placed in it.)
    1 message in the queue → KEDA triggers 1 job/pod and processes it. If the producer places another message in the queue while the 1st job is still running, KEDA does not trigger another job until the 1st job completes.

We tried to address this with the settings below.

  1. KEDA ScaledJob settings → pollingInterval = 30, maxReplicaCount = 10, parallelism = 5
    (Assuming the SQS queue is empty before messages are placed in it.)
    1 message in the queue → KEDA triggers 5 jobs but only 1 of them processes a message; the other jobs/pods sit idle. This is expensive, since 4 GPU pods/jobs are launched unnecessarily when there is only one message in the queue.
    2 messages in the queue → KEDA triggers 5 jobs but only 2 of them process messages; the remaining pods are unused.

Steps to Reproduce the Problem

  1. KEDA ScaledJob settings → pollingInterval = 30, maxReplicaCount = 10, parallelism = 1
    (Assuming the SQS queue is empty before messages are placed in it.)
    1 message in the queue → KEDA triggers 1 job/pod and processes it.

  2. KEDA ScaledJob settings → pollingInterval = 30, maxReplicaCount = 10, parallelism = 5
    (Assuming the SQS queue is empty before messages are placed in it.)
    1 message in the queue → KEDA triggers 5 jobs and only 1 of them processes a message.
    2 messages in the queue → KEDA triggers 5 jobs and only 2 of them process messages.

Logs from KEDA operator

No response

KEDA Version

2.14.0

Kubernetes Version

1.29

Platform

Amazon Web Services

Scaler Details

AWS SQS Queue

Anything else?

No response

vinayak-shanawad added the bug label on Jun 12, 2024
vinayak-shanawad (Author) commented

Any updates on this issue?

zroubalik (Member) commented

Makes sense, are you willing to contribute a fix?

junekhan (Contributor) commented

This feature can probably resolve the issue.
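
For illustration only, and assuming the linked feature is the ScaledJob scalingStrategy setting (the link above identifies the actual feature), the relevant fragment of the spec might look like this:

```yaml
# Hypothetical fragment, assuming the referenced feature is the ScaledJob
# scalingStrategy field; non-default strategies change how queued messages
# are mapped to new jobs while earlier jobs are still running.
spec:
  scalingStrategy:
    strategy: "eager"   # assumption: an "eager"-style strategy that keeps
                        # creating jobs for pending messages, up to
                        # maxReplicaCount, even while other jobs are running
```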

vinayak-shanawad (Author) commented

@junekhan Should we set any specific parameter in the ScaledJob spec to resolve this issue?


stale bot commented Sep 11, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Sep 11, 2024

stale bot commented Sep 20, 2024

This issue has been automatically closed due to inactivity.

stale bot closed this as completed on Sep 20, 2024