-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(block-scheduler): one job per partition (local branch copy) #15579
base: main
Are you sure you want to change the base?
Conversation
💻 Deploy preview available: https://deploy-preview-loki-15579-zb444pucvq-vp.a.run.app/docs/loki/latest/ |
|
||
// find first job for this partition | ||
nextJob := sort.Search(len(jobs), func(i int) bool { | ||
return jobs[i].Job.Partition() >= job.Partition() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return jobs[i].Job.Partition() >= job.Partition() | |
return jobs[i].Job.Partition() == job.Partition() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That function probably works a bit different than you think :)
}) | ||
|
||
if nextJob < len(jobs) { | ||
_, _, _ = s.idempotentEnqueue(jobs[nextJob]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a potential race here that could result in this operation to fail as we try to enqueue the same job from two different routines (from HandleCompleteJob
or from the scheduler run loop). but this might not cause any problems
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be ok since the enqueue code first checks the statusmap for existence of a job before adding it to the queue. We may need to introduce some error handling here or there to swallow these cases, but it shouldn't be possible to add duplicate jobs at the moment.
Local branch copy of #15578 to debug possible fork permission issues
max_jobs_planned_per_interval
. We don't want this yet and it was being used incorrectly as the plannersmaxJobsPerPartition
argument.Some of it is a bit hacky & will need refactoring, but this should work well enough in the meantime to flesh out issues.