Batch acceleration
A typical app might have jobs that take about 1 hour of CPU time on average to complete. But the turnaround time on a particular host H may be higher because:
- H has a large work buffer, and other jobs must complete before this one starts
- H computes only sporadically
- H is slower than average
So turnaround time on H could be several days, and the 'max delay' setting for the app may need to be, say, 1 week.
If there's a large batch - say 1000 jobs - some of them will get sent to hosts that never complete them. After a week these jobs time out and we resend them to other hosts. But some of these hosts may also never complete them, or complete them only after a long turnaround time.
As a result, the 'makespan' of the batch - the time from submission to 100% completion - may be several weeks.
We'd like to reduce batch makespan using scheduling techniques; we call this 'batch acceleration'.
Our goal is to reduce makespan with minimal complexity. We're not concerned with performance; current projects have a few thousand hosts, not millions.
The basic idea: mark certain hosts as 'low turnaround time' (LTT). Mark the last 10% or so of jobs in each batch as 'high priority'. Use LTT hosts to run high-priority jobs.
This involves several components:
- Scheduler: enforce the above scheduling rule.
- batch_stats.php (new): scan batches, compute median TT. For each host, compute average of 'normalized TT' (TT/median). A host is LTT if this is < 1. For each app, make a list of hosts that have returned jobs. If at least 100, and 25% are LTT, app is accelerable.
- batch_accel.php (new): periodically scan in-progress batches, identifying those that need acceleration. Mark jobs as high-priority, and possibly create new instances.
host.error_rate: 1 if average of TT/median < 1, else 0
app.n_size_classes: nonzero means accelerable
batch.expire_time: median TT of success jobs
batch_stats.php runs every hour or so.
For each batch with at least 50% success jobs, compute median TT of success jobs.
For each job in such a batch, its 'TT ratio' is TT/median if success, ~10 if not.
Set host.error_rate to the average of TT ratios of its jobs (over all batches). A host is low-turnaround if this is < 1.
Note: the distribution of TT is generally a big clump below some level, with a bunch of outliers. We want to exclude the outliers.
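The per-host scoring described above can be sketched as follows. This is a minimal illustration, not the code in batch_stats.php: the function names, the tuple layout, and the in-memory data model are assumptions made for the example, and only jobs that were dispatched to a host are considered.

```python
from collections import defaultdict
from statistics import median

# Penalty ratio for jobs that did not succeed (the "~10" in the text).
FAILED_TT_RATIO = 10.0


def host_tt_scores(batches):
    """Return {host_id: average TT ratio} over all usable batches.

    batches: list of batches; each batch is a list of
    (host_id, turnaround_seconds, success) tuples (hypothetical layout).
    The turnaround value is ignored for jobs that did not succeed.
    """
    ratios = defaultdict(list)
    for jobs in batches:
        ok_tts = [tt for _, tt, success in jobs if success]
        # Only batches with at least 50% successful jobs are used.
        if not jobs or 2 * len(ok_tts) < len(jobs):
            continue
        med = median(ok_tts)
        if med <= 0:
            continue  # avoid dividing by zero on degenerate data
        for host_id, tt, success in jobs:
            ratios[host_id].append(tt / med if success else FAILED_TT_RATIO)
    return {host: sum(r) / len(r) for host, r in ratios.items()}


def is_ltt(score):
    """A host is low-turnaround if its average TT ratio is below 1."""
    return score < 1.0
```

Using the median rather than the mean as the reference keeps the outliers mentioned above from shifting the baseline, while the fixed penalty for unsuccessful jobs pushes unreliable hosts well above the LTT threshold.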
For each app A, let N = # of hosts that completed a job for this app, M = # of these hosts marked as low_turnaround. Set A.accelerable if N > 100 and M > N*.25
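The accelerability test is simple enough to state directly in code. The sketch below assumes a hypothetical dict mapping each host that completed a job for the app to its LTT flag; the parameter names are illustrative.

```python
def app_is_accelerable(host_is_ltt, min_hosts=100, min_ltt_fraction=0.25):
    """host_is_ltt: {host_id: bool} for hosts that completed a job for this app."""
    n = len(host_is_ltt)                                # N in the text
    m = sum(1 for ltt in host_is_ltt.values() if ltt)   # M in the text
    return n > min_hosts and m > n * min_ltt_fraction
```

Both comparisons are strict, matching "N > 100 and M > N*.25": an app with exactly 100 hosts, or with exactly a quarter of its hosts marked LTT, does not qualify.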
Job selection:
    if a job is high priority and app is accelerable
        if host is LTT
            boost job score
        else
            don't send
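A minimal sketch of this rule, assuming the scheduler ranks candidate jobs by a numeric score. The function name, the `boost` value, and the use of `None` to mean "don't send" are assumptions for illustration, not the scheduler's actual interface.

```python
def adjust_score(score, job_high_priority, app_accelerable, host_ltt, boost=1.0):
    """Return the adjusted score, or None if the job must not go to this host."""
    if job_high_priority and app_accelerable:
        if host_ltt:
            return score + boost   # prefer high-priority jobs on LTT hosts
        return None                # keep high-priority jobs off non-LTT hosts
    return score                   # all other jobs are unaffected
```

Note that a high-priority job whose app is not accelerable is scheduled as usual; the restriction applies only when enough LTT hosts exist to absorb the tail of the batch.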
batch_accel.php runs every hour or so.
For each in-progress batch B that's at least 90% complete:
    if app is accelerable
        compute average TT of completed jobs
        For each uncompleted job
            mark WU as high priority
            mark its unsent results as high priority
            if no unsent results,
                and in-prog results are older than average TT,
                and #results < max_total_results
                    increment wu.target_nresults, trigger transition
    else
        set all incomplete WUs and unsent results to zero priority
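The pass over one batch might look like the sketch below. It is an in-memory illustration rather than the batch_accel.php code: the `Result` and `WorkUnit` classes and their fields are hypothetical stand-ins for database rows. It also makes two assumptions about the outline above: the final "else" pairs with "if app is accelerable", and "in-prog results are older than average TT" means all of them are.

```python
from dataclasses import dataclass


@dataclass
class Result:
    state: str                   # "unsent", "in_progress", or "done"
    age: float = 0.0             # seconds since sent (in-progress results)
    high_priority: bool = False


@dataclass
class WorkUnit:
    results: list
    completed: bool = False
    turnaround: float = 0.0      # seconds; meaningful only when completed
    target_nresults: int = 1
    max_total_results: int = 4
    high_priority: bool = False
    needs_transition: bool = False


def accelerate_batch(wus, app_accelerable, threshold=0.9):
    """Apply one acceleration pass to the work units of a single batch."""
    done = [wu for wu in wus if wu.completed]
    if not done or len(done) < threshold * len(wus):
        return  # batch is not far enough along
    pending = [wu for wu in wus if not wu.completed]
    if not app_accelerable:
        # Reset priorities on incomplete WUs and their unsent results.
        for wu in pending:
            wu.high_priority = False
            for r in wu.results:
                if r.state == "unsent":
                    r.high_priority = False
        return
    avg_tt = sum(wu.turnaround for wu in done) / len(done)
    for wu in pending:
        wu.high_priority = True
        unsent = [r for r in wu.results if r.state == "unsent"]
        for r in unsent:
            r.high_priority = True
        in_prog = [r for r in wu.results if r.state == "in_progress"]
        if (not unsent
                and all(r.age > avg_tt for r in in_prog)
                and len(wu.results) < wu.max_total_results):
            # Ask for one more instance; the transitioner would create it.
            wu.target_nresults += 1
            wu.needs_transition = True
```

Creating an extra instance only when nothing is unsent and the outstanding instances are already slower than average avoids duplicating work that is likely to finish soon.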
Ideally, we want both high- and low-priority jobs in shared memory, so that there are jobs for both LTT and non-LTT hosts.
I added a feeder option --batch_accel that adds a random factor to get a mix of high and low prio jobs.
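The text does not say how the random factor is applied, so the following is only one hypothetical way to get such a mix, not the feeder's implementation: for each free slot, take the next high-priority job with some probability, otherwise the next low-priority one. All names and the `high_fraction` parameter are assumptions.

```python
import random


def pick_for_shmem(jobs, n_slots, high_fraction=0.5, rng=random):
    """Choose job ids for n_slots cache slots, mixing priorities at random.

    jobs: list of (job_id, high_priority) pairs (hypothetical layout).
    """
    high = [job_id for job_id, hp in jobs if hp]
    low = [job_id for job_id, hp in jobs if not hp]
    picked = []
    while len(picked) < n_slots and (high or low):
        # Take a high-priority job if it wins the draw or no low ones remain.
        take_high = bool(high) and (not low or rng.random() < high_fraction)
        picked.append(high.pop(0) if take_high else low.pop(0))
    return picked
```

With strict priority ordering, a large backlog of high-priority jobs could fill every slot and leave non-LTT hosts with nothing to fetch; the random draw keeps some slots for each kind.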
In deciding whether a host is LTT, the above doesn't distinguish apps, app versions, or BUDA variants. It won't work as intended in situations where e.g. a host succeeds with CPU versions but fails with GPU versions.