
A Star Documentation


Our A* implementation is based on the following pseudocode:

OPEN : priority queue of unexamined/not fully examined states
CLOSED : set of already examined states

Initialise the solution space
OPEN <= initial state (empty schedule)
CLOSED <= Ø

While OPEN is not empty
    Pop best state s from OPEN (with lowest f(s))
    Test if s is goal state
        If yes, finished
    Expand s => create new states by considering free tasks (schedulable tasks)
        Calculate f for each new state
        Insert new states into OPEN
    Insert s in CLOSED

Return s

Pruning is done when popping from OPEN and when expanding states. It skips unpromising states, allowing the A* implementation to reach the optimal solution while examining fewer states.

  1. Initially, OPEN contains only the empty schedule, which is therefore popped first.
  2. The empty schedule is expanded by adding every free task to every processor in every possible way.
  3. The schedules resulting from the expansion are added to the OPEN set.
  4. The empty schedule is then added to the CLOSED set.
  5. The rest proceeds as in the pseudocode.
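
As a rough illustration only, the following Java sketch mirrors the loop above. Schedule, f(), isComplete() and expand() are placeholder names assumed for this example, not classes taken from the actual implementation.

import java.util.*;

// Minimal A* driver sketch. Schedule, f(), isComplete() and expand() are illustrative
// stand-ins and are not names taken from the actual implementation.
final class AStarSketch {

    interface Schedule {
        int f();                     // underestimate of the final schedule length
        boolean isComplete();        // true once every task has been scheduled
        List<Schedule> expand();     // child states created from the free tasks
    }

    static Schedule search(Schedule empty) {
        PriorityQueue<Schedule> open = new PriorityQueue<>(Comparator.comparingInt(Schedule::f));
        Set<Schedule> closed = new HashSet<>();

        open.add(empty);                         // OPEN <= initial state (empty schedule)
        while (!open.isEmpty()) {
            Schedule s = open.poll();            // pop the state with the lowest f(s)
            if (s.isComplete()) {
                return s;                        // goal state reached
            }
            for (Schedule child : s.expand()) {  // expand by scheduling each free task
                if (!closed.contains(child)) {   // duplicate detection (see pruning below)
                    open.add(child);
                }
            }
            closed.add(s);                       // s has now been fully examined
        }
        return null;                             // should not happen for a valid task graph
    }
}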

A parallelised version is described in the Parallelised A Star section below.

Cost function f(s)

The cost function is used to choose the best state from OPEN. It is an underestimate of the minimum final schedule length of a partial schedule. The closer the underestimate is to the actual length, the fewer states the A* implementation needs to examine to reach the optimal solution.

In order to have a cost function that is closer to the minimum final schedule length, the cost function is calculated as the maximum of three different cost functions:

1. Maximum of start time and bottom level of any task in schedule

COST : list to contain start time + bottom level of scheduled tasks

COST <= Ø

Loop through all the Processors
    Calculate start time + bottom level for each task scheduled on the Processor
    Insert in COST

Test if COST is empty
    If yes, return 0 (no scheduled tasks)
    If no, return MAX(COST)
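
A minimal Java sketch of this first estimate is shown below, assuming the scheduled tasks are handed over as parallel arrays of start times and bottom levels; this representation is illustrative only.

// Sketch of cost function 1: the maximum over the scheduled tasks of
// (start time + bottom level). The parallel arrays are an assumed representation,
// not the project's actual data structures.
final class ScheduledTaskCost {

    static int maxStartPlusBottomLevel(int[] startTimes, int[] bottomLevels) {
        int max = 0;                                          // 0 when nothing is scheduled yet
        for (int i = 0; i < startTimes.length; i++) {
            max = Math.max(max, startTimes[i] + bottomLevels[i]);
        }
        return max;
    }
}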

2. Idle time plus computation load

IdleTime : Total Processor idle time (initially 0)
TotalWeight : Total weight of tasks (initially 0)

Loop through all the Processors
    Add Processor idle time to IdleTime

Loop through all the Task vertices
    Add Task weight to TotalWeight

Calculate (IdleTime + TotalWeight)/number of processors
    Return result
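
The sketch below shows the same calculation in Java, again assuming simplified array inputs for the processor idle times and task weights.

// Sketch of cost function 2: (total idle time + total task weight) / number of processors.
// Both arrays are assumed, simplified inputs.
final class IdleTimeCost {

    static int idlePlusLoad(int[] processorIdleTimes, int[] taskWeights) {
        int idleTime = 0;
        for (int idle : processorIdleTimes) idleTime += idle;   // total processor idle time
        int totalWeight = 0;
        for (int w : taskWeights) totalWeight += w;             // total computation load
        return (idleTime + totalWeight) / processorIdleTimes.length;
    }
}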

3. Maximum of earliest start time and bottom level of any free task

COST : list to contain earliest start time + bottom level of free tasks

COST <= Ø

Loop through all the Free Tasks
    Calculate the earliest start time + bottom level of the free task
    Insert in COST

Test if COST is empty
    If yes, return 0 (no free tasks)
    If no, return MAX(COST)
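
The following sketch combines the third estimate with the overall f(s), which simply takes the maximum of the three values. The inputs are assumed, simplified views of the partial schedule.

// Sketch of cost function 3, plus the overall f(s) as the maximum of the three estimates.
final class CostFunction {

    // Maximum over the free tasks of (earliest start time + bottom level); 0 if none are free.
    static int maxFreeTaskCost(int[] earliestStartTimes, int[] bottomLevels) {
        int max = 0;
        for (int i = 0; i < earliestStartTimes.length; i++) {
            max = Math.max(max, earliestStartTimes[i] + bottomLevels[i]);
        }
        return max;
    }

    // f(s): the largest of the three underestimates is the tightest valid lower bound.
    static int f(int scheduledTaskCost, int idleLoadCost, int freeTaskCost) {
        return Math.max(scheduledTaskCost, Math.max(idleLoadCost, freeTaskCost));
    }
}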

Pruning Techniques

Pruning skips unpromising subtrees in the state space search. It allows the A* implementation to reduce the number of states that need to be expanded, reducing both memory usage and the time spent finding the optimal solution.

The pruning techniques used by our task scheduler are:

1. Detecting Duplicates

Test if s contained in CLOSED
    If yes, don't insert into OPEN
    If no, insert into OPEN

Partial schedules that are duplicates will only be considered and expanded once. If a schedule's duplicate has already been considered (and is hence in CLOSED), it will be skipped.

Duplicates are schedules that have the same task sets scheduled at the same times on any of the processors.
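
One possible way to detect such duplicates is sketched below in Java, assuming a duplicate means every task sits on the same processor at the same start time. The Map-based representation and the string key are illustrative choices, not the project's actual implementation.

import java.util.*;

// Duplicate-detection sketch, assuming a duplicate means every task is placed on the
// same processor at the same start time.
final class DuplicateCheck {

    // allocations: task name -> {processor index, start time}
    static String key(Map<String, int[]> allocations) {
        // Sorting by task name gives a canonical key, so the order in which tasks
        // were added to the schedule does not matter.
        StringBuilder sb = new StringBuilder();
        new TreeMap<>(allocations).forEach((task, alloc) ->
                sb.append(task).append(':').append(alloc[0]).append('@').append(alloc[1]).append(';'));
        return sb.toString();
    }

    static boolean shouldInsert(Map<String, int[]> schedule, Set<String> closed) {
        return !closed.contains(key(schedule));   // skip schedules whose duplicate was examined
    }
}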

2. Upper Bounding with List Scheduling

UpperBound : List Scheduling computed upper bound
Test if f(s) is higher than UpperBound
    If yes, don't insert into OPEN
    If no, insert into OPEN

List Scheduling finds a valid (though not necessarily optimal) finish time for a set of tasks, which can therefore be used as an upper bound for the A* algorithm.

Tasks are topologically sorted and assigned at their earliest possible start times to an initially empty schedule. The topological sort guarantees that each task is only assigned after all of its dependencies have already been scheduled.
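
A hedged sketch of such a list scheduler is shown below. The topological order, the weight map and the inEdges map (parent -> communication cost) are assumed inputs, not the project's actual classes.

import java.util.*;

// List-scheduling sketch used to compute an upper bound on the optimal schedule length.
// Tasks arrive already topologically sorted; weights and inEdges are assumed representations.
final class ListSchedulingBound {

    static int upperBound(List<String> topoOrder,
                          Map<String, Integer> weights,
                          Map<String, Map<String, Integer>> inEdges,
                          int numProcessors) {
        int[] procFinish = new int[numProcessors];          // finish time of each processor
        Map<String, Integer> finish = new HashMap<>();      // finish time of each task
        Map<String, Integer> proc = new HashMap<>();        // processor each task was put on

        for (String task : topoOrder) {
            int bestProc = 0;
            int bestStart = Integer.MAX_VALUE;
            for (int p = 0; p < numProcessors; p++) {
                int start = procFinish[p];                   // earliest free slot on this processor
                for (Map.Entry<String, Integer> e
                        : inEdges.getOrDefault(task, Collections.emptyMap()).entrySet()) {
                    int ready = finish.get(e.getKey());
                    if (proc.get(e.getKey()) != p) ready += e.getValue();   // add communication cost
                    start = Math.max(start, ready);          // wait for every dependency
                }
                if (start < bestStart) { bestStart = start; bestProc = p; }
            }
            proc.put(task, bestProc);
            finish.put(task, bestStart + weights.get(task));
            procFinish[bestProc] = bestStart + weights.get(task);
        }

        int makespan = 0;
        for (int f : procFinish) makespan = Math.max(makespan, f);
        return makespan;                                     // a valid, possibly non-optimal length
    }
}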

3. Partial Expansion

Expand s => create new states by considering free tasks (schedulable tasks)
    Insert each new state into OPEN
    Test if f(new state) is the same as f(s)
        If yes, stop expanding
Test if s fully expanded
    If yes, insert s into CLOSED
    If no, insert s back into OPEN

When expanding a state s, once the f(new state) value of any new state equals f(s), the remaining free tasks are not scheduled for the time being.

This is because f(s) is currently the lowest cost in OPEN, so any new state with the same cost is also a best-cost state. This enables the A* implementation to go deep into the state space search very quickly, possibly reaching the optimal solution much faster.
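
The sketch below illustrates one way partial expansion could look in Java, assuming a hypothetical nextChild() method that produces one child per call and returns null once s is fully expanded.

import java.util.*;

// Partial-expansion sketch. State, f() and nextChild() are illustrative stand-ins.
final class PartialExpansion {

    interface State {
        int f();
        State nextChild();            // next unexplored child, or null when fully expanded
    }

    static void expand(State s, PriorityQueue<State> open, Set<State> closed) {
        State child;
        while ((child = s.nextChild()) != null) {
            open.add(child);
            if (child.f() == s.f()) {         // a child as good as s exists...
                open.add(s);                  // ...so pause here and revisit s later
                return;                       // remaining free tasks stay unscheduled for now
            }
        }
        closed.add(s);                        // every child was generated: s is fully examined
    }
}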

4. Schedule Horizon Equivalence Check

A partial solution representing a schedule is defined by the start times of the tasks and their processor allocations. The order in which these tasks were added is irrelevant to the final schedule length if different orders produce the same schedule horizon. If two or more different partial schedules over the same subset of tasks have the same schedule horizon, the A* algorithm only needs to traverse one of them, since the others are variants of the same schedule horizon.

An iterative process of task swapping is employed: the last task on a processor is repeatedly swapped with the task immediately before it on the same processor, without violating a predefined order of tasks (such as sorting the task names alphabetically). As soon as a swap is found that does not worsen the schedule horizon, the state is discarded as an equivalent schedule.

Input: Partial schedule, including last scheduled task m at end of processor P
   maxTime = schedule ending time for processor P
   Name tasks on P in start time order
   while there are vertices to be bubbled up
      Swap position of m and the new node if the node index is higher
      Schedule m and the new node as early as possible
         If finish time has not increased AND all outgoing communications are ok
            Return true (schedule is equivalent)
Return false (schedule is not equivalent)

The equivalent schedule pruning above illustrates how task swapping is used to identify states that can be discarded. A swap is attempted between the last scheduled task m at the end of processor P and the task before it. If the finish time of processor P has not increased and all outgoing communications of the swapped node do not delay any descendant tasks, then the schedule horizon has not worsened and the state can be discarded.

Outgoing Communications subroutine
Input: Solution to be checked for schedule horizon degradation 
For all tasks on the processor P
If start time of task is later than start time of original task
   For all child nodes of the task
      time = schedule ending time for task + edge weight from task to child
      If the child start time > time AND child on a different processor than P
         Return false (schedule is not equivalent)
      else
         For all processors except for the current processor P
            atLeastOneLater = false
            For all parent nodes except for the current task
               If data arrival time from parent node >= time
                  atLeastOneLater = true
            If atLeastOneLater = false
               Return false (schedule is not equivalent)
Return true (schedule has not degraded and is equivalent)

If the start time of any task after m is later than before, we must check whether any children of the swapped task are delayed. If a child task's start time is greater than the schedule horizon end time and the child is on a different processor, then the schedule is not equivalent, as the schedule horizon has degraded. Otherwise, the task's later start still does not worsen the schedule horizon if, on every other processor, at least one other parent of the child delivers its data later than or at the same time as this task's communication arrives. If that holds, the schedule is equivalent and the state can be discarded.
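
The simplified Java sketch below covers only the outer bubbling loop and the finish-time check; the outgoing-communications subroutine is deliberately omitted, and the dataReady, weight and sortIndex arrays are assumed pre-computed inputs rather than the project's real data structures.

import java.util.*;

// Simplified sketch of the task-swapping check: bubble the last task m towards the
// front of processor P and see whether an earlier, order-respecting position gives a
// finish time that is no worse. The outgoing-communications check is left out here.
final class HorizonEquivalence {

    static boolean hasEquivalentOrder(List<Integer> tasksOnP,   // task ids in start-time order
                                      int[] weight,             // execution time of each task
                                      int[] dataReady,          // earliest start on P (comms included)
                                      int[] sortIndex) {        // predefined total order, e.g. by name
        int originalFinish = finishTime(tasksOnP, weight, dataReady);
        List<Integer> order = new ArrayList<>(tasksOnP);
        int pos = order.size() - 1;                             // m starts at the end of P
        while (pos > 0 && sortIndex[order.get(pos)] < sortIndex[order.get(pos - 1)]) {
            Collections.swap(order, pos, pos - 1);              // bubble m one slot earlier
            pos--;
            if (finishTime(order, weight, dataReady) <= originalFinish) {
                return true;                                    // an equivalent ordering exists
            }
        }
        return false;                                           // keep the original state
    }

    // Schedule the given order as early as possible on one processor and return its finish time.
    private static int finishTime(List<Integer> order, int[] weight, int[] dataReady) {
        int time = 0;
        for (int task : order) {
            time = Math.max(time, dataReady[task]) + weight[task];
        }
        return time;
    }
}

If hasEquivalentOrder returns true, the state would be discarded as a variant of the same schedule horizon; a full implementation would additionally run the outgoing-communications check described above.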

5. Fixed Task Order

For all nodes nf in solution s where nf ∈ free(s) : if it satisfies (1), (2), (3), then fix task order

(1). nf has at most one parent and at most one child 
|parents(nf)| ≤ 1 && |children(nf)| ≤ 1

(2). if nf has a child, child(nf) == child(n ∈ free(s))

(3). if nf has a parent np, then all other parents of free tasks are also scheduled on the same processor as np.
procNo(np) == procNo(n ∈ parents(free(s)))

If these conditions are satisfied, sort the free tasks in non-decreasing order of data ready time, where the data ready time accounts for the weight of the edge between parent and child. If nf has no parent, its dataReadyTime is set to 0.

dataReadyTime(nf) = tf(np) + c(np, nf), i.e. the finish time of the parent np plus the weight of the edge from np to nf

Once the free tasks are sorted, break any ties in data ready time by sorting in non-increasing order of out-edge cost. If a task is a leaf, i.e. it has no children, its out-edge cost is zero.

At this point, the free tasks are sorted in an order that is valid for fork graphs. We must now verify that the order is also valid for join graphs, in order to confirm that the fixed order is valid for fork-join graphs.

function validForkJoin(Fixed task order list)

for each element in list
    if next element out-edge cost > current element out-edge cost
        return false
return true

If the list satisfies this condition, it is valid for fork-joins and the order is fixed as follows:

for each element in list
    add a dependency from element to the next element (i.e. make the next element a child of element)
    remove the next element from free(s)

Doing this means that when children are created for a partial solution, there is only one element in free(s), so the branching factor is massively reduced. If the list is not valid for fork-joins, free(s) remains as it was initially, i.e. it may contain multiple tasks.
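
A sketch of the sorting and the fork-join validity check might look like the following, where the Task helper class and its fields are assumptions introduced just for this example.

import java.util.*;

// Fixed-task-order sketch: sort the free tasks by non-decreasing data ready time,
// break ties by non-increasing out-edge cost, then check the fork-join validity condition.
final class FixedTaskOrder {

    static final class Task {
        final String name;            // task identifier (not used by the sort itself)
        final int dataReadyTime;      // parent finish time + communication cost, 0 if no parent
        final int outEdgeCost;        // edge weight to the single child, 0 if the task is a leaf
        Task(String name, int dataReadyTime, int outEdgeCost) {
            this.name = name;
            this.dataReadyTime = dataReadyTime;
            this.outEdgeCost = outEdgeCost;
        }
    }

    // Returns the fixed order, or null if the sorted list is not valid for fork-join graphs.
    static List<Task> fixOrder(List<Task> freeTasks) {
        List<Task> order = new ArrayList<>(freeTasks);
        order.sort(Comparator.comparingInt((Task t) -> t.dataReadyTime)
                             .thenComparingInt(t -> -t.outEdgeCost));    // ties: larger out-edge first
        for (int i = 0; i + 1 < order.size(); i++) {
            if (order.get(i + 1).outEdgeCost > order.get(i).outEdgeCost) {
                return null;                                             // out-edge costs must not increase
            }
        }
        return order;     // caller then expands only the head of this list
    }
}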

Parallelised A Star

Our parallelised A* implementation is based on the following pseudocode:

OPEN : priority queue of unexamined/not fully examined states
CLOSED : set of already examined states
THREADS : threads that will be running A*'s state space search
n : number of cores to run task scheduler on

Initialise the solution space
OPEN <= initial state (empty schedule)
CLOSED <= Ø
THREADS <= n number of threads

For each thread, pass the same OPEN and CLOSED and run
    While OPEN is not empty
        Pop best state s from OPEN (with lowest f(s))
        Test if s is goal state
            If yes, insert s back into OPEN (the "key" step, explained below) and finish
        Expand s => create new states by considering free tasks (schedulable tasks)
            Calculate f for each new state
            Insert new states into OPEN
        Insert s in CLOSED

For each thread, compare finish time of goal state
    Return s with earliest finishing time

The goal of the parallelisation is to fill the OPEN and CLOSED sets as fast as possible.

  • OPEN being filled faster allows the state with the better cost to be expanded earlier.
  • CLOSED being filled faster helps other A* threads prune with duplicate detection.

Overall, the goal is to perform A* faster.

Re-inserting the goal state into OPEN (the "key" step noted in the pseudocode above) helps other A* threads stop running once the schedules they are examining can never be better than the best schedule reached by the first finishing thread. If a thread is examining states with a better cost than that goal schedule, the re-inserted schedule will not be popped, so a potentially better optimal schedule is not compromised.
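
A rough Java sketch of this scheme is given below. State and its methods are placeholders, the shared OPEN and CLOSED use standard concurrent collections, and the termination handling (a worker stops when it sees an empty queue) is deliberately simplified compared to a real implementation.

import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicReference;

// Parallel A* sketch: every worker thread shares one OPEN queue and one CLOSED set,
// so states found by one thread are immediately visible to the others.
// State, f(), isComplete() and expand() are illustrative stand-ins.
final class ParallelAStarSketch {

    interface State {
        int f();
        boolean isComplete();
        List<State> expand();
    }

    static State search(State empty, int cores) throws InterruptedException {
        PriorityBlockingQueue<State> open =
                new PriorityBlockingQueue<>(64, Comparator.comparingInt(State::f));
        Set<State> closed = ConcurrentHashMap.newKeySet();
        AtomicReference<State> best = new AtomicReference<>();   // earliest-finishing goal so far
        open.add(empty);

        Runnable worker = () -> {
            State s;
            while ((s = open.poll()) != null) {
                if (s.isComplete()) {
                    // Keep the goal state with the smallest finish time seen by any thread
                    // (assumes f of a complete schedule equals its finish time).
                    best.accumulateAndGet(s, (cur, cand) ->
                            cur == null || cand.f() < cur.f() ? cand : cur);
                    open.add(s);          // re-insert the goal (the "key" step) so other threads
                    return;               // that pop it also stop, without losing better states
                }
                for (State child : s.expand()) {
                    if (!closed.contains(child)) open.add(child);
                }
                closed.add(s);
            }
        };

        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (int i = 0; i < cores; i++) pool.execute(worker);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);     // wait for all workers to finish
        return best.get();
    }
}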
