Nikolaos Triantafyllis edited this page Oct 3, 2022 · 3 revisions

Description

A Python implementation of an adjustable resource manager for MPI applications on a computer cluster. It consists of:

  • A scheduling module supporting two algorithms:
    1. FCFS (First-Come, First-Served): Favours the jobs that have waited longest in the queue.
    2. WFP3: Favours short jobs and jobs that have waited long, while taking their size into account.
  • An optional backfilling module.
  • A resource allocation module implementing three policies:
    1. Compact: Use all the cores of a node on one app.
    2. Spare: Use half the cores of a node on one app (unfavored).
    3. Strip (co-scheduling): Split the cores of a node between two apps. Can be improved with the inclusion of a heatmap, used to identify apps that match well together.
  • A main module that ties all of the above together.
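
The difference between the two scheduling algorithms can be illustrated with a short sketch. It is not taken from the project's code: the `Job` fields and function names are hypothetical, and the WFP3 expression is the form commonly given in the scheduling literature, (wait time / requested time)^3 × processes, which may differ from what this implementation uses.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record; field names are assumptions for illustration.
    name: str
    wait_time: float       # seconds spent in the queue so far
    requested_time: float  # user-estimated runtime in seconds
    procs: int             # number of MPI processes requested


def fcfs_priority(job):
    # FCFS: the longer a job has waited, the higher its priority.
    return job.wait_time


def wfp3_priority(job):
    # WFP3 as commonly stated: the wait/runtime ratio is cubed, so short
    # jobs that have waited long rise quickly; the result scales with size.
    return (job.wait_time / job.requested_time) ** 3 * job.procs


def order_queue(jobs, priority):
    # Highest priority first; sorted() is stable, so ties keep queue order.
    return sorted(jobs, key=priority, reverse=True)


queue = [
    Job("bt", wait_time=100, requested_time=1000, procs=4),
    Job("cg", wait_time=50, requested_time=100, procs=4),
    Job("lu", wait_time=10, requested_time=100, procs=16),
]

fcfs_order = [job.name for job in order_queue(queue, fcfs_priority)]
wfp3_order = [job.name for job in order_queue(queue, wfp3_priority)]
```

With this example queue, FCFS keeps the oldest job `bt` first, while the WFP3 form above moves the short job `cg` ahead of it.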

The resource manager is submitted as a batch script and then assigns MPI jobs across the nodes bound to it. These jobs come from a queue of applications from the NAS Parallel Benchmarks (NPB) suite.
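
As a rough illustration of how the three allocation policies change the footprint of a job on the bound nodes, the following sketch computes how many cores of each node one app receives and how many nodes it therefore needs. The function names are hypothetical, and the even split under Strip is an assumption; the project's actual allocation logic is not shown here.

```python
import math


def cores_per_app(policy, cores_per_node):
    # Cores of a single node given to one app under each policy.
    if policy == "compact":
        return cores_per_node        # the app gets the whole node
    if policy in ("spare", "strip"):
        return cores_per_node // 2   # the app gets half of the node
    raise ValueError(f"unknown policy: {policy}")


def apps_per_node(policy):
    # Only Strip co-schedules a second app on the other half of the node;
    # under Spare the remaining half stays idle.
    return 2 if policy == "strip" else 1


def nodes_needed(procs, policy, cores_per_node):
    # Nodes one app occupies, assuming one MPI process per core.
    return math.ceil(procs / cores_per_app(policy, cores_per_node))
```

For example, a 16-process job on 8-core nodes would occupy 2 nodes under Compact and 4 nodes under Spare or Strip; under Strip those 4 nodes can host a second app at the same time.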
