Merge pull request #1750 from buildtesters/lsf_queue_check_in_executors
Refactor code for LSF queue validation for executor check
shahzebsiddiqui authored Apr 12, 2024
2 parents 1603ec4 + fc66804 commit 43c064f
Showing 5 changed files with 54 additions and 39 deletions.
35 changes: 3 additions & 32 deletions buildtest/config.py
@@ -263,45 +263,16 @@ def _validate_lsf_executors(self):
         if not lsf.active():
             return

-        queue_list = []
-        valid_queue_state = "Open:Active"
-
-        record = lsf.queues()["RECORDS"]
-        # retrieve all queues from json record
-        for name in record:
-            queue_list.append(name["QUEUE_NAME"])
-
         # check all executors have defined valid queues and check queue state.
         for executor in lsf_executors:
             executor_name = f"{self.name()}.{executor_type}.{executor}"
             if self.is_executor_disabled(lsf_executors[executor]):
                 self.disabled_executors.append(executor_name)
                 continue

-            queue = lsf_executors[executor].get("queue")
-            # if queue field is defined check if its valid queue
-            if queue:
-                if queue not in queue_list:
-                    self.invalid_executors.append(executor_name)
-                    logger.error(
-                        f"'{queue}' is invalid LSF queue. Please select one of the following queues: {queue_list}"
-                    )
-                    continue
-
-                # check queue record for Status
-                for name in record:
-                    # skip record until we find matching queue
-                    if name["QUEUE_NAME"] != queue:
-                        continue
-
-                    queue_state = name["STATUS"]
-                    # if state not Open:Active we raise error
-                    if not queue_state == valid_queue_state:
-                        self.invalid_executors.append(executor_name)
-                        logger.error(
-                            f"'{queue}' is in state: {queue_state}. It must be in {valid_queue_state} state in order to accept jobs"
-                        )
-                        break
+            if not lsf.validate_queue(executor=lsf_executors[executor]):
+                self.invalid_executors.append(executor_name)
+                continue

             self.valid_executors[executor_type][executor_name] = {
                 "setting": lsf_executors[executor]
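
Note on the hunk above: after the refactor, the loop in `_validate_lsf_executors` only sorts each executor into disabled, invalid, or valid, and leaves the queue check to the scheduler object. The sketch below approximates that control flow in isolation. It is not the buildtest implementation: `partition_executors`, the `validate` and `is_disabled` callbacks, and the `disable` key used in the usage lines are hypothetical names chosen for illustration. Only the `<system>.<executor_type>.<executor>` naming and the `{"setting": ...}` shape are taken from the diff.

```python
def partition_executors(system, executor_type, executors, validate, is_disabled):
    """Sort executors into valid, invalid and disabled groups.

    `executors` maps an executor name to its settings dict.
    `validate` and `is_disabled` are callables taking one settings dict;
    both are stand-ins for the scheduler and configuration methods.
    """
    valid, invalid, disabled = {}, [], []
    for name, setting in executors.items():
        # Same naming scheme as the f-string in the diff.
        full_name = f"{system}.{executor_type}.{name}"
        if is_disabled(setting):
            disabled.append(full_name)
            continue
        if not validate(setting):
            invalid.append(full_name)
            continue
        valid[full_name] = {"setting": setting}
    return valid, invalid, disabled


# Usage with made-up data: pretend only the "batch" queue passes validation.
executors = {
    "batch": {"queue": "batch"},
    "storage": {"queue": "storage"},
    "debug": {"queue": "debug", "disable": True},  # "disable" is an assumed key
}
valid, invalid, disabled = partition_executors(
    "summit",
    "lsf",
    executors,
    validate=lambda setting: setting["queue"] == "batch",
    is_disabled=lambda setting: setting.get("disable", False),
)
```

Passing the validator in as a callable keeps the loop independent of any one scheduler, which is roughly what moving the check into `lsf.validate_queue` achieves.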
36 changes: 36 additions & 0 deletions buildtest/scheduler/detection.py
@@ -261,6 +261,42 @@ def get_queues(self):

         return queues
+
+    def validate_queue(self, executor):
+        """This method will validate a LSF queue. We check if queue is available and in 'Open:Active' state.
+        The input is a dictionary containing the LSF executor configuration. If queue is not
+        found we return False.
+        Args:
+            executor (dict): The dictionary containing the LSF executor configuration.
+        Returns:
+            bool: True if queue is found and in 'Open:Active' state, False otherwise.
+        """
+
+        queue_name = executor["queue"]
+        queue_active_state = "Open:Active"
+
+        queue_list = [name["QUEUE_NAME"] for name in self._queues["RECORDS"]]
+        if queue_name not in queue_list:
+            return False
+
+        for record in self._queues["RECORDS"]:
+            # check queue record for Status
+
+            # skip record until we find matching queue
+            if record["QUEUE_NAME"] != queue_name:
+                continue
+
+            queue_state = record["STATUS"]
+            # if state not Open:Active we raise error
+            if not queue_state == queue_active_state:
+                self.logger.error(
+                    f"'{queue_name}' is in state: {queue_state}. It must be in {queue_active_state} state in order to accept jobs"
+                )
+                return False
+
+        return True
+
+
 class Cobalt(Scheduler):
     """The Cobalt class checks for Cobalt binaries and gets a list of Cobalt queues"""
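
Note on the hunk above: `validate_queue` reads `executor["queue"]` directly, while the code it replaces in `buildtest/config.py` used `.get("queue")` and skipped the queue checks when no queue was set. If an executor can omit `queue`, the new lookup would raise `KeyError` on a plain dict. The new method also returns `False` for an unknown queue without the error message the old code logged. The sketch below is one possible standalone version of the check that keeps both behaviours. It is an illustration, not the project's code: `queue_is_usable`, its `records` parameter, and the decision to accept an executor with no queue are assumptions. The `QUEUE_NAME` and `STATUS` keys come from the diff.

```python
import logging

logger = logging.getLogger(__name__)

ACTIVE_STATE = "Open:Active"


def queue_is_usable(executor, records):
    """Return True if the executor's queue exists and is in Open:Active state.

    `records` is a list of dicts with "QUEUE_NAME" and "STATUS" keys, the
    shape read from self._queues["RECORDS"] in the diff.
    """
    queue_name = executor.get("queue")
    # Assumption: an executor without a queue is accepted, as the removed code did.
    if not queue_name:
        return True

    # Build one name -> status mapping instead of scanning the records twice.
    status_by_queue = {rec["QUEUE_NAME"]: rec["STATUS"] for rec in records}

    if queue_name not in status_by_queue:
        logger.error(
            "'%s' is an invalid LSF queue. Available queues: %s",
            queue_name,
            sorted(status_by_queue),
        )
        return False

    state = status_by_queue[queue_name]
    if state != ACTIVE_STATE:
        logger.error(
            "'%s' is in state: %s. It must be in %s state in order to accept jobs",
            queue_name,
            state,
            ACTIVE_STATE,
        )
        return False

    return True
```

Called with records such as `[{"QUEUE_NAME": "batch", "STATUS": "Open:Active"}]`, the function accepts `{"queue": "batch"}` and `{}`, and rejects any queue name that is not in the records.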
2 changes: 1 addition & 1 deletion docs/batch_support.rst
@@ -1146,7 +1146,7 @@ Let's run this example and notice that this job ran to completion but it was rep
 Torque
 -------

-Buildtest has support for running jobs on `Torque <https://adaptivecomputing.com/products/torque-resource-manager/>`_ scheduler. You must
+Buildtest has support for running jobs on `Torque <https://adaptivecomputing.com/cherry-services/torque-resource-manager/>`_ scheduler. You must
 define a :ref:`torque_executors` in your configuration file to use Torque scheduler. The ``#PBS`` directives can be specified using
 ``pbs`` property which is a list of PBS options that get inserted at top of script. Shown below is an example sleep job that will run on
 a single node for 5 seconds.
16 changes: 10 additions & 6 deletions docs/configuring_buildtest/site_examples.rst
@@ -17,20 +17,24 @@ configuring compilers that is available on Perlmutter.
 Oak Ridge National Laboratory
 -----------------------------

-`Summit <https://docs.olcf.ornl.gov/systems/summit_user_guide.html>`_ is a training
-system for Summit at OLCF, which is using a IBM Load Sharing
-Facility (LSF) as their batch scheduler. Ascent has two
-queues **batch** and **test**. To declare LSF executors we define them under ``lsf``
+`Summit <https://docs.olcf.ornl.gov/systems/summit_user_guide.html>`_ is a IBM based system
+hosted at Oak Ridge Leadership Computing Facility (OLCF). The system uses IBM Load Sharing
+Facility (LSF) as their batch scheduler.
+
+The ``system`` keyword is used to define the name of system which in this example is named ``summit``. The
+``hostnames`` is used to specify a list of hostnames where buildtest can run in order to use this system configuration.
+
+The system comes with several queues, for the purposes of this example we define 3 executors
+that map to queues **batch** , **test** and **storage**. To declare LSF executors we define them under ``lsf``
 section within the ``executors`` section.

 The default batch configuration is defined in ``defaults``, for instance we set the fields ``pollinterval``, ``maxpendtime``
 and to **30s** and **300s** each. The field ``account`` is used to specify project account where all jobs will be charged. This can be
 customized to each site but and can be changed in the configuration file or overridden via command line ``buildtest build --account <ACCOUNT>``.
-

 .. literalinclude:: ../../tests/settings/summit.yml
    :language: yaml
-   :emphasize-lines: 19-23,37-39
+   :emphasize-lines: 2-5,19-23,37-43

 Argonne National Laboratory
 ---------------------------
4 changes: 4 additions & 0 deletions tests/settings/summit.yml
@@ -37,6 +37,10 @@ system:
       lsf:
         batch:
           queue: batch
+        storage:
+          queue: storage
+        debug:
+          queue: debug
     compilers:
       find:
         gcc: ^(gcc)
