
OSDOCS-13835: Docs for Kueue gang scheduling / all-or-nothing #95242


Open
wants to merge 1 commit into
base: kueue-docs
2 changes: 2 additions & 0 deletions _topic_maps/_topic_map.yml
@@ -57,6 +57,8 @@ Topics:
  File: using-cohorts
- Name: Configuring fair sharing
  File: configuring-fairsharing
- Name: Gang scheduling
  File: gangscheduling
---
Name: Support
Dir: support
22 changes: 22 additions & 0 deletions configure/gangscheduling.adoc
@@ -0,0 +1,22 @@
:_mod-docs-content-type: ASSEMBLY
include::_attributes/common-attributes.adoc[]
[id="gangscheduling"]
= Gang scheduling
:context: gangscheduling

toc::[]

Gang scheduling ensures that the related jobs in a group, or _gang_, start only when all of the required resources are available. {product-title} enables gang scheduling by suspending jobs until the {platform} cluster can guarantee the capacity to start and execute all of the related jobs in the gang together. This approach is also known as _all-or-nothing_ scheduling.

Gang scheduling is important if you are working with expensive, limited resources, such as GPUs, and can prevent jobs from claiming but not using GPUs, which can improve GPU utilization and can reduce running costs. Gang scheduling can also help to prevent issues like resource segmentation and deadlocking.

Nit: I suggest breaking up the first sentence:

Suggested change
Gang scheduling is important if you are working with expensive, limited resources, such as GPUs. Gang scheduling can prevent jobs from claiming but not using GPUs, which can improve GPU utilization and can reduce running costs. Gang scheduling can also help to prevent issues like resource segmentation and deadlocking.
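
For illustration, the following `Job` is the kind of workload that gang scheduling is designed for: four workers that must all run at the same time, each requesting one GPU. This is a sketch under stated assumptions, not a tested configuration. The `team-a` namespace, the `team-a-queue` local queue, the job name, and the image are hypothetical; the example assumes that a matching `LocalQueue` already exists and that the cluster exposes GPUs as the `nvidia.com/gpu` resource.

.Example job that benefits from gang scheduling (hypothetical names)
[source,yaml]
----
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-training-job # hypothetical job name
  namespace: team-a # hypothetical namespace
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue # hypothetical LocalQueue name
spec:
  parallelism: 4 # four workers that must all run at the same time
  completions: 4
  suspend: true # Kueue unsuspends the job when it admits the workload
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: <training_image> # placeholder; replace with your image
        resources:
          requests:
            nvidia.com/gpu: 1 # assumes GPUs are exposed as nvidia.com/gpu
          limits:
            nvidia.com/gpu: 1
----

With gang scheduling enabled, the four pods are considered for admission as one unit. If they do not all become ready in time, the whole job is evicted and retried later, instead of some pods holding GPUs while they wait for the others.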


include::modules/configuring-gangscheduling.adoc[leveloffset=+1]

////
// use case - deep learning
One classic example is deep learning workloads. Deep learning frameworks, such as TensorFlow and PyTorch, require all workers to be running during the training process.

In this scenario, when you deploy training workloads, all of the components must be scheduled and deployed together to ensure that training works as expected.

Gang scheduling is a critical feature for deep learning workloads because it enables all-or-nothing scheduling, and most deep learning frameworks require all workers to be running before the training process can start. Gang scheduling can also help to avoid resource inefficiency and scheduling deadlocks.
////
43 changes: 43 additions & 0 deletions modules/configuring-gangscheduling.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
// Module included in the following assemblies:
//
// * configure/gangscheduling.adoc

:_mod-docs-content-type: REFERENCE
[id="configuring-gangscheduling_{context}"]
= Configuring gang scheduling

You can configure gang scheduling by modifying the `gangScheduling` spec in the `spec.config` section of the `Kueue` custom resource (CR).

.Example `Kueue` CR with gang scheduling configured
[source,yaml]
----
apiVersion: kueue.openshift.io/v1
kind: Kueue
metadata:
  name: cluster
  labels:
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/name: kueue-operator
  namespace: openshift-kueue-operator
spec:
  config:
    gangScheduling:
      policy: ByWorkload # <1>
      byWorkload:
        admission: Parallel # <2>
# ...
----
<1> You can set the `policy` value to enable or disable gang scheduling. The possible values are `ByWorkload`, `None`, or empty (`""`).
+
ByWorkload:: When the `policy` value is set to `ByWorkload`, each job is processed and considered for admission as a single unit. If the job does not become ready within the specified time, the entire job is evicted and retried at a later time.
+
None:: When the `policy` value is set to `None`, gang scheduling is disabled.
Comment on lines +32 to +34

Suggested change
`ByWorkload`:: When the `policy` value is set to `ByWorkload`, each job is processed and considered for admission as a single unit. If the job does not become ready within the specified time, the entire job is evicted and retried at a later time.
+
`None`:: When the `policy` value is set to `None`, gang scheduling is disabled.

+
Empty:: When the `policy` value is empty or set to `""`, the {product-title} Operator determines settings for gang scheduling. Currently, gang scheduling is disabled by default.

Suggested change
Empty (`""`):: When the `policy` value is empty or set to `""`, the {product-title} Operator determines settings for gang scheduling. Currently, gang scheduling is disabled by default.

<2> If the `policy` value is set to `ByWorkload`, you must also configure job admission settings in the `byWorkload` spec. The possible values for the `admission` spec are `Parallel`, `Sequential`, or empty (`""`).
+
Parallel:: When the `admission` value is set to `Parallel`, pods from any job can be admitted at any time. This can cause a deadlock, where jobs are in contention for cluster capacity, and pods from another job being successfully scheduled can prevent pods from the current job from being scheduled.

Suggested change
`Parallel`:: When the `admission` value is set to `Parallel`, pods from any job can be admitted at any time. This can cause a deadlock, where jobs are in contention for cluster capacity, and pods from another job being successfully scheduled can prevent pods from the current job from being scheduled.

The last sentence confuses me. Could we break it up into smaller sentences? IDK if this specific suggestion changes the meaning too much. Please only take what is useful:

Suggested change
Parallel:: When the `admission` value is set to `Parallel`, pods from any job can be admitted at any time. This can cause a deadlock, where jobs are in contention for cluster capacity. When deadlock occurs, the successful scheduling of pods from another job can prevent the scheduling of pods from the current job.

+
Sequential:: When the `admission` value is set to `Sequential`, only pods from the currently processing job are admitted. After all of the pods from the current job have been admitted and are ready, {product-title} processes the next job. Sequential processing can slow down admission when the cluster has sufficient capacity for multiple jobs, but provides a higher likelihood that all of the pods for a job are scheduled together successfully.

Suggested change
`Sequential`:: When the `admission` value is set to `Sequential`, only pods from the currently processing job are admitted. After all of the pods from the current job have been admitted and are ready, {product-title} processes the next job. Sequential processing can slow down admission when the cluster has sufficient capacity for multiple jobs, but provides a higher likelihood that all of the pods for a job are scheduled together successfully.

+
Empty:: When the `admission` value is empty or set to `""`, the {product-title} Operator determines job admission settings. Currently, the `admission` value is set to `Parallel` by default.

Suggested change
Empty (`""`):: When the `admission` value is empty or set to `""`, the {product-title} Operator determines job admission settings. Currently, the `admission` value is set to `Parallel` by default.
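
As a sketch of how the admission setting might be changed, the following `Kueue` CR keeps the `ByWorkload` policy but switches admission to `Sequential`, which trades admission speed for a higher likelihood that all of the pods for each job are scheduled together. This is an illustrative variant of the example CR in this module; it assumes that the rest of your existing `Kueue` CR stays unchanged.

.Example `Kueue` CR with sequential admission (illustrative)
[source,yaml]
----
apiVersion: kueue.openshift.io/v1
kind: Kueue
metadata:
  name: cluster
  namespace: openshift-kueue-operator
spec:
  config:
    gangScheduling:
      policy: ByWorkload # gang scheduling enabled
      byWorkload:
        admission: Sequential # admit the pods of one job at a time
# ...
----

To disable gang scheduling instead, set the `policy` value to `None`.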