Apply suggestions from code review
Co-authored-by: Malte Sander <[email protected]>
sbernauer and maltesander authored Oct 6, 2023
1 parent 6d31346 commit e8be4e5
Showing 1 changed file with 6 additions and 6 deletions.
@@ -1,16 +1,16 @@
= Allowed Pod disruptions

-You can configure the allowed Pod disruptions for Trino nodes as described in xref:concepts:operations/pod_disruptions.adoc[].
+You can configure the permitted Pod disruptions for Trino nodes as described in xref:concepts:operations/pod_disruptions.adoc[].

-Unless you configure something else or disable our PodDisruptionBudgets (PDBs), we write the following PDBs:
+Unless you configure something else or disable the provided PodDisruptionBudgets (PDBs), the following PDBs are written:

== Coordinators
-We only allow a single coordinator to be offline at any given time, regardless of the number of replicas or `roleGroups`.
+The provided PDBs only allow a single coordinator to be offline at any given time, regardless of the number of replicas or `roleGroups`.

== Workers
Normally users deploy multiple workers to speed up queries, handle multiple queries in parallel or to just have enough memory available in the Cluster to execute a big query.

-Taking this into consideration, our operator uses the following algorithm to determine the maximum number of workers allowed to be unavailable at the same time:
+Taking this into consideration, the operator uses the following algorithm to determine the maximum number of workers allowed to be unavailable at the same time:

`num_workers` is the number of workers in the Trino cluster, summed over all `roleGroups`.

@@ -47,12 +47,12 @@ This results e.g. in the following numbers:
|===

== Reduce rolling redeployment durations
-The default PDBs we write out are pessimistic and will cause the rolling redeployment to take a considerable amount of time.
+The default PDBs of the operator are pessimistic and will cause the rolling redeployment to take a considerable amount of time.
As an example, in a cluster with 100 workers, 10 workers are restarted at the same time. Assuming a worker takes 5 minutes to properly restart, the whole redeployment will take (100 nodes / 10 nodes simultaneous * 5 minutes =) 50 minutes.
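
The estimate above can be reproduced with a small sketch. It assumes workers restart in strict batches of at most `maxUnavailable` Pods and that every batch takes the same time; the function name and parameters are hypothetical and not part of the operator.

```python
import math


def estimate_rollout_minutes(num_workers, max_unavailable, restart_minutes):
    """Rough duration of a rolling redeployment, in minutes.

    Assumption: Pods restart in sequential batches of at most
    `max_unavailable`, and each batch takes `restart_minutes`.
    """
    if num_workers < 0 or max_unavailable < 1:
        raise ValueError("num_workers must be >= 0 and max_unavailable >= 1")
    # A partially filled last batch still costs a full restart cycle.
    batches = math.ceil(num_workers / max_unavailable)
    return batches * restart_minutes


# The example from the text: 100 workers, 10 at a time, 5 minutes each.
print(estimate_rollout_minutes(100, 10, 5))
```

Under the same assumptions, raising `max_unavailable` from 10 to 25 would cut the estimate to 4 batches, i.e. 20 minutes.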

You can use the following measures to speed this up:

1. Increase `maxUnavailable` using the `spec.workers.roleConfig.podDisruptionBudget.maxUnavailable` field as described in xref:concepts:operations/pod_disruptions.adoc[].
2. Write your own PDBs as described in xref:concepts:operations/pod_disruptions.adoc#_using_you_own_custom_pdbs[Using your own custom PDBs].

-WARNING: In cases you modify or disable the default PDBs, it's your responsibility to either make sure there are enough DataNodes available or accept the risk of blocks not being available!
+WARNING: In case you modify or disable the default PDBs, it is your responsibility to make sure there are enough workers available to manage the existing workload and performance requirements!
