An executor similar to KubernetesExecutor but one that runs tasks from one DAG on the same pod #31883
-
Hello awesome Airflow community, We have been using Airflow with KubernetesExecutor and are quite happy with the performance and isolation provided by running tasks in their own pods. One issue we often encounter is that a DAG with many parallel tasks ends up taking all the workers. We have tried to limit this behavior with worker pools, with varying degrees of success. But it would be really nice to have an executor that runs all the tasks from a single DAG on the same pod. This would provide noisy-neighbor isolation as well as ease of resource allocation. Thoughts?
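For context, this is roughly what our pool-based limiting looks like today. The names here are simplified for illustration; the pool itself is created separately, e.g. via the UI or `airflow pools set`:

```python
# Simplified illustration; "etl_pool" is a made-up pool created beforehand,
# e.g. `airflow pools set etl_pool 4 "cap for parallel ETL tasks"`.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="parallel_etl",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # 20 independent tasks, but at most 4 run at once (the pool's slot
    # count), so they do not take over every executor worker slot.
    for i in range(20):
        BashOperator(
            task_id=f"chunk_{i}",
            bash_command=f"echo processing chunk {i}",
            pool="etl_pool",
        )
```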
-
You can run Celery in a Pod and use CeleryKubernetesExecutor for that purpose. You can have multiple queues and have a single Celery worker run as a pod (with several parallel processes), then assign the DAG to that queue. If you have one worker on that queue and all the tasks in your DAG have the same queue (via default_args), they will all run in that Celery worker. The worker will be idle when there are no running tasks for that queue, but that is the price to pay. Usually when you want to achieve affinity (which is what you are trying to do) you have to pay some price, usually underutilisation of the machines you have.
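For concreteness, a minimal sketch of a DAG pinned to such a queue. The queue name `dedicated_worker` is an assumption; it must match the queue your single Celery worker pod consumes, e.g. one started with `airflow celery worker --queues dedicated_worker`:

```python
# Minimal sketch of the queue-based approach; "dedicated_worker" is a
# made-up queue name that must match what the Celery worker pod listens on.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="colocated_dag",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
    # Every task inherits this queue, so all of them are picked up
    # by the one Celery worker listening on it.
    default_args={"queue": "dedicated_worker"},
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    transform = BashOperator(task_id="transform", bash_command="echo transform")
    load = BashOperator(task_id="load", bash_command="echo load")
    extract >> transform >> load
```

With CeleryKubernetesExecutor, any task whose queue is not the configured kubernetes queue is routed to Celery, so these tasks all land on the dedicated worker while the rest of your DAGs keep running in their own pods.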
-
If you would like this to be done somewhat dynamically, it is an extremely complex thing to manage (despite you describing it in one sentence: "I want tasks from the same DAG to run together"). Generally you will always have to pay an underutilisation price if you want to do that; there is no way around it. Defining a generic solution where multiple tasks, sometimes depending on each other and sometimes not, always run on the same machine will likely lead to even more underutilisation than having one queue handle the tasks for one DAG as described above. That is likely your best bet.