An executor similar to KubernetesExecutor but one that runs tasks from one DAG on the same pod #31883
-
Hello awesome Airflow community, We have been using Airflow with KubernetesExecutor and are quite happy with the performance and isolation provided by running tasks in their own pods. One issue we often encounter is that a DAG with many parallel tasks ends up taking all the workers. We have tried to limit this behavior with worker pools, with varying degrees of success. But it would be really nice to have an executor that runs all the tasks from a single DAG on the same pod. This would provide noisy-neighbor isolation as well as ease of resource allocation. Thoughts?
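For context, this is roughly what our pool-based limiting looks like today. The names here are simplified for illustration; the pool itself is created separately, e.g. via the UI or `airflow pools set`:

```python
# Simplified illustration; "etl_pool" is a made-up pool created beforehand,
# e.g. `airflow pools set etl_pool 4 "cap for parallel ETL tasks"`.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="parallel_etl",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # 20 independent tasks, but at most 4 run at once (the pool's slot
    # count), so they do not take over every executor worker slot.
    for i in range(20):
        BashOperator(
            task_id=f"chunk_{i}",
            bash_command=f"echo processing chunk {i}",
            pool="etl_pool",
        )
```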
-
You can run Celery in a Pod and use CeleryKubernetesExecutor for that purpose. You can have multiple queues and have a single Celery worker run as a pod (with several parallel processes), then assign the DAG to that queue. If you have one worker on that queue and all the tasks in your DAG have the same queue (via default_args), they will all run in that Celery worker. The worker will be idle when there are no running tasks for that queue, but that is the price to pay. Usually when you want to achieve affinity (which is what you are trying to do) you have to pay some price, usually underutilisation of the machines you have.
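For concreteness, a minimal sketch of a DAG pinned to such a queue. The queue name `dedicated_worker` is an assumption; it must match the queue your single Celery worker pod consumes, e.g. one started with `airflow celery worker --queues dedicated_worker`:

```python
# Minimal sketch of the queue-based approach; "dedicated_worker" is a
# made-up queue name that must match what the Celery worker pod listens on.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="colocated_dag",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
    # Every task inherits this queue, so all of them are picked up
    # by the one Celery worker listening on it.
    default_args={"queue": "dedicated_worker"},
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    transform = BashOperator(task_id="transform", bash_command="echo transform")
    load = BashOperator(task_id="load", bash_command="echo load")
    extract >> transform >> load
```

With CeleryKubernetesExecutor, any task whose queue is not the configured kubernetes queue is routed to Celery, so these tasks all land on the dedicated worker while the rest of your DAGs keep running in their own pods.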
-
If you would like this to be done somewhat dynamically, it is an extremely complex thing to manage (despite you describing it in one sentence: "I want tasks from the same DAG to run together"). Generally you will always have to pay an underutilisation price if you want to do that; there is no way around it. Defining a generic solution where multiple tasks, sometimes depending on each other and sometimes not, always run on the same machine will likely lead to even more underutilisation than having one queue handle the tasks for one DAG as described above. That is likely your best bet.