Implement support for task tags #33536
-
Thanks for opening your first issue here! Be sure to follow the issue template!
-
Unfortunately, there is no way to estimate. Estimating the size of a task is really a "coastline paradox" (https://medium.com/workmatters/estimating-work-the-coastline-paradox-orders-of-approximation-and-agile-scoping-f850fb1e1419): the more closely you look at it, the longer the coastline often gets. I think it's a matter of "need/faith" - the more you believe a feature is needed, and the more faith you have that you should do it, the more you should actually do it.

Believing that maintainers or other collaborators can help with estimation is wrong. There are many sources (search for "expert estimates wrong") showing that experts in a given area are no better at estimating how long something will take to implement. It is an often-repeated misconception that if you've done something several times, you can reliably estimate how long a related task will take. This is wrong. Experts KNOW why things work the way they work, but often fail at estimating how complex it will be to implement something new, precisely because they are tied to the ways they already know, and so they may misjudge the complexity of a new thing. They might come up with points you should consider, but you and only you can attempt to estimate how much time the work will take you (and other studies show that the more experienced you are, the more you underestimate the time you need as well).

Another point is that experts have no idea how "good" you are at implementing things. That was already impossible to judge in the past, and recent AI improvements in code generation have made it even more difficult and unpredictable. For simple sub-algorithms you could get a 10x speedup by using generated code; in more complex cases you could get a 10x slowdown by debugging buggy AI-generated code, only to find an obscure edge case the AI didn't handle (and that you'd never miss as an expert).
So in short: delegating to maintainers the "estimation" of how much time it will take you to implement something is doubly wrong.
But I heartily encourage you to take it on anyway - especially if you feel good about doing it. I've found that it's often much better to do the things you are particularly passionate about than the things you can quantify (with estimates, impact, etc.). We are a community of individual contributors and "community over code" is important - the more you personally feel things are important, the more they actually are important.
-
@potiuk Thanks for your thorough explanation on estimation. At this time I unfortunately don't have the bandwidth to commit to any open-source work. That said, I implemented (a long time ago) a userland version of this feature, which you can find here: https://gist.github.com/philippefutureboy/27ba48a835c713b45001f2db04b7f527 Hopefully those who find this issue/discussion will find this solution satisfying :) Cheers!
-
Hey there! Chiming in with another use case: searching for previous occurrences of a task instance (operator-independent).

In my specific case, I want to run dbt tests only once per release. To filter the task instances that have already run, the optimal method would be to tag previous task instances with the release tag - something like a post-execute hook for tagging. For now, the alternative can be implemented by returning a "tags" key via XCom that includes the release tag. This requires, however, that I have control over the operator (which is true in my use case, but may not be in others).

(Once again, I don't have the bandwidth for a native implementation. I'll update this answer with a userland version of this feature.)
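The XCom workaround above can be sketched as follows. This is a minimal, self-contained illustration, not Airflow code: plain dicts stand in for Airflow's XCom store (an operator's return value is what Airflow would push as an XCom), and names like `run_dbt_tests` and `release:` are made up for the example.

```python
# Sketch of the XCom-based tagging workaround. Plain dicts stand in for
# Airflow's XCom store; all names here are illustrative, not real APIs.

def run_dbt_tests(release_tag: str) -> dict:
    """Pretend task: runs the tests, then returns a payload that includes
    a 'tags' key (Airflow pushes an operator's return value as an XCom)."""
    # ... the actual dbt test run would happen here ...
    return {"status": "success", "tags": [f"release:{release_tag}"]}

def already_ran_for_release(xcom_history: list, release_tag: str) -> bool:
    """Filter 'previous task instances' by inspecting their pushed tags."""
    wanted = f"release:{release_tag}"
    return any(wanted in entry.get("tags", []) for entry in xcom_history)

# Simulated history of XComs pushed by prior task instances.
history = [run_dbt_tests("1.0.0")]
print(already_ran_for_release(history, "1.0.0"))  # True  -> skip re-running
print(already_ran_for_release(history, "1.1.0"))  # False -> run the tests
```

In real Airflow the history lookup would query previous TaskInstance XComs rather than an in-memory list, which is exactly the part that native task tags would make unnecessary.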
-
Description
Task tags could be a useful addition when one wants to filter upstream or downstream tasks in dynamically-composed DAGs.
This issue was opened as a follow-up to discussion #17697.
Use case / motivation
If at any point one wants to filter tasks for whatever purpose, the current standard is to use a naming convention for your tasks. Unfortunately, this can be limiting if you need to filter your tasks on more than one tag.
In my use case, I want to collect the upstream tasks that produced files and gather those (JSON) files downstream for assembly into a larger JSON file.
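The use case above can be sketched with a small stand-in model. To be clear, Airflow's BaseOperator has no task-level `tags` parameter today; the `TaggedTask` class and `upstream_with_tags` helper below are hypothetical, showing only what the proposed feature would enable.

```python
# Minimal stand-in model of the proposed task-tag behaviour.
# BaseOperator has no `tags` parameter today; everything here is illustrative.

class TaggedTask:
    def __init__(self, task_id, tags=None):
        self.task_id = task_id
        self.tags = set(tags or [])
        self.upstream = []  # direct upstream tasks

    def set_upstream(self, other):
        self.upstream.append(other)

def upstream_with_tags(task, wanted):
    """Collect direct upstream tasks carrying all of the wanted tags."""
    return [t for t in task.upstream if wanted <= t.tags]

# Dynamically composed mini-DAG: two JSON producers, one unrelated task,
# and a downstream assembler that wants only the producers.
producer_a = TaggedTask("extract_a", tags={"produces-json"})
producer_b = TaggedTask("extract_b", tags={"produces-json"})
cleanup = TaggedTask("cleanup")
assemble = TaggedTask("assemble_json")
for t in (producer_a, producer_b, cleanup):
    assemble.set_upstream(t)

json_sources = upstream_with_tags(assemble, {"produces-json"})
print([t.task_id for t in json_sources])  # ['extract_a', 'extract_b']
```

With a naming convention instead of tags, the same filter would have to encode "produces-json" into every task_id, which breaks down as soon as a second orthogonal attribute is needed.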
Are you willing to submit a PR?
If that's a task that is doable within less than a day of work (6-8h), testing included, I'd be happy to do so on a weekend! 😃
Otherwise no, I don't have the bandwidth unfortunately.
If a collaborator has an idea of the scope and the places to target (mainly BaseOperator, right?), that would greatly help me commit to a PR :)