Implement support for task tags #33536
-
Thanks for opening your first issue here! Be sure to follow the issue template!
-
Unfortunately, there is no way to estimate. Estimating the size of a task is really a "coastline paradox" (https://medium.com/workmatters/estimating-work-the-coastline-paradox-orders-of-approximation-and-agile-scoping-f850fb1e1419): the more closely you look at it, the longer the coastline often gets. I think it's a matter of "need/faith" - the more you believe a feature is needed, and the more faith you have that you should do it, the more you should actually do it.

Believing that maintainers or other collaborators can help with estimation is wrong. There are many sources (search for "expert estimates wrong") showing that experts in a given area are no better at estimating how long something will take to implement. It is an often-repeated misconception that if you've done something several times, you can reliably estimate how long a related task will take. This is wrong. Experts KNOW why things work the way they work, but often fail at estimating how complex it will be to implement something new, precisely because they are tied to the ways they already know, and so they may misjudge the complexity of a new thing. They might come up with points you should consider, but you and only you can attempt to estimate how much time the work will take you (and other studies show that the more experienced you are, the more you underestimate the time you need as well).

Another point is that experts have no idea how "good" you are at implementing things. That was already impossible to judge in the past, and recent AI improvements in code generation have made it even more difficult and unpredictable. For simple sub-algorithms you could get a 10x speedup by using generated code; in more complex cases you could get a 10x slowdown by debugging buggy AI-generated code, only to find an obscure edge case the AI didn't handle (and that you'd never miss as an expert).
So in short: delegating to maintainers the "estimation" of how much time it will take you to implement something is doubly wrong.
But I heartily encourage you to take it on anyway - especially if you feel good about doing it. I've found that it's often much better to do the things you are particularly passionate about than the things you can quantify (with estimates, impact, etc.). We are a community of individual contributors and "community over code" is important - the more you personally feel things are important, the more they actually are important.
-
@potiuk Thanks for your thorough explanation on estimation. At this time I unfortunately don't have the bandwidth to commit to any open-source work. That said, I implemented (a long time ago) a userland version of this feature, which you can find here: https://gist.github.com/philippefutureboy/27ba48a835c713b45001f2db04b7f527 Hopefully those who find this issue/discussion will find this solution satisfying :) Cheers!
-
Hey there! Chiming in with another use case: searching for previous occurrences of a task instance (operator-independent).

In my specific case, I want to run dbt tests only once per release. To filter the task instances that have already run, the optimal method would be to tag previous task instances with the release tag - something like a post-execute hook for tagging. For now, the alternative can be implemented by returning a "tags" key via XCom that includes the release tag. This requires, however, that I have control over the operator (which is true in my use case, but may not be in others).

(Once again, I don't have the bandwidth for a native implementation. I'll update this answer with a userland version of this feature.)
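The XCom workaround above can be sketched as follows. This is a minimal, self-contained illustration, not Airflow code: plain dicts stand in for Airflow's XCom store (an operator's return value is what Airflow would push as an XCom), and names like `run_dbt_tests` and `release:` are made up for the example.

```python
# Sketch of the XCom-based tagging workaround. Plain dicts stand in for
# Airflow's XCom store; all names here are illustrative, not real APIs.

def run_dbt_tests(release_tag: str) -> dict:
    """Pretend task: runs the tests, then returns a payload that includes
    a 'tags' key (Airflow pushes an operator's return value as an XCom)."""
    # ... the actual dbt test run would happen here ...
    return {"status": "success", "tags": [f"release:{release_tag}"]}

def already_ran_for_release(xcom_history: list, release_tag: str) -> bool:
    """Filter 'previous task instances' by inspecting their pushed tags."""
    wanted = f"release:{release_tag}"
    return any(wanted in entry.get("tags", []) for entry in xcom_history)

# Simulated history of XComs pushed by prior task instances.
history = [run_dbt_tests("1.0.0")]
print(already_ran_for_release(history, "1.0.0"))  # True  -> skip re-running
print(already_ran_for_release(history, "1.1.0"))  # False -> run the tests
```

In real Airflow the history lookup would query previous TaskInstance XComs rather than an in-memory list, which is exactly the part that native task tags would make unnecessary.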
-
Description
Task tags could be a useful addition when one wants to filter upstream or downstream tasks in dynamically-composed DAGs.
This issue was opened as a follow-up to discussion #17697.
Use case / motivation
If at any point one wants to filter tasks for whatever purpose, the current standard is to use a naming convention for your tasks. Unfortunately, this can be limiting if you need to filter your tasks on more than one tag.
In my use case, I want to collect the upstream tasks that produced files and gather those (JSON) files downstream for assembly into a larger JSON file.
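The use case above can be sketched with a small stand-in model. To be clear, Airflow's BaseOperator has no task-level `tags` parameter today; the `TaggedTask` class and `upstream_with_tags` helper below are hypothetical, showing only what the proposed feature would enable.

```python
# Minimal stand-in model of the proposed task-tag behaviour.
# BaseOperator has no `tags` parameter today; everything here is illustrative.

class TaggedTask:
    def __init__(self, task_id, tags=None):
        self.task_id = task_id
        self.tags = set(tags or [])
        self.upstream = []  # direct upstream tasks

    def set_upstream(self, other):
        self.upstream.append(other)

def upstream_with_tags(task, wanted):
    """Collect direct upstream tasks carrying all of the wanted tags."""
    return [t for t in task.upstream if wanted <= t.tags]

# Dynamically composed mini-DAG: two JSON producers, one unrelated task,
# and a downstream assembler that wants only the producers.
producer_a = TaggedTask("extract_a", tags={"produces-json"})
producer_b = TaggedTask("extract_b", tags={"produces-json"})
cleanup = TaggedTask("cleanup")
assemble = TaggedTask("assemble_json")
for t in (producer_a, producer_b, cleanup):
    assemble.set_upstream(t)

json_sources = upstream_with_tags(assemble, {"produces-json"})
print([t.task_id for t in json_sources])  # ['extract_a', 'extract_b']
```

With a naming convention instead of tags, the same filter would have to encode "produces-json" into every task_id, which breaks down as soon as a second orthogonal attribute is needed.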
Are you willing to submit a PR?
If that's a task that is doable within less than a day of work (6-8h), testing included, I'd be happy to do so on a weekend! 😃
Otherwise no, I don't have the bandwidth unfortunately.
If a collaborator has an idea of the scope and the places to target (mainly BaseOperator, right?), that would greatly help me commit to a PR :)