DRAFT - feat(ingestion/transformers): add is_container config to tags transformers to enable container tagging #14290

acrylJonny · 2025-07-31T18:44:22Z

No description provided.


aikido-pr-checks · 2025-07-31T18:44:39Z

metadata-ingestion/src/datahub/ingestion/transformer/add_dataset_tags.py

+                if not tags_to_add:
+                    continue
+
+                assert self.ctx.graph


Dangerous use of assert - low severity
When Python runs in optimized mode, assert statements are not executed. This mode is enabled with the -O command line flag or the PYTHONOPTIMIZE environment variable. If production runs in optimized mode, any safety check done using assert is skipped.

Remediation: Raise an exception instead of using assert.
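
As a minimal sketch of that remediation, the check can be written as an explicit raise so it still runs under optimized mode. The names `GraphRequiredError` and `require_graph` are hypothetical and are not part of DataHub; the real transformer would pass in its own `self.ctx.graph`.

```python
from typing import Optional


class GraphRequiredError(RuntimeError):
    """Hypothetical error type for this sketch; not a DataHub class."""


def require_graph(graph: Optional[object]) -> object:
    # Unlike `assert graph`, this check is not stripped by `python -O`.
    if graph is None:
        raise GraphRequiredError(
            "a graph client is required to look up existing tags"
        )
    return graph
```

A caller would then use `graph = require_graph(self.ctx.graph)` in place of the assert, which also narrows the Optional type for type checkers.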

codecov · 2025-07-31T18:46:53Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.


hsheth2 · 2025-08-07T20:38:39Z

metadata-ingestion/src/datahub/ingestion/transformer/add_dataset_tags.py


 class AddDatasetTagsConfig(TransformerSemanticsConfigModel):
    get_tags_to_add: Callable[[str], List[TagAssociationClass]]
+    is_container: bool = False


I know we've used this is_container flag pattern in other places, but frankly I find it really confusing that the dataset tag transformer has a container-related option.

I'd like us to spend a little bit of time thinking about whether there is a better way to do this that results in a more logically consistent interface and a less confusing experience.

In particular, it's important to me that we continue to reuse as much code as possible around the logic of merging ingestion-produced tags with server-fetched tags. But I don't think stuffing all of that functionality into a dataset transformer is the right approach.
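
One possible shape for the shared merge logic mentioned above, as a rough sketch only: an entity-agnostic helper that works on tag URNs and knows nothing about datasets or containers, so transformers for either entity type could call it. The function name `merge_tag_urns` and the `replace_existing` parameter are assumptions made for illustration and do not reflect the existing DataHub code.

```python
from typing import Iterable, List


def merge_tag_urns(
    ingested: Iterable[str],
    server: Iterable[str],
    replace_existing: bool = False,
) -> List[str]:
    """Combine ingestion-produced tags with server-fetched tags.

    Hypothetical helper: duplicates are dropped and order is preserved,
    with server-side tags listed first unless they are being replaced.
    """
    sources = [ingested] if replace_existing else [server, ingested]
    merged: List[str] = []
    seen = set()
    for source in sources:
        for urn in source:
            if urn not in seen:
                seen.add(urn)
                merged.append(urn)
    return merged
```

Because the helper only deals with URN strings, the choice of entity type stays in the calling transformer rather than in a flag on the dataset transformer's config.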

acrylJonny · 2025-08-07

> I know we've used this is_container flag pattern in other places, but frankly I find it really confusing that the dataset tag transformer has a container-related option.
>
> I'd like us to spend a little bit of time thinking about whether there is a better way to do this that results in a more logically consistent interface and a less confusing experience.

I agree with the comment and the sentiment. My general feeling is that the OOTB transformers are restrictive because they are limited to datasets. Some functionality in the existing transformers would make sense for other entity types (e.g. dashboards, containers). Because of this, it feels like the is_container functionality has generally been added to get around these restrictions on the OOTB transformers. This change does at least offer consistency with other transformers, which is why I raised it.
It does feel like an area for us to revisit: what the future state of transformers should be, how they should be used, and which entity types should be allowed.

> In particular, it's important to me that we continue to reuse as much code as possible around the logic of merging ingestion-produced tags with server-fetched tags. But I don't think stuffing all of that functionality into a dataset transformer is the right approach.

What would you propose as next steps? Is there existing code that this should leverage, or would you prefer that this PR start creating this common code?

feat(ingestion/transformers): add is_container config to tags transfo…

b03fae1

…rmers to enable container tagging

github-actions bot added the ingestion label (PR or Issue related to the ingestion of metadata) Jul 31, 2025

aikido-pr-checks bot reviewed Jul 31, 2025

github-actions bot deployed to datahub-wheels (Preview) July 31, 2025 18:45

datahub-cyborg bot added the needs-review label (Label for PRs that need review from a maintainer.) Jul 31, 2025

vercel bot deployed to Preview July 31, 2025 19:02

hsheth2 reviewed Aug 7, 2025

datahub-cyborg bot added the pending-submitter-response label (Issue/request has been reviewed but requires a response from the submitter) and removed the needs-review label (Label for PRs that need review from a maintainer.) Aug 7, 2025

acrylJonny changed the title ~~feat(ingestion/transformers): add is_container config to tags transformers to enable container tagging~~ DRAFT - feat(ingestion/transformers): add is_container config to tags transformers to enable container tagging Oct 26, 2025

acrylJonny marked this pull request as draft October 26, 2025 11:12

Merge branch 'master' into is_container_tags_transformers

ba518d6

github-actions bot deployed to datahub-wheels (Preview) November 1, 2025 00:24

vercel bot deployed to Preview November 1, 2025 00:39


3 participants
