Skip to content

Conversation

@acrylJonny
Copy link
Collaborator

No description provided.

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Jul 31, 2025
if not tags_to_add:
continue

assert self.ctx.graph
Copy link

@aikido-pr-checks aikido-pr-checks bot Jul 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dangerous use of assert - low severity
When running Python in production in optimized mode, assert calls are not executed. This mode is enabled by setting the PYTHONOPTIMIZE command line flag. Optimized mode is usually ON in production. Any safety check done using assert will not be executed.

Remediation: Raise an exception instead of using assert.
View details in Aikido Security

@codecov
Copy link

codecov bot commented Jul 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!

@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label Jul 31, 2025

class AddDatasetTagsConfig(TransformerSemanticsConfigModel):
get_tags_to_add: Callable[[str], List[TagAssociationClass]]
is_container: bool = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we've used this is_container flag pattern in other places, but frankly I find it really confusing that the dataset tag transformer has a container-related option.

I'd like us to spend a little bit of time thinking about if there is a better way to do this that results in a more logically consistent interface / less confusing experience.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, it's important to me that we continue to reuse as much code as possible around the logic of merging ingestion-produced tags with server-fetch tags. But I don't think stuffing all of that functionality in a dataset transformer is the right approach.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we've used this is_container flag pattern in other places, but frankly I find it really confusing that the dataset tag transformer has a container-related option.

I'd like us to spend a little bit of time thinking about if there is a better way to do this that results in a more logically consistent interface / less confusing experience.

I do agree with the comment and sentiment. My general feel is that the OOTB transformers, in being limited to datasets, are restrictive. There is functionality within the existing transformers that would make sense to be applied to other entity types (e.g. dashboards, containers etc.). Because of this it feels like the is_container functionality has generally been added to get around these imposed restrictions on the OOTB transformers. I do feel that this change does at least offer consistency with other transformers, hence raising it.
It does feel like an area for us to revisit - what the future state of transformers should be, and how they should be used, and what entity types should be allowed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, it's important to me that we continue to reuse as much code as possible around the logic of merging ingestion-produced tags with server-fetch tags. But I don't think stuffing all of that functionality in a dataset transformer is the right approach.

What would you propose as next steps? Is there existing code logic that this should leverage or would you prefer for this PR to start on the creation of this common code?

@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Aug 7, 2025
@acrylJonny acrylJonny changed the title feat(ingestion/transformers): add is_container config to tags transformers to enable container tagging DRAFT - feat(ingestion/transformers): add is_container config to tags transformers to enable container tagging Oct 26, 2025
@acrylJonny acrylJonny marked this pull request as draft October 26, 2025 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ingestion PR or Issue related to the ingestion of metadata pending-submitter-response Issue/request has been reviewed but requires a response from the submitter

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants