metadata-ingestion/docs/transformer/dataset_transformer.md (60 additions, 19 deletions)

@@ -348,15 +348,18 @@ a tag called `USA-ops-team` and `Canada-marketing` will be added to them respect

### Config Details

| Field              | Required | Type         | Default     | Description                                                                                                     |
| ------------------ | -------- | ------------ | ----------- | --------------------------------------------------------------------------------------------------------------- |
| `tag_urns`         | ✅       | list[string] |             | List of globalTags urns.                                                                                        |
| `replace_existing` |          | boolean      | `false`     | Whether to remove globalTags from the entity sent by the ingestion source.                                      |
| `semantics`        |          | enum         | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS.                                                |
| `is_container`     |          | bool         | `false`     | Whether to also apply the tags to the dataset's container. If `true`, tags are attached to both the dataset and its container. |

Let's suppose we'd like to add a set of dataset tags. To do so, we can use the `simple_add_dataset_tags` transformer that's included in the ingestion framework.

If the `is_container` field is set to `true`, the transformer will not only attach the tags to the matching datasets but will also look up the containers associated with those datasets and attach the same tags to them, so both the datasets and their containers end up with the specified tags.

The config, which we'd append to our ingestion recipe YAML, would look like this:

```yaml
transformers:
```

@@ -399,20 +402,34 @@ transformers:

```yaml
        - "urn:li:tag:NeedsDocumentation"
        - "urn:li:tag:Legacy"
```
- Add tags to dataset and its containers
  ```yaml
  transformers:
    - type: "simple_add_dataset_tags"
      config:
        is_container: true
        semantics: PATCH # or OVERWRITE, depending on the desired behavior
        tag_urns:
          - "urn:li:tag:NeedsDocumentation"
          - "urn:li:tag:Legacy"
  ```

## Pattern Add Dataset globalTags

### Config Details

| Field              | Required | Type                   | Default     | Description                                                                                                     |
| ------------------ | -------- | ---------------------- | ----------- | --------------------------------------------------------------------------------------------------------------- |
| `tag_pattern`      | ✅       | map[regex, list[urn]]  |             | Map of entity urn regex patterns to the lists of tag urns to apply to matching entities.                        |
| `replace_existing` |          | boolean                | `false`     | Whether to remove globalTags from the entity sent by the ingestion source.                                      |
| `semantics`        |          | enum                   | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS.                                                |
| `is_container`     |          | bool                   | `false`     | Whether to also apply the tags to the dataset's container. If `true`, tags are attached to both the dataset and its container. |

Let's suppose we'd like to append a series of tags to specific datasets. To do so, we can use the `pattern_add_dataset_tags` module that's included in the ingestion framework. This will match the regex pattern against the dataset `urn` and assign the respective tag urns given in the array.

If the `is_container` field is set to `true`, the transformer will not only attach the tags to the matching datasets but will also look up the containers associated with those datasets and attach the same tags to them, so both the datasets and their containers end up with the specified tags.

The config, which we'd append to our ingestion recipe YAML, would look like this:

```yaml
transformers:
```

@@ -462,19 +479,34 @@ transformers:

```yaml
            ["urn:li:tag:NeedsDocumentation", "urn:li:tag:Legacy"]
          ".*example2.*": ["urn:li:tag:NeedsDocumentation"]
```
- Add tags to dataset and its containers
  ```yaml
  transformers:
    - type: "pattern_add_dataset_tags"
      config:
        is_container: true
        semantics: PATCH
        tag_pattern:
          rules:
            ".*example1.*": ["urn:li:tag:Private"]
            ".*example2.*": ["urn:li:tag:Public"]
  ```

## Add Dataset globalTags

### Config Details

| Field              | Required | Type                                       | Default     | Description                                                                                                     |
| ------------------ | -------- | ------------------------------------------ | ----------- | --------------------------------------------------------------------------------------------------------------- |
| `get_tags_to_add`  | ✅       | callable[[str], list[TagAssociationClass]] |             | A function that takes an entity urn as input and returns a list of TagAssociationClass objects.                 |
| `replace_existing` |          | boolean                                    | `false`     | Whether to remove globalTags from the entity sent by the ingestion source.                                      |
| `semantics`        |          | enum                                       | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS.                                                |
| `is_container`     |          | bool                                       | `false`     | Whether to also apply the tags to the dataset's container. If `true`, tags are attached to both the dataset and its container. |

If you'd like to add more complex logic for assigning tags, you can use the more generic `add_dataset_tags` transformer, which calls a user-provided function to determine the tags for each dataset (a sketch of such a function follows the examples below).

If the `is_container` field is set to `true`, the transformer will not only attach the tags to the matching datasets but will also look up the containers associated with those datasets and attach the same tags to them, so both the datasets and their containers end up with the specified tags.

```yaml
transformers:
  - type: "add_dataset_tags"
```

@@ -536,6 +568,15 @@ Finally, you can install and use your custom transformer as [shown here](#instal

```yaml
      semantics: PATCH
      get_tags_to_add: "<your_module>.<your_function>"
```
- Add tags to dataset and its containers
  ```yaml
  transformers:
    - type: "add_dataset_tags"
      config:
        is_container: true
        semantics: PATCH # or OVERWRITE, depending on the desired behavior
        get_tags_to_add: "<your_module>.<your_function>"
  ```
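
For reference, here is a minimal sketch of what such a user-provided function might look like. The function name `custom_tags` and its tagging rule are illustrative stand-ins for `<your_module>.<your_function>`:

```python
from typing import List

from datahub.metadata.schema_classes import TagAssociationClass


def custom_tags(entity_urn: str) -> List[TagAssociationClass]:
    """Return the tags to attach to a given entity urn (illustrative rule)."""
    # Hypothetical policy: datasets under "example1" are Private, others Public.
    if "example1" in entity_urn:
        return [TagAssociationClass(tag="urn:li:tag:Private")]
    return [TagAssociationClass(tag="urn:li:tag:Public")]
```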

## Set Dataset browsePath

---

@@ -11,6 +11,7 @@
from datahub.ingestion.api.common import PipelineContext
from datahub.ingestion.transformer.dataset_transformer import DatasetTagsTransformer
from datahub.metadata.schema_classes import (
    BrowsePathsV2Class,
    GlobalTagsClass,
    MetadataChangeProposalClass,
    TagAssociationClass,
@@ -22,6 +23,7 @@

class AddDatasetTagsConfig(TransformerSemanticsConfigModel):
    get_tags_to_add: Callable[[str], List[TagAssociationClass]]
    is_container: bool = False
**Contributor:**

I know we've used this `is_container` flag pattern in other places, but frankly I find it really confusing that the dataset tag transformer has a container-related option.

I'd like us to spend a little bit of time thinking about whether there is a better way to do this that results in a more logically consistent interface and a less confusing experience.

**Contributor:**

In particular, it's important to me that we continue to reuse as much code as possible around the logic of merging ingestion-produced tags with server-fetched tags. But I don't think stuffing all of that functionality in a dataset transformer is the right approach.

**Collaborator (Author):**

> I know we've used this `is_container` flag pattern in other places, but frankly I find it really confusing that the dataset tag transformer has a container-related option.
>
> I'd like us to spend a little bit of time thinking about whether there is a better way to do this that results in a more logically consistent interface and a less confusing experience.

I do agree with the comment and sentiment. My general feeling is that the OOTB transformers, in being limited to datasets, are restrictive. There is functionality within the existing transformers that would make sense to apply to other entity types (e.g. dashboards, containers, etc.). Because of this, it feels like the `is_container` functionality has generally been added to work around these imposed restrictions on the OOTB transformers. I do feel that this change at least offers consistency with other transformers, hence raising it.

It does feel like an area for us to revisit: what the future state of transformers should be, how they should be used, and what entity types should be allowed.

**Collaborator (Author):**

> In particular, it's important to me that we continue to reuse as much code as possible around the logic of merging ingestion-produced tags with server-fetched tags. But I don't think stuffing all of that functionality in a dataset transformer is the right approach.

What would you propose as next steps? Is there existing code logic that this should leverage, or would you prefer this PR to start on the creation of this common code?
    _resolve_tag_fn = pydantic_resolve_key("get_tags_to_add")

@@ -73,6 +75,7 @@ def handle_end_of_stream(

        logger.debug("Generating tags")

        # Generate tag entities
        for tag_association in self.processed_tags.values():
            tag_urn = TagUrn.from_string(tag_association.tag)
            mcps.append(
@@ -82,11 +85,58 @@
                )
            )

        # Handle container tags if is_container is enabled
        container_tag_mcps: List[MetadataChangeProposalWrapper] = []
        container_tag_mapping: Dict[str, List[TagAssociationClass]] = {}

        if self.config.is_container:
            logger.debug("Generating tags for containers")

            for entity_urn, tags_to_add in (
                (urn, self.config.get_tags_to_add(urn)) for urn in self.entity_map
            ):
                if not tags_to_add:
                    continue

                assert self.ctx.graph
**aikido-pr-checks (bot)** commented on Jul 31, 2025:

**Dangerous use of assert (low severity).** When running Python in production in optimized mode, `assert` calls are not executed. Optimized mode is enabled via the `PYTHONOPTIMIZE` environment variable (or the `-O` flag) and is usually ON in production, so any safety check done using `assert` will not be executed.

Remediation: raise an exception instead of using `assert`.

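A minimal sketch of that remediation, assuming a plain `ValueError` is acceptable to the pipeline's error handling (the exception type and message are illustrative):

```python
# Explicit runtime check instead of assert, so the guard still runs when
# Python is started with -O / PYTHONOPTIMIZE.
if self.ctx.graph is None:
    raise ValueError(
        "is_container=True requires a DataHub graph connection "
        "to resolve container urns"
    )
```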
                # Resolve the dataset's containers from its browse path and
                # collect the tags to apply to each container.
                browse_paths = self.ctx.graph.get_aspect(entity_urn, BrowsePathsV2Class)
                if not browse_paths:
                    continue

                for path in browse_paths.path:
                    container_urn = path.urn

                    if not container_urn or not container_urn.startswith(
                        "urn:li:container:"
                    ):
                        continue

                    if container_urn not in container_tag_mapping:
                        container_tag_mapping[container_urn] = tags_to_add.copy()
                    else:
                        # Merge tags, avoiding duplicates
                        existing_tag_urns = {
                            tag.tag for tag in container_tag_mapping[container_urn]
                        }
                        for tag in tags_to_add:
                            if tag.tag not in existing_tag_urns:
                                container_tag_mapping[container_urn].append(tag)

            for urn, tags in container_tag_mapping.items():
                container_tag_mcps.append(
                    MetadataChangeProposalWrapper(
                        entityUrn=urn,
                        aspect=GlobalTagsClass(tags=tags),
                    )
                )

        mcps.extend(container_tag_mcps)
        return mcps


class SimpleDatasetTagConfig(TransformerSemanticsConfigModel):
    tag_urns: List[str]
    is_container: bool = False


class SimpleAddDatasetTags(AddDatasetTags):
@@ -99,6 +149,7 @@ def __init__(self, config: SimpleDatasetTagConfig, ctx: PipelineContext):
            get_tags_to_add=lambda _: tags,
            replace_existing=config.replace_existing,
            semantics=config.semantics,
            is_container=config.is_container,
        )
        super().__init__(generic_config, ctx)

@@ -110,6 +161,7 @@ def create(cls, config_dict: dict, ctx: PipelineContext) -> "SimpleAddDatasetTag

class PatternDatasetTagsConfig(TransformerSemanticsConfigModel):
    tag_pattern: KeyValuePattern = KeyValuePattern.all()
    is_container: bool = False


class PatternAddDatasetTags(AddDatasetTags):
@@ -123,6 +175,7 @@ def __init__(self, config: PatternDatasetTagsConfig, ctx: PipelineContext):
            ],
            replace_existing=config.replace_existing,
            semantics=config.semantics,
            is_container=config.is_container,
        )
        super().__init__(generic_config, ctx)
