diff --git a/config/_default/menus/main.en.yaml b/config/_default/menus/main.en.yaml
index 51029807c85..2b8692e0e5f 100644
--- a/config/_default/menus/main.en.yaml
+++ b/config/_default/menus/main.en.yaml
@@ -5085,6 +5085,12 @@ menu:
       identifier: data_catalog
       parent: data_observability_heading
       weight: 65000
+    - name: Lineage
+      url: data_observability/lineage
+      pre: data-observability-wui
+      identifier: data_lineage
+      parent: data_observability_heading
+      weight: 67500
     - name: Quality Monitoring
       url: data_observability/quality_monitoring/
       pre: data-observability-wui
diff --git a/content/en/data_observability/_index.md b/content/en/data_observability/_index.md
index 4623db7f9c0..a4210ae8bd0 100644
--- a/content/en/data_observability/_index.md
+++ b/content/en/data_observability/_index.md
@@ -5,6 +5,9 @@ further_reading:
   - link: '/data_observability/data_catalog/'
     tag: 'Documentation'
     text: 'Data Catalog'
+  - link: '/data_observability/lineage/'
+    tag: 'Documentation'
+    text: 'Lineage'
   - link: '/data_observability/quality_monitoring/'
     tag: 'Documentation'
     text: 'Quality Monitoring'
@@ -32,6 +35,7 @@ Data Observability (DO) helps data teams improve the reliability of data for ana
 
 {{< whatsnext desc="Data Observability consists of the following:" >}}
     {{< nextlink href="/data_observability/data_catalog/" >}}Data Catalog: Browse and search a centralized inventory of your data assets across connected integrations.{{< /nextlink >}}
+    {{< nextlink href="/data_observability/lineage/" >}}Lineage: Trace upstream dependencies and downstream consumers across your data stack.{{< /nextlink >}}
     {{< nextlink href="/data_observability/quality_monitoring/" >}}Quality Monitoring: Identify data issues before downstream BI and AI applications are impacted.{{< /nextlink >}}
     {{< nextlink href="/data_observability/jobs_monitoring/" >}}Jobs Monitoring: Observe, troubleshoot, and optimize jobs across your data pipelines.{{< /nextlink >}}
 {{< /whatsnext >}}
diff --git a/content/en/data_observability/data_catalog.md b/content/en/data_observability/data_catalog.md
index 25fd29fa0f2..704195ce195 100644
--- a/content/en/data_observability/data_catalog.md
+++ b/content/en/data_observability/data_catalog.md
@@ -8,6 +8,9 @@ further_reading:
   - link: '/data_observability/quality_monitoring/'
     tag: 'Documentation'
     text: 'Quality Monitoring'
+  - link: '/data_observability/lineage/'
+    tag: 'Documentation'
+    text: 'Lineage'
   - link: '/data_observability/jobs_monitoring/'
     tag: 'Documentation'
     text: 'Jobs Monitoring'
@@ -27,7 +30,7 @@ When you open the catalog at [/data-obs/catalog](https://app.datadoghq.com/data-
 - **Links to the source system**: direct references back to the origin platform so you can navigate from the catalog to the source in one click
 - **Tags**: `key:value` metadata pairs pulled from the source system if available
 - **Monitor Status**: displays the state of any active [Data Quality Monitors](/data_observability/quality_monitoring/) on the asset
-- **Lineage**: upstream and downstream dependencies, where supported by the integration
+- **Lineage**: upstream and downstream dependencies, where supported by the integration. To explore lineage across assets, see [Lineage][1].
 
 Use the left sidebar to filter assets by type: {{< ui >}}All assets{{< /ui >}}, {{< ui >}}Databases{{< /ui >}}, {{< ui >}}Schemas{{< /ui >}}, or {{< ui >}}Tables{{< /ui >}}. Connected integrations (such as Snowflake, dbt, and BigQuery) are also listed individually in the sidebar.
 
@@ -45,3 +48,5 @@ Wildcards and unions are also supported:
 - **Intersection**: `dim_zendesk AND data_owner:TS-OPS-ANALYTICS`
 
 Recent searches are saved and surfaced in the dropdown for quick reuse.
+
+[1]: /data_observability/lineage/
diff --git a/content/en/data_observability/jobs_monitoring/_index.md b/content/en/data_observability/jobs_monitoring/_index.md
index f0cb98c384f..9f8fe024cb4 100644
--- a/content/en/data_observability/jobs_monitoring/_index.md
+++ b/content/en/data_observability/jobs_monitoring/_index.md
@@ -4,6 +4,9 @@ description: "Monitor performance, reliability, and cost efficiency of data proc
 aliases:
   - /data_jobs/
 further_reading:
+  - link: '/data_observability/lineage/'
+    tag: 'Documentation'
+    text: 'Lineage'
   - link: '/data_streams'
     tag: 'Documentation'
     text: 'Data Streams Monitoring'
@@ -19,6 +22,7 @@ Data Observability: Jobs Monitoring provides visibility into the performance, re
 - Track the health and performance of data processing jobs across your accounts and workspaces. See which take up the most compute resources or have inefficiencies.
 - Receive an alert when a job fails—or when a job is taking too long to complete.
 - Analyze job execution details and stack traces.
+- Use [Lineage][2] to assess upstream causes and downstream impact for failing or delayed jobs.
 - Correlate infrastructure metrics, Spark metrics from the Spark UI, logs, and cluster configuration.
 - Compare multiple runs to facilitate troubleshooting, and to optimize provisioning and configuration during deployment.
 
@@ -71,3 +75,4 @@ To determine why a stage is taking a long time to complete, you can use the {{<
 {{< partial name="whats-next/whats-next.html" >}}
 
 [1]: https://app.datadoghq.com/monitors/templates
+[2]: /data_observability/lineage/
diff --git a/content/en/data_observability/lineage.md b/content/en/data_observability/lineage.md
new file mode 100644
index 00000000000..0c459580cfc
--- /dev/null
+++ b/content/en/data_observability/lineage.md
@@ -0,0 +1,149 @@
+---
+title: Lineage
+description: Trace upstream dependencies and downstream consumers across data assets, jobs, dashboards, and applications.
+further_reading:
+- link: "/data_observability/data_catalog/"
+  tag: "Documentation"
+  text: "Data Catalog"
+- link: "/data_observability/quality_monitoring/"
+  tag: "Documentation"
+  text: "Quality Monitoring"
+- link: "/data_observability/jobs_monitoring/"
+  tag: "Documentation"
+  text: "Jobs Monitoring"
+- link: "https://www.datadoghq.com/blog/data-lineage/"
+  tag: "Blog"
+  text: "Understanding data lineage"
+---
+
+## Overview
+
+Lineage shows how data flows through your stack—from source systems and warehouse tables, through transformations and jobs, to the dashboards and applications that consume it. Use it to trace quality issues to their root cause, assess the blast radius of a failing job or a planned schema change, and route incidents to the right owner.
+
+Datadog builds lineage automatically from metadata collected through your [Quality Monitoring][1] and [Jobs Monitoring][2] integrations (Snowflake, BigQuery, Databricks, dbt, Airflow, Fivetran, Looker, Tableau, and others). Anything in the Data Observability Catalog can appear in the graph.
+
+{{< img src="data_observability/lineage/lineage-overview.png" alt="The Lineage page showing upstream and downstream dependencies for an anchored Snowflake table" style="width:100%;" >}}
+
+To open Lineage, go to **Data Observability > Lineage**.
+
+## Select anchor assets
+
+Every lineage view centers on an **anchor**: the single asset whose upstream and downstream neighbors you want to explore. Datadog marks the anchor node with an `ANCHOR` badge.
+
+To set an anchor, use the search bar at the top of the page:
+
+1. Choose an asset type from the **Any asset** dropdown (for example, *Table*, *Column*, *Dashboard*, or *Job*). Leave it set to **Any asset** to search across all types.
+2. Enter the asset name. Datadog searches all connected sources in the Data Observability Catalog.
+3. Select a result to anchor the graph.
+
+**One Anchor**
+
+Search for a single asset by name to make it the anchor for the lineage graph.
+{{< img src="data_observability/lineage/anchors-1-search.png" alt="The anchor search bar with one anchor selected" style="width:100%;" >}}
+
+The graph centers on the selected anchor and shows its upstream dependencies and downstream consumers.
+{{< img src="data_observability/lineage/anchors-1-map.png" alt="The lineage map with one anchor selected" style="width:100%;" >}}
+
+**Multiple Anchors**
+
+Add multiple assets to the search bar to compare related lineage paths in the same view.
+{{< img src="data_observability/lineage/anchors-2-search.png" alt="The anchor search bar with 2 anchors selected" style="width:100%;" >}}
+
+Each selected asset is marked with an `ANCHOR` badge, and the graph shows how their upstream and downstream paths connect.
+{{< img src="data_observability/lineage/anchors-2-map.png" alt="The lineage map with 2 anchors selected" style="width:100%;" >}}
+
+**Search Query**
+
+Use an attribute query, such as `schema:staging`, to select a dynamic set of matching assets.
+{{< img src="data_observability/lineage/anchors-n-search.png" alt="The anchor search bar with a dynamic query" style="width:100%;" >}}
+
+The graph marks every matching asset as an anchor so you can inspect lineage for the full query result set.
+{{< img src="data_observability/lineage/anchors-n-map.png" alt="The lineage map with many anchors selected via a dynamic query" style="width:100%;" >}}
+
+The graph renders with the anchors in the center and upstream and downstream neighbors expanding to the left and right.
+
+## Navigate the graph
+
+After you set an anchor, the lineage graph renders in the main panel. Upstream dependencies appear to the left; downstream consumers appear to the right. Each node shows the asset's name, type, source, and basic stats such as row or column count where available.
+
+The toolbar on the right of the canvas provides **zoom in**, **zoom out**, **Reset view**, and **Center anchors**.
+
+The time selector in the top-right corner (`1w`, `Past 1 Week`, and so on) sets the window used to evaluate lineage. Datadog derives relationships from query history and job runs within this window: widen it to surface older or less frequent dependencies, narrow it to show only what's active.
+
+## Lineage Controls
+
+The **Lineage Controls** panel on the left configures the shape and contents of the graph.
+