This file documents any backwards-incompatible changes in DataHub and assists people when migrating to a new version.
- #6742 The metadata file sink's output format no longer contains nested JSON strings for MCP aspects, but instead unpacks the stringified JSON into a real JSON object. The previous sink behavior can be recovered using the `legacy_nested_json_string` option. The file source is backwards compatible and supports both formats.
- #6901 The `env` and `database_alias` fields have been marked deprecated across all sources. We recommend using `platform_instance` where possible instead.
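For anyone post-processing file-sink output, the difference between the two formats described in #6742 can be sketched in Python. This is a minimal illustration that assumes two shapes: the legacy output stores each aspect as `{"value": "<JSON string>", "contentType": "application/json"}`, and the new output stores it as `{"json": {...}}`. `unpack_aspect` is a hypothetical helper and is not part of the DataHub API.

```python
import json
from typing import Any, Dict

JSON_CONTENT_TYPE = "application/json"


def unpack_aspect(mcp: Dict[str, Any]) -> Dict[str, Any]:
    """Return an MCP dict whose stringified JSON aspect is unpacked.

    Assumed legacy shape: {"value": "<json string>", "contentType": "application/json"}.
    Assumed new shape:    {"json": {...}}.
    Records in any other shape are returned unchanged.
    """
    aspect = mcp.get("aspect")
    if isinstance(aspect, dict) and aspect.get("contentType") == JSON_CONTENT_TYPE:
        unpacked = dict(mcp)  # shallow copy so the input record is not mutated
        unpacked["aspect"] = {"json": json.loads(aspect["value"])}
        return unpacked
    return mcp


legacy_record = {
    "entityType": "dataset",
    "changeType": "UPSERT",
    "aspectName": "status",
    "aspect": {"value": '{"removed": false}', "contentType": "application/json"},
}
print(unpack_aspect(legacy_record)["aspect"])  # {'json': {'removed': False}}
```

Because the file source reads both formats, a conversion like this is only needed for your own downstream tooling, not for re-ingesting the file.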
- #6851 - Sources `bigquery-legacy` and `bigquery-usage-legacy` have been removed.
- #6243 The apache-ranger authorizer is no longer a core part of DataHub GMS; it has been moved to a plugin. Please refer to the updated documentation, Configuring Authorization with Apache Ranger, for configuring `apache-ranger-plugin` in DataHub GMS.
- #6243 The apache-ranger authorizer plugin is not supported in DataHub Kubernetes deployments.
- #6243 Authentication and Authorization plugin configurations have been removed from `application.yml`. Refer to the documentation Migration Of Plugins From application.yml for migrating any existing custom plugins.
- The `datahub check graph-consistency` command has been removed. It was a beta API that we had considered, but we decided there are better solutions for this.
- The `graphql_url` option of the `powerbi-report-server` source has been deprecated, as the option is not used.
- #6789 BigQuery ingestion: If `enable_legacy_sharded_table_support` is set to `False`, sharded table names will be suffixed with `_yyyymmdd` to make sure they don't clash with non-sharded tables. This means that if stateful ingestion is enabled, old sharded tables will be recreated with a new id, and attached tags, glossary terms, etc. will need to be added again. This behavior is not enabled by default yet, but will be enabled by default in a future release.
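The naming rule in #6789 can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: a shard is taken to be a trailing `_YYYYMMDD`, and `split_shard` and `dataset_name` are hypothetical helpers. The connector's actual shard detection may differ. The sketch shows why legacy naming can clash: a sharded `events_20230101` and a plain `events` table map to the same name.

```python
import re
from typing import Optional, Tuple

# Assumption: a shard is an 8-digit date after the last underscore.
SHARD_RE = re.compile(r"^(?P<base>.+)_(?P<shard>\d{8})$")
SHARDED_SUFFIX = "_yyyymmdd"


def split_shard(table: str) -> Tuple[str, Optional[str]]:
    """Split 'events_20230101' into ('events', '20230101'); non-sharded names get None."""
    match = SHARD_RE.match(table)
    if match is None:
        return table, None
    return match.group("base"), match.group("shard")


def dataset_name(project: str, dataset: str, table: str, legacy: bool) -> str:
    """Build a dataset name following the two naming modes described above."""
    base, shard = split_shard(table)
    if shard is None:
        return f"{project}.{dataset}.{table}"
    if legacy:
        # Legacy mode drops the shard date, so it can collide with a plain `base` table.
        return f"{project}.{dataset}.{base}"
    # New mode keeps a fixed suffix, so sharded and non-sharded tables stay distinct.
    return f"{project}.{dataset}.{base}{SHARDED_SUFFIX}"
```

Since the resulting name changes when the flag flips, the dataset urn changes too, which is why stateful ingestion treats the old sharded datasets as removed.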
- #6611 - Snowflake `schema_pattern` now accepts a pattern for the fully qualified schema name in the format `<catalog_name>.<schema_name>` when the config `match_fully_qualified_names: True` is set. The current default, `match_fully_qualified_names: False`, exists only to maintain backward compatibility. The config option `match_fully_qualified_names` will be deprecated in the future, and the default behavior will assume `match_fully_qualified_names: True`.
- #6636 - Sources `snowflake-legacy` and `snowflake-usage-legacy` have been removed.
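The effect of `match_fully_qualified_names` on Snowflake `schema_pattern` (#6611) can be sketched as follows. This is a hypothetical re-implementation for illustration only. It assumes allow regexes are applied with `re.match` and are case-insensitive, and it omits deny patterns. `schema_allowed` is not a DataHub function.

```python
import re
from typing import List


def schema_allowed(
    database: str, schema: str, allow: List[str], match_fully_qualified_names: bool
) -> bool:
    """Check a schema against allow patterns (hypothetical model of the option)."""
    # With fully qualified matching, patterns see "<catalog_name>.<schema_name>";
    # otherwise they see only the bare schema name.
    candidate = f"{database}.{schema}" if match_fully_qualified_names else schema
    return any(re.match(pattern, candidate, re.IGNORECASE) for pattern in allow)
```

For example, with `allow=[r"^ANALYTICS\.PUBLIC$"]` and fully qualified matching enabled, `ANALYTICS.PUBLIC` is allowed but `RAW.PUBLIC` is not. Without fully qualified matching, a pattern such as `^PUBLIC$` matches `PUBLIC` in every database, which is why existing patterns may need updating before you switch the flag.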
- The beta `datahub check graph-consistency` command has been removed.
- PowerBI source: `workspace_id_pattern` is introduced in place of `workspace_id`. `workspace_id` is now deprecated and set for removal in a future version.
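To move a PowerBI recipe off the deprecated `workspace_id`, the single ID can be expressed as an anchored pattern. The following is a minimal sketch. It assumes `workspace_id_pattern` takes an allow/deny list of regexes and that the recipe's `config` section is already loaded into a dict. `migrate_powerbi_config` is a hypothetical helper.

```python
import re
from typing import Any, Dict


def migrate_powerbi_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Replace a deprecated `workspace_id` with an equivalent `workspace_id_pattern`."""
    if "workspace_id" not in config or "workspace_id_pattern" in config:
        # Nothing to migrate, or the user already set a pattern explicitly.
        return dict(config)
    migrated = dict(config)
    workspace_id = migrated.pop("workspace_id")
    # Escape and anchor the ID so the pattern allows exactly this one workspace.
    migrated["workspace_id_pattern"] = {"allow": [f"^{re.escape(workspace_id)}$"]}
    return migrated
```

Anchoring with `^...$` matters: without it, the pattern would also admit any workspace whose ID merely starts with the old value.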
- The LookML source will only emit views that are reachable from explores while scanning your git repo. The previous behavior can be restored by setting `emit_reachable_views_only` to `False`.
- The LookML source will always lowercase urns for lineage edges from views to upstream tables. No fallback to the previous behavior is provided, because lower-casing was previously applied inconsistently.
- The dbt config `node_type_pattern`, which was previously deprecated, has been removed. Use `entities_enabled` instead to control whether to emit metadata for sources, models, seeds, tests, etc.
- The dbt source will always lowercase urns for lineage edges to the underlying data platform.
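Because the LookML and dbt sources now always lower-case lineage urns, any other pipeline that emits lineage to the same tables should also lower-case the dataset names so the urns line up. The following minimal sketch assumes the standard dataset urn layout `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`. `make_lowercase_dataset_urn` is a hypothetical helper, and the DataHub Python SDK has its own urn builders.

```python
def make_lowercase_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset urn with the dataset name lower-cased (hypothetical helper)."""
    # Only the dataset name is lower-cased; platform and env are used as given.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name.lower()},{env})"


print(make_lowercase_dataset_urn("snowflake", "ANALYTICS.PUBLIC.Orders"))
# urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.public.orders,PROD)
```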
- The DataHub Airflow lineage backend and plugin no longer support Airflow 1.x. You can still run DataHub ingestion in Airflow 1.x using the PythonVirtualenvOperator.
- #6570 The `snowflake` connector now populates created and last modified timestamps for Snowflake datasets and containers. This version of the Snowflake connector will not work with datahub-gms versions older than `v0.9.3`.
- We have promoted `bigquery-beta` to `bigquery`. If you are using `bigquery-beta`, change your recipes to use the type `bigquery`.
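If you manage many recipes, the `bigquery-beta` rename can be applied programmatically. This is a minimal sketch that assumes each recipe has already been parsed (for example, from YAML) into a dict with the usual `source.type` / `source.config` layout. `migrate_recipe` and `RENAMED_SOURCE_TYPES` are hypothetical names.

```python
import copy
from typing import Any, Dict

# Source types renamed in this release (old -> new).
RENAMED_SOURCE_TYPES = {"bigquery-beta": "bigquery"}


def migrate_recipe(recipe: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of the recipe with renamed source types updated."""
    migrated = copy.deepcopy(recipe)
    source = migrated.get("source", {})
    old_type = source.get("type")
    if old_type in RENAMED_SOURCE_TYPES:
        source["type"] = RENAMED_SOURCE_TYPES[old_type]
    return migrated
```

The `config` section is left untouched. This assumes your `bigquery-beta` options carry over unchanged, so review the `bigquery` source documentation before relying on it.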
- Java version 11 or greater is required.
- For any of the GraphQL search queries, the filter input no longer supports `value` but instead accepts a list of `values`. These values represent an OR relationship: the field value must match any one of the values.
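The change from `value` to `values` can be sketched in Python, with each filter input represented as a dict. This is an illustrative sketch, not the GraphQL schema. It assumes a filter looks like `{"field": ..., "value": ...}` before the change and `{"field": ..., "values": [...]}` after it. Any other keys are copied through unchanged. The helpers and the `platform` field used in the example are hypothetical.

```python
from typing import Any, Dict


def migrate_filter(old: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a single-`value` filter into the list-based `values` form."""
    new = {key: val for key, val in old.items() if key != "value"}
    if "value" in old:
        new["values"] = [old["value"]]
    return new


def matches(field_value: str, filter_input: Dict[str, Any]) -> bool:
    """OR semantics: the field matches if it equals any of the listed values."""
    return field_value in filter_input["values"]
```

A one-element `values` list behaves like the old single `value`. Adding more entries widens the match rather than narrowing it.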
- The `getNativeUserInviteToken` and `createNativeUserInviteToken` GraphQL endpoints have been renamed to `getInviteToken` and `createInviteToken` respectively. Additionally, both now accept an optional `roleUrn` parameter. Both endpoints also now require the `MANAGE_POLICIES` privilege to execute, rather than the `MANAGE_USER_CREDENTIALS` privilege.
- One of the default policies shipped with DataHub (`urn:li:dataHubPolicy:7`, or `All Users - All Platform Privileges`) has been edited to no longer include `MANAGE_POLICIES`. Its name has consequently been changed to `All Users - All Platform Privileges (EXCEPT MANAGE POLICIES)`. This change was made to prevent all users from effectively acting as superusers by default.
- Browse Paths have been upgraded to a new format to align more closely with the intention of the feature. Learn more about the changes, including steps on upgrading, here: https://datahubproject.io/docs/advanced/browse-paths-upgrade
- The dbt ingestion source's `disable_dbt_node_creation` and `load_schema` options have been removed. They were no longer necessary due to the recently added sibling entities functionality.
- The `snowflake` source now uses a newer, faster implementation (formerly `snowflake-beta`). The config properties `provision_role` and `check_role_grants` are not supported. The older `snowflake` and `snowflake-usage` sources are available as `snowflake-legacy` and `snowflake-usage-legacy` respectively.
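The options removed from the dbt and `snowflake` sources can be detected and stripped from existing recipes before upgrading. This is a minimal sketch that assumes recipes are parsed into dicts with `source.type` and `source.config` keys. `strip_removed_options` and `REMOVED_OPTIONS` are hypothetical names. The helper reports what it dropped so that the change can be reviewed, because a removed option may have been doing real work in your setup.

```python
import copy
from typing import Any, Dict, List, Tuple

# Source type -> options that this release no longer accepts.
REMOVED_OPTIONS = {
    "dbt": ["disable_dbt_node_creation", "load_schema"],
    "snowflake": ["provision_role", "check_role_grants"],
}


def strip_removed_options(recipe: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return a cleaned deep copy of the recipe plus the names of the dropped options."""
    cleaned = copy.deepcopy(recipe)
    source = cleaned.get("source", {})
    config = source.get("config", {})
    dropped = [opt for opt in REMOVED_OPTIONS.get(source.get("type"), []) if opt in config]
    for opt in dropped:
        del config[opt]
    return cleaned, dropped
```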
- [Helm] If you're using Helm, please ensure that your version of the `datahub-actions` container is bumped to `v0.0.7` or `head`. This version contains changes to support running ingestion in debug mode. Previous versions are not compatible with this release. Upgrading to helm chart version `0.2.103` will ensure that you have the compatible versions by default.
- Python 3.6 is no longer supported for metadata ingestion.
- #5451 The `GMS_HOST` and `GMS_PORT` environment variables, deprecated in `v0.8.39`, have been removed. Use `DATAHUB_GMS_HOST` and `DATAHUB_GMS_PORT` instead.
- #5478 When used with the `--hard` option, the DataHub CLI `delete` command will also delete soft-deleted entities that match the other filters given.
- #5471 Looker now populates `userEmail` in dashboard user usage stats. This version of the Looker connector will not work with older versions of datahub-gms if you have the `extract_usage_history` Looker config enabled.
- #5529 - The `ANALYTICS_ENABLED` environment variable in datahub-gms is now deprecated. Use `DATAHUB_ANALYTICS_ENABLED` instead.
- The `should_overwrite` flag in `csv-enricher` has been replaced with `write_semantics` to match the format used for other sources. See the documentation for more details.
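A `csv-enricher` config that still uses `should_overwrite` can be translated mechanically. This is a minimal sketch. It assumes `write_semantics` accepts `"OVERRIDE"` (replace existing metadata) and `"PATCH"` (merge with existing metadata), and that these correspond to the old `True` and `False`. Check the csv-enricher documentation to confirm the mapping for your version. `migrate_csv_enricher_config` is a hypothetical helper.

```python
from typing import Any, Dict


def migrate_csv_enricher_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Translate the removed `should_overwrite` flag into `write_semantics`.

    Assumed mapping: True -> "OVERRIDE", False -> "PATCH".
    """
    migrated = dict(config)
    if "should_overwrite" in migrated:
        overwrite = migrated.pop("should_overwrite")
        # An explicitly set write_semantics wins over the translated flag.
        migrated.setdefault("write_semantics", "OVERRIDE" if overwrite else "PATCH")
    return migrated
```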
- To close an authorization hole in tag creation, a Platform Privilege called `Create Tags` has been added. It is assigned to the `datahub` root user and to the default All Users policy. Notice: You may need to add this privilege (or `Manage Tags`) to existing users who need the ability to create tags on the platform.
- #5329 The following profiling config parameters are now supported in `BigQuery`:
  - `profiling.profile_if_updated_since_days` (default=1)
  - `profiling.profile_table_size_limit` (default=1GB)
  - `profiling.profile_table_row_limit` (default=50000)

  Set the above parameters to `null` if you want the older behaviour.
- #5240
lineage_client_project_id
inbigquery
source is removed. Usestorage_project_id
instead.
- Refactored the `health` field of the `Dataset` GraphQL type to be a list of `HealthStatus` (previously a single `HealthStatus`). See this PR for more details.
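Clients that parse GraphQL responses across server versions may see `health` as either a single object or a list. A small normalizing step lets the rest of the client code deal only with the list form. This is a minimal sketch in which `normalize_health` is a hypothetical helper. The status objects are treated as opaque dicts, and the field names used in any examples are illustrative only.

```python
from typing import Any, Dict, List, Union

# A single HealthStatus object (older servers), a list of them (newer servers), or null.
HealthValue = Union[None, Dict[str, Any], List[Dict[str, Any]]]


def normalize_health(health: HealthValue) -> List[Dict[str, Any]]:
    """Return `health` as a list, whatever shape the server returned."""
    if health is None:
        return []
    if isinstance(health, list):
        return health
    return [health]
```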
- #4875 LookML view file contents will no longer be populated in `custom_properties`; instead, view definitions will always be available in the View Definitions tab.
- #5208 The `GMS_HOST` and `GMS_PORT` environment variables set in various containers are deprecated in favour of `DATAHUB_GMS_HOST` and `DATAHUB_GMS_PORT`.
- The `KAFKA_TOPIC_NAME` environment variable in datahub-mae-consumer and datahub-gms is now deprecated. Use `METADATA_AUDIT_EVENT_NAME` instead.
- The `KAFKA_MCE_TOPIC_NAME` environment variable in datahub-mce-consumer and datahub-gms is now deprecated. Use `METADATA_CHANGE_EVENT_NAME` instead.
- The `KAFKA_FMCE_TOPIC_NAME` environment variable in datahub-mce-consumer and datahub-gms is now deprecated. Use `FAILED_METADATA_CHANGE_EVENT_NAME` instead.
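The environment variable renames in these notes can be applied to a container's environment (for example, values collected from a Docker Compose file or Helm values) with a small lookup table. This is a minimal sketch. `migrate_env` and `RENAMED_ENV_VARS` are hypothetical names, and the table contains only the renames listed in these notes.

```python
from typing import Dict, Mapping

# Deprecated environment variable -> replacement.
RENAMED_ENV_VARS = {
    "GMS_HOST": "DATAHUB_GMS_HOST",
    "GMS_PORT": "DATAHUB_GMS_PORT",
    "ANALYTICS_ENABLED": "DATAHUB_ANALYTICS_ENABLED",
    "KAFKA_TOPIC_NAME": "METADATA_AUDIT_EVENT_NAME",
    "KAFKA_MCE_TOPIC_NAME": "METADATA_CHANGE_EVENT_NAME",
    "KAFKA_FMCE_TOPIC_NAME": "FAILED_METADATA_CHANGE_EVENT_NAME",
}


def migrate_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `env` with deprecated variable names replaced."""
    migrated = dict(env)
    for old, new in RENAMED_ENV_VARS.items():
        if old in migrated:
            value = migrated.pop(old)
            # If the replacement is already set explicitly, keep it and drop the old one.
            migrated.setdefault(new, value)
    return migrated
```

Letting an explicitly set new-style variable take precedence avoids silently changing a value that was already migrated by hand.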
- #5132 The `snowflake` source now profiles tables only if they have been updated within the configured number of days (default: `1`). Update the config `profiling.profile_if_updated_since_days` to match your profiling schedule, or set it to `None` if you want the older behaviour.
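The `profiling.profile_if_updated_since_days` rule can be expressed as a date comparison. This is a sketch under stated assumptions. `last_altered` is taken to be the table's last-modified timestamp as reported by the warehouse, and `should_profile` is a hypothetical helper. The connector's exact comparison, and how it obtains the timestamp, may differ.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_profile(
    last_altered: datetime, now: datetime, profile_if_updated_since_days: Optional[float]
) -> bool:
    """Decide whether a table is profiled under the `profile_if_updated_since_days` rule."""
    if profile_if_updated_since_days is None:
        # Check disabled: every table is profiled (the older behaviour).
        return True
    cutoff = now - timedelta(days=profile_if_updated_since_days)
    return last_altered >= cutoff
```

With the default of `1` and a daily schedule, a table changed in the last 24 hours is profiled. If ingestion runs weekly, tables changed earlier in the week would be skipped, so the value should be at least as long as the interval between runs.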
- Create & Revoke Access Tokens via the UI
- Create and Manage new users via the UI
- Improvements to Business Glossary UI
- FIX - Do not require reindexing to migrate to using the UI business glossary
- In this release we introduce a brand new Business Glossary experience. With this new experience comes some new ways of indexing data in order to make viewing and traversing the different levels of your Glossary possible. Therefore, you will have to restore your indices in order for the new Glossary experience to work for users that already have existing Glossaries. If this is your first time using DataHub Glossaries, you're all set!
- #4961 Dropped profiles are no longer reported by default, as reporting them caused a lot of spurious logging in some cases. Set `profiling.report_dropped_profiles` to `True` if you want the older behaviour.
- #4644 Removed the `database` option from the `snowflake` source, which had been deprecated since `v0.8.5`.
- #4595 Renamed the confusing config `report_upstream_lineage` to `upstream_lineage_in_report` in the `snowflake` connector. The option was added in `0.8.32`.
- #4644 The `host_port` option of the `snowflake` and `snowflake-usage` sources has been deprecated because the name was confusing. Use the `account_id` option instead.
- #4760 The `check_role_grants` option was added to `snowflake` to allow disabling role checks, as some people were reporting long run times when checking roles.