Current Behavior
We are seeing the assessment misclassify tables in our external Hive metastore as managed when they are in fact external Hive Parquet tables. We can take the table's location, read it with spark.read, register the table directly as Parquet, and select from it. The migration code does not process any of the tables and produces this output:
WARN [d.l.u.hive_metastore.table_migrate][migrate_tables_0] failed-to-migrate: SYNC command failed to migrate table hive_metastore.<table name redacted> to <catalog redacted>.<schema redacted>.<tablename redacted>. Status code: NOT_EXTERNAL. Description: [UPGRADE_NOT_SUPPORTED.NOT_EXTERNAL] Table is not eligible for upgrade from Hive Metastore to Unity Catalog. Reason: Not an external table. SQLSTATE: 0AKUC
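For reference, the manual path that does work looks roughly like this; the location and names below are placeholders, not our real values:

# Read the data straight from the table's location as plain Parquet,
# then register it and query it.
df = spark.read.parquet("s3://bucket/path/to/table")
df.createOrReplaceTempView("probe_table")
spark.sql("SELECT * FROM probe_table LIMIT 10").show()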
Expected Behavior
Perhaps this is the intent of the experimental in-place HiveSerDe migration, but I would expect that instead of using the SYNC command, we would register the table as Parquet at its existing location; or maybe I don't fully understand what SYNC can do. Also, since the table lives in an external Hive metastore, I'm not sure how it ends up classified as managed.
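In other words, something along these lines instead of SYNC, as a sketch; the catalog, schema, and location are placeholders:

# Register the table in Unity Catalog as an external Parquet table
# at its existing location, rather than relying on SYNC.
spark.sql("""
    CREATE TABLE IF NOT EXISTS my_catalog.my_schema.my_table
    USING PARQUET
    LOCATION 's3://bucket/path/to/table'
""")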
Steps To Reproduce
No response
Cloud
AWS
Operating System
macOS
Version
latest via Databricks CLI
Relevant log output
WARN [d.l.u.hive_metastore.table_migrate][migrate_tables_0] failed-to-migrate: SYNC command failed to migrate table hive_metastore.<table name redacted> to <catalog redacted>.<schema redacted>.<tablename redacted>. Status code: NOT_EXTERNAL. Description: [UPGRADE_NOT_SUPPORTED.NOT_EXTERNAL] Table is not eligible for upgrade from Hive Metastore to Unity Catalog. Reason: Not an external table. SQLSTATE: 0AKUC
da-dbx-support
Ben Tibbetts
Tuesday at 5:13 PM
Hello! I am working on the issue where our mapping file is too big to process within the migrate_tables workflow. I split up our mapping file, unpacked and modified the Python package so that it could load multiple files, and pointed the convert_managed_table task at that new package.
Everything was working fine up until this error for every table (putting the stacktrace in thread).
Does that look familiar at all? Who could help me look at this?
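A rough sketch of the multi-file loading idea (hypothetical names and layout, not the actual modified package code):

from pathlib import Path
import csv

def load_mapping_rows(mapping_dir: str) -> list[dict]:
    # Read every split of the mapping file and return the rows
    # as if they came from a single file.
    rows: list[dict] = []
    for part in sorted(Path(mapping_dir).glob("mapping_*.csv")):
        with part.open(newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows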
Ben Tibbetts
Tuesday at 5:15 PM
21:47:08 INFO [d.l.u.hive_metastore.table_migrate][convert_tables_3] Changing HMS managed table <table_name> to External Table type.
21:47:09 WARN [d.l.u.hive_metastore.table_migrate][convert_tables_0] Error converting HMS table <table_name> to external: An error occurred while calling None.org.apache.spark.sql.catalyst.catalog.CatalogTable. Trace:
py4j.Py4JException: Constructor org.apache.spark.sql.catalyst.catalog.CatalogTable([class org.apache.spark.sql.catalyst.TableIdentifier, class org.apache.spark.sql.catalyst.catalog.CatalogTableType, class org.apache.spark.sql.catalyst.catalog.CatalogStorageFormat, class org.apache.spark.sql.types.StructType, class scala.Some, class scala.collection.immutable.Nil$, class scala.None$, class java.lang.String, class java.lang.Long, class java.lang.Integer, class java.lang.String, class scala.collection.immutable.Map$EmptyMap$, class scala.None$, class scala.None$, class scala.None$, class scala.collection.mutable.ArrayBuffer, class java.lang.Boolean, class java.lang.Boolean, class scala.collection.immutable.Map$EmptyMap$, class scala.None$]) does not exist
at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:203)
at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:220)
at py4j.Gateway.invoke(Gateway.java:255)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
at java.base/java.lang.Thread.run(Thread.java:840)
: Traceback (most recent call last):
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.12/site-packages/databricks/labs/ucx/hive_metastore/table_migrate.py", line 302, in _convert_hms_table_to_external
new_table = self._catalog_table(
^^^^^^^^^^^^^^^^^^^^
File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1620, in __call__
return_value = get_return_value(
^^^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/errors/exceptions/captured.py", line 263, in deco
return f(*a, **kw)
^^^^^^^^^^^
File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 330, in get_return_value
raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling None.org.apache.spark.sql.catalyst.catalog.CatalogTable. Trace:
py4j.Py4JException: Constructor org.apache.spark.sql.catalyst.catalog.CatalogTable([class org.apache.spark.sql.catalyst.TableIdentifier, class org.apache.spark.sql.catalyst.catalog.CatalogTableType, class org.apache.spark.sql.catalyst.catalog.CatalogStorageFormat, class org.apache.spark.sql.types.StructType, class scala.Some, class scala.collection.immutable.Nil$, class scala.None$, class java.lang.String, class java.lang.Long, class java.lang.Integer, class java.lang.String, class scala.collection.immutable.Map$EmptyMap$, class scala.None$, class scala.None$, class scala.None$, class scala.collection.mutable.ArrayBuffer, class java.lang.Boolean, class java.lang.Boolean, class scala.collection.immutable.Map$EmptyMap$, class scala.None$]) does not exist
at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:203)
at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:220)
at py4j.Gateway.invoke(Gateway.java:255)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
at java.base/java.lang.Thread.run(Thread.java:840)
Scott Parker
Yesterday at 8:59 AM
@sheila.stewart @cary.moore :point_up:
Cary Moore
Yesterday at 9:39 AM
Hi @ben.tibbetts, what is the code calling to read the parquet?
Cary Moore
Yesterday at 9:43 AM
Based on the context provided, here are some potential causes and solutions:
Table Type Issue: Managed tables on DBFS mounts can sometimes face issues during migration if they are not correctly identified as external tables. This can be resolved by updating the table type in the Hive Metastore to external before attempting the migration. You can do this by setting the tableType to CatalogTableType.EXTERNAL in your code.
Configuration Settings: Ensure that the necessary configuration settings are enabled. For instance, setting spark.databricks.sync.command.enableManagedTable=true can help enable the migration of managed tables using the SYNC command.
Unsupported Operations: There are certain restrictions with UC-enabled clusters. Converting managed tables to external tables directly might not be supported. In such cases, using a deep clone for HMS Parquet and Delta tables to copy the data, and then migrating the tables from HMS into UC, is recommended (see the sketch after this list).
Compatibility Issues: Ensure that you are using compatible versions of libraries and Databricks Runtime. Mismatches in versions can lead to errors like NoSuchMethodError.
To troubleshoot further, you can:
Verify that all required elements are present in the CatalogTablePartition class.
Check for any null values being accessed in the CatalogTable class.
Ensure that you are using compatible versions of libraries, especially if you are using third-party libraries like Iceberg with Databricks Runtime.
If the issue persists, providing more details about the specific context, such as the exact code being used and the Databricks Runtime version, can help in diagnosing the problem more accurately.
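A minimal sketch of the second and third suggestions above; all catalog, schema, and table names are placeholders, and whether the enableManagedTable flag applies depends on the Databricks Runtime version:

# Suggestion 2 (sketch): allow SYNC to consider managed tables, then sync into UC.
spark.conf.set("spark.databricks.sync.command.enableManagedTable", "true")
spark.sql("SYNC TABLE my_catalog.my_schema.my_table FROM hive_metastore.my_schema.my_table")

# Suggestion 3 (sketch): copy the data into UC with a deep clone instead of
# converting the HMS table in place.
spark.sql("""
    CREATE OR REPLACE TABLE my_catalog.my_schema.my_table
    DEEP CLONE hive_metastore.my_schema.my_table
""")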
Cary Moore
Yesterday at 9:44 AM
From our internal docs
Ben Tibbetts
Yesterday at 9:44 AM
Here is the snippet that is failing:
@cached_property
def _catalog_table(self):
    return self._spark._jvm.org.apache.spark.sql.catalyst.catalog.CatalogTable  # pylint: disable=protected-access

def _convert_hms_table_to_external(self, src_table: Table):
    logger.info(f"Changing HMS managed table {src_table.name} to External Table type.")
    inventory_table = self._tables_crawler.full_name
    try:
        database = self._spark._jvm.scala.Some(src_table.database)  # pylint: disable=protected-access
        table_identifier = self._table_identifier(src_table.name, database)
        old_table = self._catalog.getTableMetadata(table_identifier)
        new_table = self._catalog_table(
            old_table.identifier(),
            self._catalog_type('EXTERNAL'),
            old_table.storage(),
            old_table.schema(),
            old_table.provider(),
            old_table.partitionColumnNames(),
            old_table.bucketSpec(),
            old_table.owner(),
            old_table.createTime(),
            old_table.lastAccessTime(),
            old_table.createVersion(),
            old_table.properties(),
            old_table.stats(),
            old_table.viewText(),
            old_table.comment(),
            old_table.unsupportedFeatures(),
            old_table.tracksPartitionsInCatalog(),
            old_table.schemaPreservesCase(),
            old_table.ignoredProperties(),
            old_table.viewOriginalText(),
        )
        self._catalog.alterTable(new_table)
        self._update_table_status(src_table, inventory_table)
        logger.info(f"Converted {src_table.name} to External Table type.")
    except Exception as e:  # pylint: disable=broad-exception-caught
        logger.warning(f"Error converting HMS table {src_table.name} to external: {e}", exc_info=True)
        return False
    return True
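CatalogTable is a Scala case class whose field list has changed across Spark versions, so a py4j call that passes an exact 20-argument list fails as soon as the constructor in the running JVM expects a different arity; that matches the "Constructor ... does not exist" message above. One way to confirm, as a rough diagnostic sketch (assuming a spark session is in scope; this is not part of the ucx code):

# List the constructor arities CatalogTable actually exposes in this JVM,
# using plain Java reflection over the py4j gateway.
cls = spark._jvm.java.lang.Class.forName(
    "org.apache.spark.sql.catalyst.catalog.CatalogTable"
)
for ctor in cls.getConstructors():
    print(ctor.getParameterCount(), ctor.toString())

If the printed arity differs from the 20 arguments the snippet passes, the new table needs to be built against the constructor the running runtime actually has.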