[Improvement]: reduce the impact of the listTables method in a unified catalog on the Hive Metastore (HMS) #2986

Open
3 tasks done
Aireed opened this issue Jul 1, 2024 · 2 comments

Comments

@Aireed
Contributor

Aireed commented Jul 1, 2024

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

  • metastore: hive
  • catalog: UnifiedCatalog
  • method: listTables

problem:

  • When listTables is called on a mixed-hive or iceberg catalog, it first calls the HMS getAllTables method to retrieve all table names, then calls getTableObjectsByName to fetch the Table objects for those names, and finally decides whether each table is a mixed-hive/iceberg table by checking its properties or getSd().
  • For Paimon, the logic is to first call getAllTables to retrieve all table names, then call getTable once per table to get its Table object and determine whether it is a Paimon table.

As described above, if the Unified Catalog supports mixed-hive, iceberg, and paimon at the same time, a single listTables call results in three calls to getAllTables, two calls to getTableObjectsByName (a relatively heavy operation), and one getTable call per table.
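To make the call counts concrete, here is a minimal simulation of the behaviour described above. `CountingMetastore` and `CurrentListTablesCost` are hypothetical stand-ins written for this illustration; they are not the real HMS client or Amoro classes, and table names stand in for Table objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an HMS client; it only counts calls. */
class CountingMetastore {
  final List<String> tables;
  int getAllTablesCalls;
  int getTableObjectsByNameCalls;
  int getTableCalls;

  CountingMetastore(List<String> tables) {
    this.tables = tables;
  }

  List<String> getAllTables() {
    getAllTablesCalls++;
    return tables;
  }

  List<String> getTableObjectsByName(List<String> names) {
    getTableObjectsByNameCalls++;
    return new ArrayList<>(names); // names stand in for Table objects
  }

  String getTable(String name) {
    getTableCalls++;
    return name;
  }
}

class CurrentListTablesCost {
  /** mixed-hive and iceberg style: one name listing plus one batch fetch. */
  static void listBatchStyle(CountingMetastore hms) {
    hms.getTableObjectsByName(hms.getAllTables());
  }

  /** paimon style: one name listing plus one getTable per table. */
  static void listPerTableStyle(CountingMetastore hms) {
    for (String name : hms.getAllTables()) {
      hms.getTable(name);
    }
  }

  /** Runs the three per-format listings back to back. */
  static CountingMetastore simulate(List<String> tables) {
    CountingMetastore hms = new CountingMetastore(tables);
    listBatchStyle(hms);    // mixed-hive
    listBatchStyle(hms);    // iceberg
    listPerTableStyle(hms); // paimon
    return hms;
  }
}
```

For a database with four tables, `simulate` ends with 3 getAllTables calls, 2 getTableObjectsByName calls, and 4 getTable calls.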

Besides being called by the frontend to display the table list, listTables is also called by the logic that synchronizes with the external catalog (every 3 minutes by default).

How should we improve?

When the metastore is Hive, we can optimize this by calling getAllTables and getTableObjectsByName only once each to retrieve all tables together with their types.
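As a rough sketch of the idea, the Table objects fetched in that single batch could be classified in one pass. Everything here is an assumption made for illustration: `HmsTableView` is a simplified stand-in for an HMS Table, and the property keys and values are placeholders, not the detection rules the format catalogs actually use (which may also inspect getSd()).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified view of an HMS table: a name plus its parameters. */
class HmsTableView {
  final String name;
  final Map<String, String> parameters;

  HmsTableView(String name, Map<String, String> parameters) {
    this.name = name;
    this.parameters = parameters;
  }
}

class SinglePassClassifier {
  // Placeholder markers for the example only; real rules live in each format's catalog.
  static final String MIXED_FLAG_KEY = "example.mixed-hive.enabled";
  static final String TABLE_TYPE_KEY = "table_type";

  /** Returns the detected format name, or null if no rule matches. */
  static String formatOf(HmsTableView table) {
    Map<String, String> p = table.parameters;
    if ("true".equalsIgnoreCase(p.get(MIXED_FLAG_KEY))) {
      return "MIXED_HIVE";
    }
    String type = p.get(TABLE_TYPE_KEY);
    if ("ICEBERG".equalsIgnoreCase(type)) {
      return "ICEBERG";
    }
    if ("PAIMON".equalsIgnoreCase(type)) {
      return "PAIMON";
    }
    return null;
  }

  /** Classifies tables that were already fetched in one batch; no further HMS calls. */
  static Map<String, String> classify(List<HmsTableView> fetchedOnce) {
    Map<String, String> formats = new LinkedHashMap<>();
    for (HmsTableView table : fetchedOnce) {
      String format = formatOf(table);
      if (format != null) {
        formats.put(table.name, format); // tables with no recognized format are skipped
      }
    }
    return formats;
  }
}
```

The point of the sketch is only that format detection needs nothing beyond the objects returned by the one batch fetch, so the per-format listings do not have to repeat it.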

  1. Define an interface that supports listing all tables and their formats.
  2. MixedCatalog implements this interface.
  3. MixedHiveCatalog implements this interface.
  4. When UnifiedCatalog::listTables is called, we first check whether any of the supported FormatCatalogs implements this interface. If one does, we use the table list it returns instead of calling listTables on each type of FormatCatalog.
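The four steps above could be sketched roughly as follows. All type and method names (`FormatAwareTableLister`, `listTablesWithFormat`, `UnifiedCatalogSketch`, and the `Fixed*` toy catalogs) are hypothetical and chosen for this example; the `FormatCatalog` interface here is a simplified stand-in, not the project's real one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

enum TableFormat { MIXED_HIVE, ICEBERG, PAIMON }

/** Step 1: lists every table together with its format in one pass (hypothetical). */
interface FormatAwareTableLister {
  Map<String, TableFormat> listTablesWithFormat(String database);
}

/** Simplified stand-in for a per-format catalog. */
interface FormatCatalog {
  TableFormat format();
  List<String> listTables(String database);
}

/** Step 4: dispatch logic of the unified catalog. */
class UnifiedCatalogSketch {
  private final List<? extends FormatCatalog> formatCatalogs;

  UnifiedCatalogSketch(List<? extends FormatCatalog> formatCatalogs) {
    this.formatCatalogs = formatCatalogs;
  }

  Map<String, TableFormat> listTables(String database) {
    // Fast path: a catalog implementing the new interface answers for all formats.
    for (FormatCatalog catalog : formatCatalogs) {
      if (catalog instanceof FormatAwareTableLister) {
        return ((FormatAwareTableLister) catalog).listTablesWithFormat(database);
      }
    }
    // Fallback: existing behaviour, one listTables call per format catalog.
    Map<String, TableFormat> result = new LinkedHashMap<>();
    for (FormatCatalog catalog : formatCatalogs) {
      for (String table : catalog.listTables(database)) {
        result.putIfAbsent(table, catalog.format());
      }
    }
    return result;
  }
}

/** Toy catalog backed by fixed data, used only to exercise the sketch. */
class FixedFormatCatalog implements FormatCatalog {
  final TableFormat format;
  final List<String> tables;
  int listCalls;

  FixedFormatCatalog(TableFormat format, List<String> tables) {
    this.format = format;
    this.tables = tables;
  }

  public TableFormat format() {
    return format;
  }

  public List<String> listTables(String database) {
    listCalls++;
    return tables;
  }
}

/** Steps 2 and 3: a toy catalog that also implements the one-pass interface. */
class FixedOneShotCatalog extends FixedFormatCatalog implements FormatAwareTableLister {
  final Map<String, TableFormat> all;

  FixedOneShotCatalog(Map<String, TableFormat> all) {
    super(TableFormat.MIXED_HIVE, new ArrayList<>(all.keySet()));
    this.all = all;
  }

  public Map<String, TableFormat> listTablesWithFormat(String database) {
    return all;
  }
}
```

One design consequence worth noting: on the fast path the result contains only what the implementing catalog reports, so that catalog has to recognize every format the unified catalog is expected to expose.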

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Subtasks

No response

Code of Conduct

@Aireed
Contributor Author

Aireed commented Jul 1, 2024

@baiyangtx @zhoujinsong WDYT?

@Aireed
Contributor Author

Aireed commented Jul 1, 2024

The implementation is roughly like this. (The code is quite old, ArcticCatalog has been replaced with MixedHiveCatalog now).
[image: screenshot of the implementation code]
