[#6566] improvement(core): Add the cache mechanism for metalake and use cache to load `in-use` information. #6569

yuqi1129 · 2025-02-27T12:04:08Z

What changes were proposed in this pull request?

Add a cache mechanism for MetalakeManager.
Use cache to load in-use information for metalake and catalog.

This is the flame graph after optimization:

Why are the changes needed?

Loading the in-use information from backend storage every time a catalog operation runs is very time-consuming, so we need to optimize it.

Fix: #6566

Does this PR introduce any user-facing change?

N/A.

How was this patch tested?

Added UTs; also covered by existing tests.

mchades · 2025-02-28T06:31:08Z

core/src/main/java/org/apache/gravitino/metalake/MetalakeManager.java

@@ -60,6 +67,21 @@ public class MetalakeManager implements MetalakeDispatcher {

  private final IdGenerator idGenerator;

+  @VisibleForTesting
+  static final Cache<NameIdentifier, BaseMetalake> METALAKE_CACHE =


If our goal is to speed up the lookup of the in-use flag, it seems that we only need to cache the corresponding in-use value and do not need to cache BaseMetalake.

Caching BaseMetalake takes only a little memory, and the cached entry can also be used when loading a metalake.

I think caching the metalake is better: the number of metalakes is quite limited, so with a small amount of memory we can improve performance a lot. It is worth caching the metalake.

Why is this variable name in uppercase? Typically, we only use uppercase names for final variables.

It is indeed declared final; see `static final Cache<NameIdentifier, BaseMetalake> METALAKE_CACHE`.
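The trade-off in this thread (caching only the in-use value versus caching the whole BaseMetalake) can be sketched as follows. This is a minimal illustration, not the PR's code: `MetalakeSketch` and `MetalakeCacheSketch` are hypothetical names, a `ConcurrentHashMap` stands in for the real `Cache<NameIdentifier, BaseMetalake>`, and plain strings stand in for `NameIdentifier`.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for BaseMetalake: only the fields this sketch needs.
final class MetalakeSketch {
  final String name;
  final boolean inUse;

  MetalakeSketch(String name, boolean inUse) {
    this.name = name;
    this.inUse = inUse;
  }
}

// Assumption: a ConcurrentHashMap replaces the real cache implementation.
final class MetalakeCacheSketch {
  private final Map<String, MetalakeSketch> cache = new ConcurrentHashMap<>();

  void put(MetalakeSketch metalake) {
    cache.put(metalake.name, metalake);
  }

  // The in-use check reads the flag from the cached entity.
  // An empty result means "not cached"; the caller would fall back to the store.
  Optional<Boolean> inUse(String name) {
    return Optional.ofNullable(cache.get(name)).map(m -> m.inUse);
  }

  // The same cached entry also serves a full load without another store read.
  Optional<MetalakeSketch> load(String name) {
    return Optional.ofNullable(cache.get(name));
  }
}
```

Caching the whole entity costs a little more memory per entry than caching a boolean, but one entry then answers both the in-use check and a metalake load.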

jerryshao · 2025-02-28T06:52:31Z

core/src/main/java/org/apache/gravitino/catalog/CatalogManager.java

@@ -260,7 +260,7 @@ private ModelCatalog asModels() {

  private final Config config;

-  @VisibleForTesting final Cache<NameIdentifier, CatalogWrapper> catalogCache;
+  @VisibleForTesting static Cache<NameIdentifier, CatalogWrapper> catalogCache;


Why do we make it static? I assume there will be only one CatalogManager, so there should be only one catalogCache, right?

Why do we make it static?

The check methods catalogInUse and metalakeInUse are both static. To use the cache in them, we need to make it static too.

assume there will be only one CatalogManager, so there should be only one catalogCache, right?

Yes, there will be only one cache, and all catalogs share the same instance. I don't think it's a big problem.

Should it be named CATALOG_CACHE?

Should it be final?

jerryshao · 2025-02-28T06:57:06Z

core/src/main/java/org/apache/gravitino/catalog/CatalogManager.java

+      if (wrapper != null) {
+        catalogEntity = wrapper.catalog.entity();
+      } else {
+        catalogEntity = store.get(catalogIdent, EntityType.CATALOG, CatalogEntity.class);


Shall we put this catalogEntity in the cache? Besides, can we use loadCatalog directly?

Loading the catalogEntity and then transforming it into a standard CatalogEntity is not done by static methods, so I omitted it. In fact, in most cases we can get the catalogEntity from the cache, since all calls come from catalog operations and the catalog should already be in the cache except on the first call.
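The pattern described here, a static check that reads the cache first and falls back to the store without populating the cache, might look roughly like this. It is a sketch under stated assumptions: `CatalogInUseSketch`, `CACHE`, `STORE`, and `STORE_READS` are hypothetical names, plain maps replace the real cache and entity store, and strings replace `NameIdentifier`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class CatalogInUseSketch {
  // Hypothetical stand-ins: catalog name -> in-use flag.
  static final Map<String, Boolean> CACHE = new ConcurrentHashMap<>();
  static final Map<String, Boolean> STORE = new ConcurrentHashMap<>();
  // Counts simulated backend reads so the cost of a miss is visible.
  static final AtomicInteger STORE_READS = new AtomicInteger();

  // Static, so it can only reach static state; this is why the cache is static too.
  static boolean catalogInUse(String ident) {
    Boolean cached = CACHE.get(ident);
    if (cached != null) {
      return cached; // cache hit: no backend read
    }
    // Cache miss: read from the store, but do not populate the cache here.
    STORE_READS.incrementAndGet();
    Boolean stored = STORE.get(ident);
    if (stored == null) {
      throw new IllegalArgumentException("No such catalog: " + ident);
    }
    return stored;
  }
}
```

With this shape, every miss costs one backend read until some other code path (for example a catalog load) puts the entry into the cache.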

jerryshao · 2025-02-28T06:59:17Z

What is the performance gain after adding the cache?

yuqi1129 · 2025-02-28T07:03:30Z

What is the performance gain after adding the cache?

The picture attached to this PR shows the performance with the cache. For the picture without the cache, please see the corresponding issue.

mchades · 2025-02-28T12:22:33Z

core/src/main/java/org/apache/gravitino/catalog/CatalogManager.java

@@ -260,7 +260,7 @@ private ModelCatalog asModels() {

  private final Config config;

-  @VisibleForTesting final Cache<NameIdentifier, CatalogWrapper> catalogCache;
+  @VisibleForTesting static Cache<NameIdentifier, CatalogWrapper> catalogCache;


Should it be named CATALOG_CACHE?

mchades · 2025-02-28T12:28:36Z

core/src/main/java/org/apache/gravitino/metalake/MetalakeManager.java

+      BaseMetalake metalake = METALAKE_CACHE.getIfPresent(ident);
+      if (metalake == null) {
+        metalake = store.get(ident, EntityType.METALAKE, BaseMetalake.class);
+      }


Why not cache the result after getting it from the store?

Please see #6569 (comment)

After the user alters the metalake and then loads a schema or table directly, without listing or getting the metalake, the cache will never be hit, right?

Yeah. Altering a metalake is not a frequent operation, so I guess it's acceptable.

mchades · 2025-02-28T12:30:40Z

core/src/main/java/org/apache/gravitino/metalake/MetalakeManager.java

@@ -253,7 +289,7 @@ public BaseMetalake alterMetalake(NameIdentifier ident, MetalakeChange... change
              throw new MetalakeNotInUseException(
                  "Metalake %s is not in use, please enable it first", ident);
            }
-
+            METALAKE_CACHE.invalidate(ident);


Can the result be cached after updating?

Putting it back into the cache seems optional, since this operation is not called frequently. Anyway, it would be an improvement; let me check whether we can add it back.
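The two options discussed here, invalidating the entry on alter versus putting the updated value back, can be compared with a small sketch. It is illustrative only: `AlterCacheSketch` and its methods are hypothetical, maps stand in for the cache and the entity store, and the read path is assumed not to populate the cache on a miss, as in the diff hunk above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class AlterCacheSketch {
  // Hypothetical stand-ins: metalake name -> some stored value (e.g. a comment).
  private final Map<String, String> store = new ConcurrentHashMap<>();
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final AtomicInteger storeReads = new AtomicInteger();

  void create(String ident, String value) {
    store.put(ident, value);
    cache.put(ident, value);
  }

  // Read path: cache first, then the store; a miss is not written back.
  String read(String ident) {
    String cached = cache.get(ident);
    if (cached != null) {
      return cached;
    }
    storeReads.incrementAndGet();
    return store.get(ident);
  }

  // Option 1: update the store and drop the cached entry.
  void alterInvalidate(String ident, String newValue) {
    store.put(ident, newValue);
    cache.remove(ident);
  }

  // Option 2: update the store and refresh the cached entry.
  void alterPutBack(String ident, String newValue) {
    store.put(ident, newValue);
    cache.put(ident, newValue);
  }

  int storeReads() {
    return storeReads.get();
  }
}
```

Both options return the updated value afterwards. With invalidation alone, each later read goes to the store until something repopulates the cache, whereas putting the value back keeps later reads on the cache.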

yuqi1129 · 2025-02-28T13:21:03Z

Should it be named CATALOG_CACHE?

The variable is static but not final; according to Java naming conventions, it seems that we should still use camelCase style.

Add the cache mechanism for matalake and use cache to load in-use i…

b6e7c82

…nformation

yuqi1129 requested review from mchades and jerryshao and removed request for mchades February 27, 2025 12:04

yuqi1129 self-assigned this Feb 27, 2025

yuqi1129 added 2 commits February 27, 2025 20:43

fix ut

0e1c41f

Preload all metalakes to cache.

302d241

mchades reviewed Feb 28, 2025

jerryshao reviewed Feb 28, 2025

jerqi changed the title ~~[#6566] improvement(core): Add the cache mechanism for matalake and use cache to load in-use information.~~ [#6566] improvement(core): Add the cache mechanism for metalake and use cache to load in-use information. Feb 28, 2025

yuqi1129 added 2 commits February 28, 2025 19:39

Merge branch 'main' of github.com:datastrato/graviton into issue_6566

bade110

fix ci error.

1d181a8

mchades reviewed Feb 28, 2025

yuqi1129 added 2 commits February 28, 2025 21:26

fix

9e11e3e

fix

aa8aa80
