-
Notifications
You must be signed in to change notification settings - Fork 406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#6566] improvement(core): Add the cache mechanism for metalake and use cache to load in-use
information.
#6569
base: main
Are you sure you want to change the base?
Conversation
@@ -60,6 +67,21 @@ public class MetalakeManager implements MetalakeDispatcher { | |||
|
|||
private final IdGenerator idGenerator; | |||
|
|||
@VisibleForTesting | |||
static final Cache<NameIdentifier, BaseMetalake> METALAKE_CACHE = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If our goal is to accelerate the acquisition of in-use
, it seems that we only need to cache the corresponding in-use
value, and do not need to cache BaseMetalake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cacheing BaseMetalake
will take only a little memory and can use it when loadingMetalake
by the way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think caching metalake is better, because the amount of metalake is quite limited, with small memory size we can improve the performance a lot, it is worthy to cache the metalake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you use uppercase for this variable? Typically, we only use uppercase letter for final
variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It indeed has final
flag, see static final Cache<NameIdentifier, BaseMetalake> METALAKE_CACHE
@@ -260,7 +260,7 @@ private ModelCatalog asModels() { | |||
|
|||
private final Config config; | |||
|
|||
@VisibleForTesting final Cache<NameIdentifier, CatalogWrapper> catalogCache; | |||
@VisibleForTesting static Cache<NameIdentifier, CatalogWrapper> catalogCache; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we make it static? I assume there will be only one CatalogManager
, so there should be only one catalogCache
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we make it static?
The method check catalogInUse
and metalakeInUse
are all static. If we want to use cache for them, we need to change it to static
assume there will be only one CatalogManager, so there should be only one catalogCache, right?
Yes, there will be only one cache and all catalogs shares the same instance, It's not a big problem I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should named CATALOG_CACHE
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be final
?
if (wrapper != null) { | ||
catalogEntity = wrapper.catalog.entity(); | ||
} else { | ||
catalogEntity = store.get(catalogIdent, EntityType.CATALOG, CatalogEntity.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we put this catalogEntity
in cache? Besides, can we use loadcatalog directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Loading catalogEntity
and then transform it to standard CatalogEntity are NOT static
method, so I omit it. in fact, in most cases, we can get catalogEntity
from the cache as all calls are from catalog operations, and the catalog should be in the cache except the first time.
What is the performance gaining after cache? |
The picture attached to this PR is of the performance with the cache, about the picture without cache, please see the picture in corresponding issue |
in-use
information.in-use
information.
@@ -260,7 +260,7 @@ private ModelCatalog asModels() { | |||
|
|||
private final Config config; | |||
|
|||
@VisibleForTesting final Cache<NameIdentifier, CatalogWrapper> catalogCache; | |||
@VisibleForTesting static Cache<NameIdentifier, CatalogWrapper> catalogCache; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should named CATALOG_CACHE
?
BaseMetalake metalake = METALAKE_CACHE.getIfPresent(ident); | ||
if (metalake == null) { | ||
metalake = store.get(ident, EntityType.METALAKE, BaseMetalake.class); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not cache the result after getting it from the store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see #6569 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after the user alters the metalake, then load schema/table directly without list/get the metalake, the cache never be hitting, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. alter a metalake is not operated frequently, so I guess it's acceptable
@@ -253,7 +289,7 @@ public BaseMetalake alterMetalake(NameIdentifier ident, MetalakeChange... change | |||
throw new MetalakeNotInUseException( | |||
"Metalake %s is not in use, please enable it first", ident); | |||
} | |||
|
|||
METALAKE_CACHE.invalidate(ident); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the result be cached after updating?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put it back to cache seems to be an optional if this optional is not frequently called. Anyway, this is an improvement, let me check if we can add it back.
The variable does not have a |
What changes were proposed in this pull request?
in-use
information for metalake and catalog.This is the Flame diagram after optimization:

Why are the changes needed?
Loading
in-use
from backend storage everytime times when doing catalog operations is very time-comsuming, we'd to optimize it.Fix: #6566
Does this PR introduce any user-facing change?
N/A.
How was this patch tested?
Add UT and existing tests.