Description
Is your feature request related to a problem or challenge?
As part of #19366, @BlakeOrth added a session (global) cache for the results of calling LIST on a remote directory.
The benefit is that this cache is now visible, so we can report on it and align it with the other session-scoped caches. However, it has an unfortunate side effect: there is no good way to force a refresh of the files that back an external table.
For example:
-- calls LIST to get the files
create external table foo...
drop table foo;
-- reuses the cached file list; previously this would call LIST again and get a (potentially updated) file list
create external table foo ...
Previously, the cache was local to each ListingTable and was therefore recreated on each call to CREATE EXTERNAL TABLE. This meant that a user could force a refresh of the file list by recreating the table. I think reusing the same cached list is pretty confusing.
@jizezhang has helpfully volunteered to help with this feature and has another PR queued up, so we merged #19366 and will fix this after the fact.
Describe the solution you'd like
I think the caches should still be (logically) table scoped -- so that when a CREATE EXTERNAL TABLE command is issued, it will actually make a call to LIST to see the current contents of the remote table
Describe alternatives you've considered
One idea is to have "sub caches" or something that are all table scoped but that the session level cache has a handle to (so it can report on them)
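To make the "sub caches" idea concrete, here is a minimal sketch. It is not DataFusion code: `TableListCache`, `SessionCacheRegistry`, and their methods are hypothetical names, and the file list is modeled as a plain `Vec<String>`. Each CREATE EXTERNAL TABLE gets a fresh sub-cache, while the session only keeps a weak handle so it can report on live caches.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical per-table cache of LIST results (not a DataFusion type).
#[derive(Default)]
struct TableListCache {
    files: Mutex<Option<Vec<String>>>,
}

impl TableListCache {
    /// Returns the cached listing, invoking `list` only if nothing is cached yet.
    fn get_or_list(&self, list: impl FnOnce() -> Vec<String>) -> Vec<String> {
        let mut guard = self.files.lock().unwrap();
        guard.get_or_insert_with(list).clone()
    }
}

/// Hypothetical session-level registry. It holds only weak handles, so it can
/// report on the sub-caches without keeping them alive.
#[derive(Default)]
struct SessionCacheRegistry {
    tables: Mutex<HashMap<String, Weak<TableListCache>>>,
}

impl SessionCacheRegistry {
    /// Models CREATE EXTERNAL TABLE: always hands out a fresh, empty sub-cache.
    fn create_table_cache(&self, table: &str) -> Arc<TableListCache> {
        let cache = Arc::new(TableListCache::default());
        self.tables
            .lock()
            .unwrap()
            .insert(table.to_string(), Arc::downgrade(&cache));
        cache
    }

    /// Number of registered sub-caches whose table is still alive.
    fn live_caches(&self) -> usize {
        self.tables
            .lock()
            .unwrap()
            .values()
            .filter(|handle| handle.strong_count() > 0)
            .count()
    }
}

fn main() {
    let session = SessionCacheRegistry::default();

    let foo = session.create_table_cache("foo");
    let first = foo.get_or_list(|| vec!["a.parquet".to_string()]);
    // Second read is served from the sub-cache; the closure is not called.
    let second = foo.get_or_list(|| vec!["b.parquet".to_string()]);
    assert_eq!(first, second);

    drop(foo); // models DROP TABLE
    assert_eq!(session.live_caches(), 0);

    // Recreating the table starts from an empty sub-cache, so LIST runs again.
    let foo = session.create_table_cache("foo");
    let refreshed = foo.get_or_list(|| vec!["a.parquet".to_string(), "b.parquet".to_string()]);
    assert_eq!(refreshed.len(), 2);
}
```

With this shape, recreating the table refreshes the file list without any explicit invalidation step, at the cost of not sharing listings between tables that point at the same path.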
@BlakeOrth also proposed:
it would potentially be reasonable to treat a DROP command like INSERT where we manually invalidate the cache entries for that table's path.
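A rough sketch of that alternative, again under assumptions: `SessionListCache` and `invalidate_table` are hypothetical names, not DataFusion APIs, and cache keys are modeled as plain path strings. The session cache stays global, and DROP removes every entry at or under the table's path.

```rust
use std::collections::HashMap;

/// Hypothetical session-scoped LIST cache keyed by path (not a DataFusion type).
#[derive(Default)]
struct SessionListCache {
    entries: HashMap<String, Vec<String>>,
}

impl SessionListCache {
    /// Returns the cached listing for `path`, invoking `list` only on a miss.
    fn get_or_list(&mut self, path: &str, list: impl FnOnce() -> Vec<String>) -> &Vec<String> {
        self.entries.entry(path.to_string()).or_insert_with(list)
    }

    /// Models the DROP hook: removes entries for the table root and anything
    /// below it. A trailing '/' on `table_path` is ignored.
    fn invalidate_table(&mut self, table_path: &str) {
        let root = table_path.trim_end_matches('/');
        let prefix = format!("{root}/");
        self.entries
            .retain(|path, _| path.as_str() != root && !path.starts_with(prefix.as_str()));
    }
}

fn main() {
    let mut cache = SessionListCache::default();
    let path = "s3://bucket/foo/";

    let first = cache.get_or_list(path, || vec!["a.parquet".to_string()]).clone();
    // Without invalidation the stale listing is reused; the closure is not called.
    let stale = cache
        .get_or_list(path, || vec!["a.parquet".to_string(), "b.parquet".to_string()])
        .clone();
    assert_eq!(first, stale);

    cache.invalidate_table(path); // models DROP TABLE foo

    // The next CREATE EXTERNAL TABLE misses the cache and lists again.
    let fresh = cache
        .get_or_list(path, || vec!["a.parquet".to_string(), "b.parquet".to_string()])
        .clone();
    assert_eq!(fresh.len(), 2);
}
```

One design question this leaves open is what should happen when two tables share a path: dropping one would also discard the listing the other relies on, which only costs an extra LIST but is worth stating explicitly.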
Additional context
No response