
Split Caching review and thoughts #3377

Conversation


@jaikumar-b jaikumar-b commented May 9, 2024

Howdy folks!

The topic of hybrid caching has been brought up multiple times, and it seems like it isn't on your roadmap anytime soon. I'm going to take a stab at this and wanted to get some feedback on an idea that can give folks interested in this a workable solution until you come up with first-class support for caching:

#3275
#1467

Background

I would like to create a query-level API that developers can use to define a query's storage policy when initializing an instance, like so:

let query = HeroQuery(fetchPolicy: .returnCacheAndFetch).storagePolicy(.inMemoryAndDisk) // or .allowed like https://developer.apple.com/documentation/foundation/urlcache/storagepolicy

This is different from the Kotlin version of Apollo, which uses cache-control headers, but I somewhat prefer this over the Kotlin API when the storage policy is defined exclusively on the client side.

One hurdle in implementing a custom normalized cache driven by a per-query policy is that there is currently no way to associate the normalized cache methods with the query's storage policy, so a custom NormalizedCache implementation has no way to make decisions based on that policy.

Proposal

I noticed that we have a contextIdentifier which can be instantiated for each fetch operation. Propagating that identifier all the way down to the caching layer allows the existing InMemory and SQLite implementations to be extended via wrapper classes. These wrappers can share a StoragePolicyResolver that maintains the identifier-to-policy mapping in memory until the request completes in the network chain.
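For illustration, the resolver described above might look roughly like this minimal sketch; the StoragePolicy cases and the NSLock-based synchronization are assumptions on my part, not code from this PR:

```swift
import Foundation

// Hypothetical sketch of the StoragePolicyResolver described above.
// Policy cases and the locking strategy are assumptions.
enum StoragePolicy {
    case inMemoryOnly
    case inMemoryAndDisk
}

final class StoragePolicyResolver {
    private var policies: [UUID: StoragePolicy] = [:]
    private let lock = NSLock()

    /// Record the policy for a fetch before it enters the network chain.
    func register(_ policy: StoragePolicy, for identifier: UUID) {
        lock.lock(); defer { lock.unlock() }
        policies[identifier] = policy
    }

    /// Look up the policy for a cache operation; falls back to a default
    /// when no identifier was supplied (e.g. a manual cache write).
    func policy(for identifier: UUID?) -> StoragePolicy {
        lock.lock(); defer { lock.unlock() }
        guard let id = identifier, let policy = policies[id] else {
            return .inMemoryAndDisk
        }
        return policy
    }

    /// Drop the entry once the request completes.
    func unregister(_ identifier: UUID) {
        lock.lock(); defer { lock.unlock() }
        policies[identifier] = nil
    }
}
```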

I was able to create a NormalizedCacheChain object that accepts implementations of NormalizedCache.

func makeCache() -> NormalizedCache {
    guard let sqliteCache = SQLiteNormalizedCacheFactory().makeNormalizedCache(name: "apollo_db.sqlite") else {
        return InMemoryNormalizedCache()
    }
    return NormalizedCacheChain(normalizedCaches: [
        InMemoryNormalizedChainCache(),
        SQLiteChainedNormalizedCache(sqlite: sqliteCache)
    ])
}

More cache implementations can be chained as needed by the application.
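For context, the read and write paths of such a chain could be sketched roughly like this, assuming the shape of Apollo's NormalizedCache protocol (loadRecords(forKeys:) and merge(records:)); the remaining protocol requirements would fan out similarly, and error handling is simplified:

```swift
import Apollo

// Rough sketch of a cache chain: reads prefer the first cache in the chain
// that has a record; writes fan out to every cache. Illustrative only.
final class NormalizedCacheChain {
    private let caches: [NormalizedCache]

    init(normalizedCaches: [NormalizedCache]) {
        self.caches = normalizedCaches
    }

    func loadRecords(forKeys keys: Set<CacheKey>) throws -> [CacheKey: Record] {
        var found: [CacheKey: Record] = [:]
        var remaining = keys
        for cache in caches {
            guard !remaining.isEmpty else { break }
            let records = try cache.loadRecords(forKeys: remaining)
            // Prefer the earliest cache in the chain on duplicate keys.
            found.merge(records) { first, _ in first }
            remaining.subtract(records.keys)
        }
        return found
    }

    func merge(records: RecordSet) throws -> Set<CacheKey> {
        var changedKeys: Set<CacheKey> = []
        for cache in caches {
            changedKeys.formUnion(try cache.merge(records: records))
        }
        return changedKeys
    }
}
```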

Questions

  • Any feedback on the solution?
  • Would this be considered something that we could add to the SDK, considering it's non-breaking?

NOTE -
I have this built out against an older version due to familiarity, but it seems like it can be done for the 1.x versions as well.

@apollo-cla

@jaikumar-b: Thank you for submitting a pull request! Before we can merge it, you'll need to sign the Apollo Contributor License Agreement here: https://contribute.apollographql.com/

@jaikumar-b jaikumar-b closed this May 9, 2024
@jaikumar-b
Author

Closed this and created an issue instead: #3378


@AnthonyMDev AnthonyMDev left a comment


I think I'm still struggling to understand what value you're getting out of this new identifier field in the cache read/writes. I'm concerned about how the cache will know how to handle manual cache writes that are not associated with a fetch request, since those won't have an identifier. Will your chained caching mechanisms break if data is mutated in the cache without an identifier?

I'm also feeling that we might not need to actually use the identifier when reading data, but only when writing it. When writing data, you use your query's caching policy to determine if you want to write it to only an in-memory store, vs in-memory & disk persisted. But reading should go through the chain and look for data in each cache. Unless you are trying to use knowledge based on the identifier to skip looking in one cache if you know the data won't be there? Which to me seems like it could be done within your cache chain implementation without needing to ever pass the identifier through the entire cache reading process. Does that sound right to you? I may be missing some things because I'm not 100% clear on what you're trying to achieve here yet.


In 1.0, our fetch functions have a requestContext parameter that allows you to pass in an arbitrary type here. Rather than just using the UUID, that might be a better approach, as it allows you to expose any additional stored information you might need in that object.
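A sketch of that requestContext idea, assuming the 1.x fetch(query:cachePolicy:context:) signature; the CachePolicyContext type and StoragePolicy cases are hypothetical, and client is assumed to be an ApolloClient instance:

```swift
import Apollo

// Hypothetical context type carrying the storage policy instead of a bare UUID.
enum StoragePolicy { case inMemoryOnly, inMemoryAndDisk }

struct CachePolicyContext: RequestContext {
    let storagePolicy: StoragePolicy
}

// Usage: pass the context when fetching; an interceptor or the cache chain
// can then read `storagePolicy` off the request's context.
client.fetch(
    query: HeroQuery(),
    cachePolicy: .returnCacheDataAndFetch,
    context: CachePolicyContext(storagePolicy: .inMemoryAndDisk)
) { result in
    // handle result
}
```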

@@ -7,14 +7,14 @@ public protocol NormalizedCache {
   /// - Parameters:
   ///   - key: The cache keys to load data for
   /// - Returns: A dictionary of cache keys to records containing the records that have been found.
-  func loadRecords(forKeys keys: Set<CacheKey>) throws -> [CacheKey: Record]
+  func loadRecords(forKeys keys: Set<CacheKey>, identifier: UUID?) throws -> [CacheKey: Record]
Contributor:

I'm wondering why you need the identifier added to the loadRecords method. Neither of the implementations in this PR even seems to be using that parameter.

Author:

The goal was to use it in the custom implementation but, as pointed out, it will mostly not be needed, so I will remove it.

   if let cachedResult = cache[key] {
     return try cachedResult.get()
   }

   assert(pendingLoads.contains(key))

-  let values = try batchLoad(pendingLoads)
+  let values = try batchLoad(pendingLoads, identifier)
Contributor:

There is a problem here. You could theoretically use the subscript to add pendingLoads with one identifier and then call load with a different identifier, which would run the batch load with the second identifier, even for the pendingLoads that should use the other identifier.

Author:

Ah, good catch. I was able to fix this and get all the tests to pass by separating identifier loads from non-identifier loads (hashing the identifier with the cache key internally in the loader), but it made me start to question whether any of this is really needed when reading the cache, as you pointed out.
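That composite-key fix might look something like this sketch (the names and the String value type are illustrative, not the actual loader code):

```swift
import Foundation

// Key pending loads by (cache key, identifier) so loads registered under one
// identifier can never be swept into a batch load for a different identifier.
struct LoaderKey: Hashable {
    let cacheKey: String
    let identifier: UUID?
}

final class Loader {
    private var pendingLoads: Set<LoaderKey> = []

    func registerLoad(for cacheKey: String, identifier: UUID?) {
        pendingLoads.insert(LoaderKey(cacheKey: cacheKey, identifier: identifier))
    }

    // Batch only the pending loads that belong to this identifier.
    func load(
        identifier: UUID?,
        batchLoad: (Set<LoaderKey>) throws -> [LoaderKey: String]
    ) rethrows -> [LoaderKey: String] {
        let batch = pendingLoads.filter { $0.identifier == identifier }
        pendingLoads.subtract(batch)
        return try batchLoad(batch)
    }
}
```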

@jaikumar-b

jaikumar-b commented May 13, 2024

Replying to @AnthonyMDev's review comment above:

  • Without the identifier, I was planning to write to the entire chain whenever the cache was accessed directly, so this would not break; it would just result in duplicated entries in all caches in the chain, even where the data was not supposed to be written. This works because load had checks in the wrapper implementations of the InMemory and SQLite normalized caches to only load from one, the other, or both based on the policy; if an entry was found in both, the first was preferred based on the order in which the chain was defined.
  • I did have some use cases, like a storage policy that falls back to SQLite when offline and prefers in-memory when online, but I can handle that at the query level as well, so it isn't a dealbreaker. It just felt odd as an API to pass an identifier to merge but not to load, so from an API-consistency standpoint I added it to get the Apollo team's thoughts. I do agree that it's not really needed on read.
  • I also realize that I should've included the custom cache implementation for a better understanding of what I'm trying to do. Will do that in the next PR.
  • I had a quick glance at requestContext and it does seem like it can do the job; it would certainly be more readable than an identifier. Will fiddle with it and get back.
  • Thanks again for the quick review!
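The write-side routing described in the first bullet could be sketched as follows, assuming Apollo's NormalizedCache and RecordSet types; the StoragePolicy enum and function names are hypothetical:

```swift
import Apollo

// Hypothetical write routing: a known policy restricts which caches are
// written; no policy (e.g. a manual cache write) fans out to the whole chain.
enum StoragePolicy { case inMemoryOnly, inMemoryAndDisk }

func mergeRecords(
    _ records: RecordSet,
    policy: StoragePolicy?,
    inMemoryCache: NormalizedCache,
    sqliteCache: NormalizedCache
) throws -> Set<CacheKey> {
    switch policy {
    case .inMemoryOnly:
        return try inMemoryCache.merge(records: records)
    case .inMemoryAndDisk, nil:
        var changed = try inMemoryCache.merge(records: records)
        changed.formUnion(try sqliteCache.merge(records: records))
        return changed
    }
}
```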
