Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support simple expiring cache #75

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

yigongliu-concur
Copy link
Contributor

@yigongliu-concur yigongliu-concur commented Nov 22, 2020

Please review changes to support simple expiring cache:

  • reuse existing 2q, arc and simple lru for lru eviction, add expiring logic as a wrapper; need changes in current api Add() method to return evicted key/val.

  • since there are 3000+ packages importing golang-lru, there should be no changes required for user client code. avoid api changes that require user code change; the changes to api Add() method (single change to LRUCache interface) is introduced as optional argument:

    Add(key, val interface{}, evictedKeyVal ...*interface{}) (evicted bool)

    this is "pull" style api: user/caller code specify receiving arguments to receive results from callee, similar to Reader{ Read(recvBuf []byte)int }:

    Add(k,v) //current code, no change

    Add(k,v,&evictedKey) //interested in evicted key

    Add(k,v,&evictedKey,&evictedVal) //interested in both evicted key and val

  • following same pattern of simple LRU, separate 2q/arc Cache from LRU, so that XXXCache is just a thread safe wrapper of XXXLRU. this avoids the double locking when ExpiringCache wraps 2Q or ARC.

  • there are 2 expiring policies similar to Guava's CacheBuilder (https://guava.dev/releases/19.0/api/docs/com/google/common/cache/CacheBuilder.html): ExpireAfterWrite and ExpireAfterAccess

  • the default cleanup of expired entries is lazy, only when space is needed for new entries (inside Add) or accurate keys (inside Keys) and size (inside Len) are needed; can add background cleanup by a goroutine periodically calling RemoveAllExpired()

@jefferai
Copy link
Member

jefferai commented Dec 2, 2020

I'm not the right person to review this as I don't know what new API we might want to commit to, but a few comments:

  • If variadic arguments are being added, I would suggest an Options style approach so that it's easier to extend later if there are more type of arguments that need to be added for any other feature
  • I'm not convinced that the mechanism of getting expired entries via calls to Add and others is the right one. Of particular note/concern is that RemoveAllExpired doesn't actually return expired items even though a call to Add would. A design that wouldn't require all of these changes everywhere to support variadic arguments for this explicit purpose would be to allow a caller to register a channel on which expired items are sent. Then expiration can happen via a priority queue with a timer without the laziness.

@yigongliu-concur
Copy link
Contributor Author

  • By Options style, do you mean "functional options"? Yes, the expiring caches already use them for ExpiringCache configuration (https://github.com/yigongliu-concur/golang-lru/blob/3204fcefa37d12c223328ef8a4708cd8023188c5/expiringlru.go#L72).

  • The only change to the API is adding the variadic arguments to Add() method which has no influence on end-user code. user code can stay the same without change anything.

  • There may be some misunderstanding here. the variadic arguments in Add() method is for receiving output - returning evicted entries from underlying LRUs. Most end-users won't need it and current deployed code won't need change. Its sole purpose is to enable extending the existing LRUs, such as what ExpiringCaches are doing. Yes, RemoveAllExpired() can be changed to return expired entries, again, end-users code may not be interested in these.

  • LRUs' Add() method has to be changed, since currently Add() only returns a boolean indicating evictions happened or not (evicted entry(k/v) are silently dropped). Because ExpiringCache wraps a LRU and keeps the expiring-list for entries, it need the evicted entry (k/v) info (which is evicted by LRUs behind the scene) to update expiring-list.

  • I have implemented expire-list in slice, heap and linked-list, currently the linked list implementation perform the best for the common use case when all entries apply the same TTL timeout value. When entries use diff TTL, heap based expire-list may perform better, still need to be further verified.

  • I disagree with the idea of using channels to send entries. Channels are best used for cross goroutines communication; the use case here are returning evicted/expired entries to callers which are in the same gorouting/stack. Returning thru function result or arguments is most straight forward.

@jefferai
Copy link
Member

jefferai commented Dec 3, 2020

Hi there,

By Options style, do you mean "functional options"? Yes, the expiring caches already use them for ExpiringCache configuration (https://github.com/yigongliu-concur/golang-lru/blob/3204fcefa37d12c223328ef8a4708cd8023188c5/expiringlru.go#L72).

Yes. Adding variadic arguments for a specific purpose like this means that you cannot add more arguments after the fact, unless you do some kind of type switching, and that has various brittle edges. If it was using an options pattern instead you would simply pass a value to receive expired entries as an option, but could support other options as well in a more future-proof way.

There may be some misunderstanding here. the variadic arguments in Add() method is for receiving output - returning evicted entries from underlying LRUs. Most end-users won't need it and current deployed code won't need change. Its sole purpose is to enable extending the existing LRUs, such as what ExpiringCaches are doing. Yes, RemoveAllExpired() can be changed to return expired entries, again, end-users code may not be interested in these.
I disagree with the idea of using channels to send entries. Channels are best used for cross goroutines communication; the use case here are returning evicted/expired entries to callers which are in the same gorouting/stack. Returning thru function result or arguments is most straight forward.

I understand the purpose of the variadic arguments, but I disagree with the approach. Lazy expiration can lead to unpredictable performance/memory utilization as a call to add a value can now result in a lot more work -- work which could have been performed in advance via using a timer with a priority queue, keeping performance more consistent and allowing memory to be freed up when it's actually no longer needed. Timers run in their own goroutine, which is why using a channel to receive evicted values would make sense, although it's not the only approach.

Additionally, you're making an assumption that a caller to add a value is the same goroutine as the one that might want to handle an expired value. Considering the caches are thread-safe, that's not a good assumption. In practice, we're using this cache with software that can have tens of thousands of goroutines accessing the cache. It's just as likely that the caller to Add would end up sending the evicted entries to some other goroutine for processing anyways.

Also, maybe I missed this as I really only looked at the proposed API changes, so there might be an option here that I missed, but there seems to be no way to turn off this behavior. If a user wants an expiring cache but isn't interested in receiving evicted entries, the tradeoff for lazy expiration is even more problematic. If you have callers register (or request) a channel and close it when they no longer want to receive evictions, not only could you support sending evictions to multiple callers, but you could clean them up when the channel closes, such that if there are no active callers waiting on evictions you simply don't bother with any logic to send them around.

@yigongliu-concur
Copy link
Contributor Author

  • If lazy expiration is not preferred, we can a goroutine periodically calling RemoveAllExpired() to clean up, the clean up schedule is controlled by some timer. But inside ExpiringCache we need keep timestamp for each entry, and timeout/expire entries at beginning of Add() operations, otherwise the semantics will be incorrect. So expiration of entries are decided inside Cache, but removal can be done more aggressively from outside.

  • My assumption is that most user-code will not care about evicted/expired entries since Cache is the abstraction to hide these from user code. If user code doesn't explicitly provide pointers to key and value to Add() call, they will not be returned. So it is turned off by default.

  • Still not convinced about using channels to return evicted/expired entries. it introduces concurrency issues into the simple data structures. Right now all caches are protected by Locks. Say, 1st goroutine calls Add() method, it will first hold the writer lock, then call internal LRU to add key/val, which results in evictions, then 1st goroutine sends evicted entries to channel; the problem is how about the consumers of this channel are slow? blocked? the 1st goroutine will be blocked on channel send and it holds the writer lock, which will block any other access to the cache. I'd prefer Add() method calls return evicted values explicitly, then the callers can decide whether to send them to others.

@yigongliu-concur
Copy link
Contributor Author

Reconsider this, may be we can avoid varadic arguments of Add() (and all these troubles) by using onEvicted callback similar to what simple LRU does. will reconsider the code during night. Thanks.

@hashicorp-cla
Copy link

hashicorp-cla commented Sep 9, 2021

CLA assistant check
All committers have signed the CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants