
Improvements to the LRU cache #2500

Open
wants to merge 8 commits into main from lru_improvement

Conversation

@MathieuDutSik (Contributor) commented Sep 16, 2024

Motivation

The LRU cache currently caches only read_value operations, which leaves some optimizations on the table.

Proposal

The following was done:

  • The map was renamed to map_value.
  • A map for find operations was introduced, keeping track of find_keys_by_prefix and find_key_values_by_prefix results.
  • For find_keys_by_prefix, the map is kept suffix closed and existing entries are reused to answer queries: if the result for prefix [0] has already been computed, then the result for [0,2] can be deduced from it (see the sketch after this list).
  • The same applies to find_key_values_by_prefix.
  • write_batch updates the cache as much as possible.
  • max_cache_size, which was previously a maximum on the number of entries, is now a maximum on the number of bytes in the cache (and is expanded from 1000 to 10000000).
  • The maximum number of entries in the cache is set to 1000.
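As a rough illustration of the layout described above, here is a hedged sketch (not the PR's actual code): the struct and field names follow the discussion, the policy fields match the defaults quoted later, and the lookup helper is hypothetical. It assumes find_keys_by_prefix results are stored as key suffixes relative to the cached prefix.

use std::collections::BTreeMap;

/// Limits discussed in this PR (names from the discussion).
struct StorageCachePolicy {
    max_cache_size: usize,    // maximum number of bytes held in the cache
    max_entry_size: usize,    // maximum size of a single cached entry
    max_cache_entries: usize, // maximum number of entries in the cache
}

/// The maps described above; the real cache also maintains an LRU queue.
struct StorageCache {
    map_value: BTreeMap<Vec<u8>, Option<Vec<u8>>>,
    map_find_keys: BTreeMap<Vec<u8>, Vec<Vec<u8>>>,
    map_find_key_values: BTreeMap<Vec<u8>, Vec<(Vec<u8>, Vec<u8>)>>,
}

impl StorageCache {
    /// Answer find_keys_by_prefix(prefix) from the cache when a cached prefix covers
    /// the query: the cached result for [0] can answer a query for [0, 2] by keeping
    /// only the suffixes that start with [2].
    fn find_cached_keys_by_prefix(&self, prefix: &[u8]) -> Option<Vec<Vec<u8>>> {
        // Any cached prefix that is a prefix of `prefix` sorts at or before it.
        for (cached_prefix, suffixes) in self.map_find_keys.range(..=prefix.to_vec()).rev() {
            if prefix.starts_with(cached_prefix.as_slice()) {
                let delta = &prefix[cached_prefix.len()..];
                let keys: Vec<Vec<u8>> = suffixes
                    .iter()
                    .filter(|suffix| suffix.starts_with(delta))
                    .map(|suffix| suffix[delta.len()..].to_vec())
                    .collect();
                return Some(keys);
            }
        }
        None
    }
}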

Measurement setup:

  • Run with test_wasm_end_to_end_amm on remote-net.
  • The storage used is ScyllaDb, which has LRU caching.
  • Code compiled with --release, which takes about one hour.
  • Each measurement was done 10 times, with the 1st run discarded.
  • Scripting available at https://github.com/MathieuDutSik/linera_parsing_tools

Averaged results:

  • Cache hit rate: 82.9% (old code) vs 84.8% for values (contains_key and read_value) and 91% for finds (find_keys_by_prefix and find_key_values_by_prefix) (new code).
  • read_value_bytes: 0.0276 ms (old code) vs 0.019 ms (new code).
  • contains_key: 0.25 ms (old code) vs 0.076 ms (new code).
  • find_keys_by_prefix: 1.17 ms (old code) vs 0.087 ms (new code).
  • find_key_values_by_prefix: 1.08 ms (old code) vs 0.13 ms (new code).
  • write_batch: 2.22 ms (old code) vs 2.52 ms (new code).
  • total runtime: 16.63 seconds (old code) vs 15.83 seconds (new code).

Test Plan

The CI.

Release Plan

Not relevant.

Links

@MathieuDutSik marked this pull request as ready for review on September 16, 2024 12:50
linera-views/src/common.rs (outdated review thread, resolved)
Comment on lines 63 to 64
map_find_key_values: BTreeMap<Vec<u8>, Vec<(Vec<u8>, Vec<u8>)>>,
map_find_keys: BTreeMap<Vec<u8>, Vec<Vec<u8>>>,
Contributor:

I'm not sure it's a good idea to cache these, especially as single entries, since they can contain any number of keys or key-value pairs, so their size is unbounded.

One option is not to cache the find_ results at all.

Or we could only store the fact that find_keys or find_key_values was called with a particular prefix at all, and then cache the actual entries in the other map(s). Whenever an entry belonging to a prefix falls out of the LRU cache or gets deleted, that prefix would have to be cleared here, too.

Contributor Author:

The size argument applies just as well to the read_value LRU cache. Sure, DynamoDb has some limit like 400K, but with value splitting that limit can be exceeded.

Contributor Author:

I added a check so that if the cache entry being inserted is larger than the maximal size, it is refused, since inserting it would lead to a total clearing of the cache.

@ma2bd (Contributor) commented Sep 24, 2024:

@MathieuDutSik @afck Let's keep the discussion going and make sure that we go for the best solution here.

Contributor:

In my view the current approach is still not ideal, because:

  • It potentially duplicates entries, if the same value occurs in both a "find" and a "value" result.
  • It creates weird semantics for moving "find" results to the front of a queue based on a "value" cache hit.

I think it would be better to do something like this (a rough sketch follows after the list):

  • Whenever we handle a "find" query we put all found entries into the cache as individual "value", and in addition make a note that the cache now contains everything in that range.
  • Whenever an entry falls out of the cache we remove (or even better: split or shrink) any such range entry, because now the cache doesn't contain everything under it anymore.
  • If we get a "find" query for a range contained in an up-to-date one, we can bump all entries in the smaller range to the front of the queue, i.e. only those that we actually return.
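A hedged sketch of that alternative (hypothetical type and method names, not code from this PR): individual values stay in the cache, a separate set records which prefixes are fully covered, and eviction here simply drops any covering prefix instead of splitting or shrinking it.

use std::collections::{BTreeMap, BTreeSet};

struct PrefixAwareCache {
    values: BTreeMap<Vec<u8>, Vec<u8>>,   // individual key/value entries (the LRU payload)
    complete_prefixes: BTreeSet<Vec<u8>>, // prefixes whose entries are all in `values`
}

impl PrefixAwareCache {
    /// Record a "find" result: insert each entry individually and note that the
    /// whole prefix is now covered by the cache.
    fn insert_find_result(&mut self, prefix: &[u8], entries: Vec<(Vec<u8>, Vec<u8>)>) {
        for (key, value) in entries {
            self.values.insert(key, value);
        }
        self.complete_prefixes.insert(prefix.to_vec());
    }

    /// When a value is evicted, any prefix covering it is no longer complete
    /// (a finer variant could split or shrink the range instead of dropping it).
    fn on_eviction(&mut self, evicted_key: &[u8]) {
        self.complete_prefixes
            .retain(|prefix| !evicted_key.starts_with(prefix.as_slice()));
    }

    /// Serve a "find" query from the cache if a recorded complete prefix covers it;
    /// the real cache would also bump the returned entries to the front of the queue.
    fn cached_find(&self, prefix: &[u8]) -> Option<Vec<(Vec<u8>, Vec<u8>)>> {
        let covered = self
            .complete_prefixes
            .iter()
            .any(|p| prefix.starts_with(p.as_slice()));
        if !covered {
            return None;
        }
        Some(
            self.values
                .range(prefix.to_vec()..)
                .take_while(|(key, _)| key.starts_with(prefix))
                .map(|(key, value)| (key.clone(), value.clone()))
                .collect(),
        )
    }
}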

linera-views/src/backends/lru_caching.rs (outdated review thread, resolved)
if let Some(key_prefix) = lower_bound {
self.map_find_keys.remove(&key_prefix);
let full_prefix = (key_prefix, CacheEntry::FindKeys);
self.queue.remove(&full_prefix);
Contributor:

Why do we have to remove this entry? If we read a value in that range, it doesn't make the result invalid, does it?

Contributor Author:

This has been changed.

The thing is that it was an update to the LRU cache after the write to the cache.

linera-views/src/backends/lru_caching.rs (outdated review thread, resolved)
@ma2bd (Contributor) left a comment:

Thanks for looking into this, but the test plan should include empirical evidence that this PR makes things faster, not slower. Also, how do we know this is low-hanging fruit?

@MathieuDutSik (Contributor Author):

> Thanks for looking into this, but the test plan should include empirical evidence that this PR makes things faster, not slower. Also, how do we know this is low-hanging fruit?

Data about the PR will be provided. I think it is a low-hanging fruit because contains_key operations were slower than read_value operations in some contexts.

linera-client/src/client_options.rs (outdated review thread, resolved)
linera-indexer/lib/src/rocks_db.rs (outdated review thread, resolved)
linera-indexer/lib/src/scylla_db.rs (outdated review thread, resolved)
linera-service/src/proxy.rs (outdated review thread, resolved)
linera-service/src/server.rs (outdated review thread, resolved)
linera-views/src/backends/lru_caching.rs (outdated review thread, resolved)
linera-views/src/backends/lru_caching.rs (outdated review thread, resolved)
self.remove_from_queue(&full_prefix);
}
btree_map::Entry::Vacant(entry) => {
entry.insert(cache_entry);
}
Contributor:

If I'm reading this correctly, we are not removing any map_value entries that are redundant now? So those would be duplicated in memory.

On the other hand, if we do delete them, their lifetime in the cache is now tied to the lifetime of the whole collection.

This is why I'm not convinced by the whole idea of having entire find_key_values results as a single entry in the LRU cache.

linera-views/src/backends/lru_caching.rs (outdated review thread, resolved)
return cache_entry.get_read_value(key_red);
}
None
}
Contributor:

Shouldn't this bump the entry back to the front of the queue?

Contributor Author:

Yes, and it has been corrected.

@ma2bd self-requested a review on September 23, 2024 14:20
Comment on lines 391 to 395
if cache_size > self.lru_cache_policy.max_cache_size {
// Inserting that entry would lead to complete clearing of the cache
// which is counter productive
return;
}
Contributor:

Isn't it common practice, in general, not to cache objects that are too big? IMHO no cached item should be bigger than roughly 10% (being generous) of the total size of the cache. I don't think only rejecting objects that are bigger than the full size of the cache is enough.
If the objects happen to be too big relative to the cache (even if they're smaller than the max cache size), in the worst case the cache will be useless, as every new insertion will basically evict the entire existing cache.
And do we know how big the things we're caching here will generally be? It might be worth having a metric for it, so we can adjust the size of the cache accordingly if needed.

Contributor Author:

There are a number of heuristics for that; as I said, there is a huge literature on caching.
I could introduce the 10% rule, but the problem is that it becomes another parameter to add to the function call.

Adding the metrics is no problem, though.

Contributor:

Why would it become a parameter to add to the function call? Can't you just add something like a max_entry_size_pct (with a comment explaining what it does) to the cache policy?
Also, even if it meant adding a parameter, I think it's better to add one than to have a cache that might not add any value in the worst case.

Contributor Author:

I added a control value max_entry_size in the code.
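For illustration, a check of the kind discussed in this thread might look like the following; the 10% figure is the reviewer's suggestion above, and the function is hypothetical rather than the PR's actual code.

/// Hedged sketch: refuse entries above an absolute cap (max_entry_size) and
/// above roughly 10% of the total cache size, so one insertion cannot evict
/// most of the existing cache.
fn should_cache(entry_size: usize, max_entry_size: usize, max_cache_size: usize) -> bool {
    entry_size <= max_entry_size && entry_size * 10 <= max_cache_size
}

fn main() {
    // With the defaults discussed below (10 MB cache, 1 MB max entry),
    // a 2 MB entry is refused by the absolute cap and a 500 KB entry is accepted.
    assert!(!should_cache(2_000_000, 1_000_000, 10_000_000));
    assert!(should_cache(500_000, 1_000_000, 10_000_000));
}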

@MathieuDutSik force-pushed the lru_improvement branch 2 times, most recently from 8df2a72 to 09b5155, on October 3, 2024 07:15
@MathieuDutSik force-pushed the lru_improvement branch 3 times, most recently from b6d9c7b to 02b0aff, on October 23, 2024 12:54
@MathieuDutSik requested a review from deuszx on October 23, 2024 13:05
/// The maximal number of entries in the storage cache.
#[arg(long, default_value = "1000")]
cache_size: usize,
/// The storage cache policy
Contributor:

This will generate a CLI w/o any helpful tips about what values are acceptable here. We should:

  1. Improve the doc
  2. and/or add a newtype wrapper for the storage_cache_policy with useful defaults.

/// If the number of entries in the cache is too large then the underlying maps
/// become the limiting factor
pub const DEFAULT_STORAGE_CACHE_POLICY: StorageCachePolicy = StorageCachePolicy {
max_cache_size: 10000000,
Contributor:

Why not a power of 2? I'd expect this to be something like 5MB for example.

pub const DEFAULT_STORAGE_CACHE_POLICY: StorageCachePolicy = StorageCachePolicy {
max_cache_size: 10000000,
max_entry_size: 1000000,
max_cache_entries: 1000,
@deuszx (Contributor) commented Oct 23, 2024:

This is kind of redundant with max_cache_size, isn't it? I.e. both limit the size of the cache, but in different ways. I think it's possible to set them in an "incompatible" way. For example, with this DEFAULT, each entry gets (approximately) ~10Kb on average, which is probably a lot 🤔

@MathieuDutSik (Contributor Author) commented Oct 29, 2024:

We need several ways to control the cache; they are not all independent, of course.
In the chosen settings, we have at most 1M for an individual entry, at most 10M for the full cache, and at most 1000 entries in the cache. Of course, if we have N entries and each entry has at most M bytes, then the total size is at most N*M.

This is needed. If we allow too many entries in the cache, we run the risk of a cache containing 1 million entries of a few bytes each. That leads to a BTreeMap with 1 million entries, and the runtime can become large. We do not want to be in that situation.

We also do not want to be in a situation where the memory expense of the cache is too high. Limiting the maximum size of a cache entry allows us to avoid one big entry crowding out all the others.
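To make the interplay of the three limits concrete, here is a small worked illustration using the default values quoted above (for illustration only, not code from the PR):

// Worked example of how the three default limits interact.
const MAX_CACHE_SIZE: usize = 10_000_000; // 10 MB for the whole cache
const MAX_ENTRY_SIZE: usize = 1_000_000; // 1 MB for a single entry
const MAX_CACHE_ENTRIES: usize = 1_000; // 1000 entries

fn main() {
    // With N entries of at most M bytes each, the total is at most N * M:
    // here 1000 * 1 MB = 1 GB, far above MAX_CACHE_SIZE, so the byte limit
    // binds when entries are large, while MAX_CACHE_ENTRIES binds when the
    // cache fills up with many tiny entries (keeping the BTreeMap small).
    assert_eq!(MAX_CACHE_ENTRIES * MAX_ENTRY_SIZE, 1_000_000_000);
    assert!(MAX_CACHE_ENTRIES * MAX_ENTRY_SIZE > MAX_CACHE_SIZE);
}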

pub fn read_storage_cache_policy(storage_cache_policy: Option<String>) -> StorageCachePolicy {
match storage_cache_policy {
None => DEFAULT_STORAGE_CACHE_POLICY,
Some(storage_cache_policy) => {
Contributor:

Why not allow configuring the caching policy by simply expecting --max-cache-size <bytes>, --max-cache-entries <number>, etc. on the binary? Reading a file with these three values written down as JSON seems like an unnecessary complication 🤔

If you decide to promote them to binary arguments, you could even make max_cache_size and max_cache_entries mutually exclusive.
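For illustration, the suggested flags could look roughly like this with clap (a hypothetical sketch following the style of the cache_size argument in the diff; the PR instead reads the policy from a file):

use clap::Parser;

#[derive(Parser)]
struct StorageCacheOptions {
    /// Maximum total size of the storage cache, in bytes.
    #[arg(long, default_value = "10000000")]
    max_cache_size: usize,
    /// Maximum size of a single cached entry, in bytes.
    #[arg(long, default_value = "1000000")]
    max_entry_size: usize,
    /// Maximum number of entries in the storage cache.
    #[arg(long, default_value = "1000")]
    max_cache_entries: usize,
}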

Contributor Author:

We generally want to move away from using too many command-line arguments and switch to taking input from files.

]),
)
.expect("Histogram can be created")
});
Contributor:

There are so many things here tagged with the cfg(with_metrics) attribute. Maybe extract them to a file and put cfg(with_metrics) on the whole module?

Contributor Author:

Yes, that is annoying.
The solution I found was to use the same strategy as in linera-core/src/client/mod.rs of putting the contents in a "mod metrics" submodule.
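A minimal sketch of that pattern (the metric name, help text, and the use of the prometheus crate's register_histogram! macro are placeholders; the PR presumably uses the project's own metric helpers):

// Gate the whole submodule once instead of tagging every item with cfg(with_metrics).
#[cfg(with_metrics)]
mod metrics {
    use std::sync::LazyLock;

    use prometheus::{register_histogram, Histogram};

    // Hypothetical metric; the real module defines the PR's actual histograms.
    pub static CACHE_HIT_LATENCY: LazyLock<Histogram> = LazyLock::new(|| {
        register_histogram!(
            "storage_cache_hit_latency",
            "Latency of reads served from the storage cache"
        )
        .expect("Histogram can be created")
    });
}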

@@ -1,45 +1,305 @@
// Copyright (c) Zefchain Labs, Inc.
// SPDX-License-Identifier: Apache-2.0

-//! Add LRU (least recently used) caching to a given store.
+//! Add caching to a given store.
@deuszx (Contributor) commented Oct 23, 2024:

As for code organisation – I'd split this code into mod.rs + files for non-public elements. It would make reading (and understanding) much easier. For example, there's a lot of non-pub code here at the top of the file. So when reading it I read a lot of "implementation details" first (like ValueCacheEntry, CacheEntry, etc.), w/o even getting to what is the public facing API we're adding.

@deuszx (Contributor) left a review comment:

I haven't finished reviewing, but a general comment: if I were to write a caching layer around a DB, I'd implement the exact same API for the cache and forward calls to the underlying storage. For example:

trait Storage<K, V> {
  fn read(&self, k: &K) -> Result<V, Error>;
  fn write(&mut self, k: K, v: V) -> Result<(), Error>;
}

struct RocksDb { /* ... */ }
impl Storage<Key, Value> for RocksDb { /* ... */ }

struct Cache<S> {
  cache: CacheInner,
  database: S,
}

impl<S: Storage<Key, Value>> Storage<Key, Value> for Cache<S> {
  fn read(&self, k: &Key) -> Result<Value, Error> {
    if let Some(cached_v) = self.cache.read(k) {
      return Ok(cached_v);
    }

    let db_v = self.database.read(k)?;
    self.cache.write(k.clone(), db_v.clone());
    Ok(db_v)
  }
}

In this setup, the cache has only minimally more logic than the actual storage.

@MathieuDutSik (Contributor Author):

> In this setup, the cache has only minimally more logic than the actual storage.

But isn't that exactly how things are done? We have the traits ReadableKeyValueStore, WritableKeyValueStore, and similar, and then we have the nesting LruCachingStore<K> and its implementation.

There is a little bit of extra logic, I agree. Normally, the other PR #2637 should do the job of removing the extra logic.

@ma2bd (Contributor) commented Oct 28, 2024:

> There is a little bit of extra logic, I agree. Normally, the other PR #2637 should do the job of removing the extra logic.

indeed
