Improvements to the LRU cache #2500
base: main
Conversation
```rust
map_find_key_values: BTreeMap<Vec<u8>, Vec<(Vec<u8>, Vec<u8>)>>,
map_find_keys: BTreeMap<Vec<u8>, Vec<Vec<u8>>>,
```
I'm not sure it's a good idea to cache these, especially as single entries, since they can contain any number of keys or key-value pairs, so their size is unbounded.
We could not cache the `find_` results at all.
Or we could only store the fact that `find_keys` or `find_key_values` was called with a particular prefix at all, and then cache the actual entries in the other map(s). Whenever an entry belonging to a prefix falls out of the LRU cache or gets deleted, that prefix would have to be cleared here, too.
The size argument applies just as well to the `read_value` LRU cache. Sure, DynamoDb has some limit like 400K, but with the value splitting that limit can be exceeded.
I added a check so that if the cache entry being inserted is larger than the maximal size, then it is refused, since inserting it would lead to a total clearing of the cache.
@MathieuDutSik @afck Let's keep the discussion going and make sure that we go for the best solution here.
In my view the current approach is still not ideal, because:
- It potentially duplicates entries, if the same value occurs in both a "find" and a "value" result.
- It creates weird semantics for moving "find" results to the front of a queue based on a "value" cache hit.
I think it would be better to do something like this (a rough sketch follows after this list):
- Whenever we handle a "find" query we put all found entries into the cache as individual "value" entries, and in addition make a note that the cache now contains everything in that range.
- Whenever an entry falls out of the cache we remove (or even better: split or shrink) any such range entry, because now the cache doesn't contain everything under it anymore.
- If we get a "find" query for a range contained in an up-to-date one, we can bump all entries in the smaller range to the front of the queue, i.e. only those that we actually return.
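For illustration, a minimal sketch of that alternative; the type and method names (`ValueRangeCache`, `insert_find_result`, etc.) are hypothetical and not part of this PR:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical sketch: individual values plus "complete prefix" markers.
struct ValueRangeCache {
    /// Every cached key/value pair, one entry per key.
    values: BTreeMap<Vec<u8>, Vec<u8>>,
    /// Prefixes for which `values` is known to contain all existing keys.
    complete_prefixes: BTreeSet<Vec<u8>>,
}

impl ValueRangeCache {
    /// After a "find" query hit the backend: cache each pair individually
    /// and note that the whole prefix is now covered.
    fn insert_find_result(&mut self, prefix: Vec<u8>, found: Vec<(Vec<u8>, Vec<u8>)>) {
        for (key, value) in found {
            self.values.insert(key, value);
        }
        self.complete_prefixes.insert(prefix);
    }

    /// When `key` falls out of the LRU queue (or is deleted), no prefix of it
    /// can claim to be complete anymore. Splitting or shrinking the marker
    /// instead of removing it would be a refinement of this.
    fn invalidate_prefixes_of(&mut self, key: &[u8]) {
        self.complete_prefixes.retain(|prefix| !key.starts_with(prefix));
    }

    /// A "find" query for `prefix` can be answered from the cache if some
    /// complete prefix is a prefix of the queried one.
    fn covers(&self, prefix: &[u8]) -> bool {
        self.complete_prefixes
            .iter()
            .any(|complete| prefix.starts_with(complete))
    }
}
```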
```rust
if let Some(key_prefix) = lower_bound {
    self.map_find_keys.remove(&key_prefix);
    let full_prefix = (key_prefix, CacheEntry::FindKeys);
    self.queue.remove(&full_prefix);
```
Why do we have to remove this entry? If we read a value in that range, it doesn't make the result invalid, does it?
This has been changed.
The issue was that this was an update to the LRU cache after the write to the cache.
Thanks for looking into this, but the test plan should include empirical evidence that this PR makes things faster, not slower. Also, how do we know this is low-hanging fruit?
Data will be provided about the PR. I think it is a low-hanging fruit because
Force-pushed from d52190b to 1ce99b6.
```rust
        self.remove_from_queue(&full_prefix);
    }
    btree_map::Entry::Vacant(entry) => {
        entry.insert(cache_entry);
    }
```
If I'm reading this correctly, we are not removing any `map_value` entries that are redundant now? So those would be duplicated in memory.
On the other hand, if we do delete them, their lifetime in the cache is now tied to the lifetime of the whole collection.
This is why I'm not convinced by the whole idea of having entire `find_key_values` results as a single entry in the LRU cache.
```rust
        return cache_entry.get_read_value(key_red);
    }
    None
}
```
Shouldn't this bump the entry back to the front of the queue?
Yes, and it has been corrected.
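For reference, bumping on a cache hit could look roughly like this; the queue representation below (a map of keys to "last used" counters) is a hypothetical sketch, not the PR's actual `queue` type:

```rust
use std::collections::BTreeMap;

/// Sketch of an LRU queue: each cached key maps to a monotonically increasing
/// "last used" stamp; the entry with the smallest stamp is the eviction candidate.
struct LruQueue {
    last_used: BTreeMap<Vec<u8>, u64>,
    clock: u64,
}

impl LruQueue {
    /// Move `key` to the most recently used position on a cache hit.
    fn bump(&mut self, key: &[u8]) {
        if let Some(stamp) = self.last_used.get_mut(key) {
            self.clock += 1;
            *stamp = self.clock;
        }
    }

    /// The least recently used key, i.e. the next eviction candidate.
    fn least_recently_used(&self) -> Option<&Vec<u8>> {
        self.last_used
            .iter()
            .min_by_key(|(_, stamp)| **stamp)
            .map(|(key, _)| key)
    }
}
```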
```rust
if cache_size > self.lru_cache_policy.max_cache_size {
    // Inserting that entry would lead to complete clearing of the cache
    // which is counter productive
    return;
}
```
Isn't it common practice, in general, not to cache objects that are too big? IMHO, no cached item should be bigger than roughly 10% (being generous) of the total size of the cache. I don't think only checking for objects that are bigger than the full size of the cache is enough.
If the objects happen to be too big for the size of the cache (even if they're smaller than the max cache size), in the worst case the cache will be useless, as every new insertion will basically evict the entire existing cache.
And do we know how big what we're caching here will generally be? It might be worth having a metric for it, so we can adjust the size of the cache accordingly if needed.
There are a number of heuristics for that. As I said, there is a huge literature on caching.
I could introduce the 10% rule, but the problem is that this becomes a parameter to add to the function call.
Adding the metrics is no problem, though.
Why would it become a parameter to add to the function call? Can't you just add something like a `max_entry_size_pct` (with a comment explaining what it is) to the cache policy?
Also, even if it meant adding a parameter, I think it's better to add a parameter than to have a cache that might not add any value in the worst case.
I added a control value `max_entry_size` in the code.
Force-pushed from 5ad5bc2 to 109e666.
Force-pushed from 8df2a72 to 09b5155.
Force-pushed from b6d9c7b to 02b0aff.
```rust
/// The maximal number of entries in the storage cache.
#[arg(long, default_value = "1000")]
cache_size: usize,
/// The storage cache policy
```
This will generate a CLI without any helpful tips about what values are acceptable here. We should:
- improve the doc,
- and/or add a newtype wrapper for the `storage_cache_policy` with useful defaults.
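A rough sketch of what such a newtype could look like, assuming clap's derive API and an inline JSON payload; the name `StorageCachePolicyArg` and its exact shape are illustrative, not taken from this PR:

```rust
use std::str::FromStr;

use clap::Parser;
use serde::Deserialize;

/// Illustrative newtype wrapping the cache policy, so `--help` can document it
/// and the defaults live in one place.
#[derive(Clone, Debug, Deserialize)]
struct StorageCachePolicyArg {
    max_cache_size: usize,
    max_entry_size: usize,
    max_cache_entries: usize,
}

impl Default for StorageCachePolicyArg {
    fn default() -> Self {
        Self {
            max_cache_size: 10_000_000,
            max_entry_size: 1_000_000,
            max_cache_entries: 1000,
        }
    }
}

impl FromStr for StorageCachePolicyArg {
    type Err = serde_json::Error;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Accept the policy as an inline JSON string on the command line.
        serde_json::from_str(s)
    }
}

#[derive(Parser)]
struct Options {
    /// The storage cache policy as JSON, e.g.
    /// '{"max_cache_size":10000000,"max_entry_size":1000000,"max_cache_entries":1000}'.
    #[arg(long)]
    storage_cache_policy: Option<StorageCachePolicyArg>,
}
```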
```rust
/// If the number of entries in the cache is too large then the underlying maps
/// become the limiting factor
pub const DEFAULT_STORAGE_CACHE_POLICY: StorageCachePolicy = StorageCachePolicy {
    max_cache_size: 10000000,
```
Why not a power of 2? I'd expect this to be something like 5MB for example.
```rust
pub const DEFAULT_STORAGE_CACHE_POLICY: StorageCachePolicy = StorageCachePolicy {
    max_cache_size: 10000000,
    max_entry_size: 1000000,
    max_cache_entries: 1000,
```
This is kind of redundant with `max_cache_size`, isn't it? I.e. both limit the size of the cache, but in different ways. I think it's possible to set them in an "incompatible" way. For example, with this `DEFAULT`, each entry gets (approximately) ~10Kb on average. Which is probably a lot 🤔
We need several ways to control the cache. They are not all independent, of course.
In the chosen settings, we have at most 1M for an individual entry, at most 10M for the full cache, and at most 1000 entries in the cache. Of course, if we have N entries and each entry has at most M bytes, then the total size is going to be at most N*M.
This is needed. If we have too many entries in the cache, then we run the risk of a cache containing 1 million entries of a few bytes each. That leads to a BTreeMap with 1 million entries, and the runtime can become large. We do not want to be in that situation.
We also do not want to be in a situation where the memory expense of the cache is too high. And limiting the maximum size of a cache entry allows us to avoid one big entry crowding out all others.
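Put together, the three limits roughly play the following roles; the `admits` helper below is a hypothetical sketch, only the field names come from the PR:

```rust
/// The three limits from the default policy above.
struct StorageCachePolicy {
    /// Upper bound on the total number of bytes held in the cache.
    max_cache_size: usize,
    /// Upper bound on the size of a single cached entry.
    max_entry_size: usize,
    /// Upper bound on the number of entries, to keep the underlying maps small.
    max_cache_entries: usize,
}

impl StorageCachePolicy {
    /// Hypothetical helper: decide whether an entry of `entry_size` bytes may enter
    /// the cache at all. Evicting older entries to get back under `max_cache_size`
    /// and `max_cache_entries` is a separate step, not shown here.
    fn admits(&self, entry_size: usize) -> bool {
        // An entry close to or above the full cache budget would evict most or
        // all existing entries, so it is refused outright.
        entry_size <= self.max_entry_size && entry_size <= self.max_cache_size
    }
}
```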
```rust
pub fn read_storage_cache_policy(storage_cache_policy: Option<String>) -> StorageCachePolicy {
    match storage_cache_policy {
        None => DEFAULT_STORAGE_CACHE_POLICY,
        Some(storage_cache_policy) => {
```
Why not allow configuring the caching policy by simply passing `--max-cache-size <bytes>`, `--max-cache-entries <number>`, etc. to the binary? Reading a file with these three values written down as JSON seems like an unnecessary complication 🤔
If you decide to promote them to binary arguments, you can even decide to make `max_cache_size` and `max_cache_entries` mutually exclusive.
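A sketch of the suggested flag-based configuration; these flags are hypothetical and not what the PR currently implements:

```rust
use clap::Parser;

/// Hypothetical alternative: expose each policy knob as its own flag.
#[derive(Parser)]
struct CacheOptions {
    /// Maximum total size of the storage cache, in bytes.
    #[arg(long, default_value_t = 10_000_000)]
    max_cache_size: usize,

    /// Maximum size of a single cached entry, in bytes.
    #[arg(long, default_value_t = 1_000_000)]
    max_entry_size: usize,

    /// Maximum number of entries in the storage cache.
    #[arg(long, default_value_t = 1000)]
    max_cache_entries: usize,
}
```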
We generally want to switch away from the use of too many command-line entries and toward the use of input files.
```rust
        ]),
    )
    .expect("Histogram can be created")
});
```
There are so many things here tagged with the `cfg(with_metrics)` attribute. Maybe extract them to a file and put the `cfg(with_metrics)` on the whole module?
Yes, that is annoying.
The solution I found was to use the same strategy as in linera-core/src/client/mod.rs of putting the contents in a "mod metrics" submodule.
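For illustration, the gated-module layout looks roughly like this; the metric name and help string are placeholders, not the PR's actual metrics:

```rust
/// One `cfg` gate on the whole submodule instead of one per item.
#[cfg(with_metrics)]
mod metrics {
    use std::sync::LazyLock;

    use prometheus::{register_histogram, Histogram};

    /// Placeholder metric: latency of reads served by the storage cache.
    pub static CACHE_READ_LATENCY: LazyLock<Histogram> = LazyLock::new(|| {
        register_histogram!(
            "storage_cache_read_latency",
            "Latency of reads served by the storage cache"
        )
        .expect("Histogram can be created")
    });
}
```

Call sites then only need a single `#[cfg(with_metrics)]` around each use of `metrics::...`.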
```diff
@@ -1,45 +1,305 @@
 // Copyright (c) Zefchain Labs, Inc.
 // SPDX-License-Identifier: Apache-2.0

-//! Add LRU (least recently used) caching to a given store.
+//! Add caching to a given store.
```
As for code organisation, I'd split this code into mod.rs plus separate files for the non-public elements. It would make reading (and understanding) much easier. For example, there's a lot of non-`pub` code here at the top of the file. So when reading it I go through a lot of "implementation details" first (like `ValueCacheEntry`, `CacheEntry`, etc.), without even getting to the public-facing API we're adding.
I haven't finished reviewing, but a general comment: if I were to write a caching layer around a DB, I'd implement the exact same API for the cache and forward calls to the underlying storage. For example:
```rust
trait Storage<K, V> {
    fn read(&mut self, k: &K) -> Result<V, Error>;
    fn write(&mut self, k: K, v: V) -> Result<(), Error>;
}

struct RocksDb { /* ... */ }
impl<K, V> Storage<K, V> for RocksDb { /* ... */ }

struct Cache<S> {
    cache: CacheInner,
    database: S,
}

impl<K: Clone, V: Clone, S: Storage<K, V>> Storage<K, V> for Cache<S> {
    fn read(&mut self, k: &K) -> Result<V, Error> {
        // Serve from the cache if possible, otherwise read through and populate it.
        if let Some(cached_v) = self.cache.read(k) {
            return Ok(cached_v);
        }
        let db_v = self.database.read(k)?;
        self.cache.write(k.clone(), db_v.clone());
        Ok(db_v)
    }
    // `write` would go to the database and update (or invalidate) the cache entry.
}
```
In this setup the cache has only minimally more logic than the actual storage.
But isn't that exactly how things are done? We have the trait. There is a little bit of extra logic, I agree. Normally, the other PR #2637 should do the job of removing the extra logic.
Force-pushed from c3b7f8b to 0b96d06.
Indeed.
Force-pushed from 0601333 to 8272124.
Force-pushed from 8272124 to f74f349.
Motivation
The LRU is currently caching only the `read_value` operations. This leaves some optimizations on the table.
Proposal
The following was done:
- `map` was renamed as `map_value`.
- Caching was added for `find_keys_by_prefix` and `find_key_values_by_prefix`.
- For `find_keys_by_prefix` we keep the map as suffix closed and we use the existing entries to get some value. So, if we already computed the result for prefix `[0]` then we can deduce the result for `[0,2]` (see the sketch below).
- The same deduction applies to `find_key_values_by_prefix`.
- `max_cache_size`, which before was a maximum on the number of entries, becomes a maximum on the number of bytes in the cache (and is expanded from 1000 to 10000000).
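To illustrate the prefix deduction in the list above, here is a sketch; it assumes (as `find_keys_by_prefix` suggests) that the cached keys are stored relative to their prefix, and it is not the PR's exact code:

```rust
use std::collections::BTreeMap;

/// Sketch: `map_find_keys` stores, per cached prefix, the keys found under it
/// (relative to that prefix). A query whose prefix extends a cached one can be
/// answered by filtering instead of hitting the backend.
fn find_keys_from_cache(
    map_find_keys: &BTreeMap<Vec<u8>, Vec<Vec<u8>>>,
    key_prefix: &[u8],
) -> Option<Vec<Vec<u8>>> {
    for (cached_prefix, keys) in map_find_keys {
        if key_prefix.starts_with(cached_prefix) {
            // The extra bytes of the queried prefix beyond the cached one.
            let rest = &key_prefix[cached_prefix.len()..];
            let result = keys
                .iter()
                .filter(|key| key.starts_with(rest))
                .map(|key| key[rest.len()..].to_vec())
                .collect();
            return Some(result);
        }
    }
    None
}
```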
Measurement setup:
- `test_wasm_end_to_end_amm` on `remote-net`.
- `ScyllaDb`, which has LRU caching.
- `--release`, which takes about 1H to compile.
Averaged results:
Test Plan
The CI.
Release Plan
Not relevant.
Links