Multi-tier Anti-Caching #154

Open
apavlo opened this issue Jun 12, 2014 · 0 comments

apavlo commented Jun 12, 2014

The following are instructions on how to add support for multiple storage levels in the anti-cache storage manager. Currently H-Store's EE only supports two storage tiers: a tuple is either in-memory or in the disk-resident anti-cache. The goal of this project is to make the storage management more flexible so that we can add an arbitrary storage hierarchy and let the DBMS decide where/when/how to move tuples between them.

An overview of the anti-caching architecture is available on the H-Store website.

Here is a rough outline of the steps for this project:

  1. Modify the AntiCacheDB to be a generic API class that we can then override. The current implementation by @jdebrabant has hard-coded private functions that are specific to particular back-ends (e.g., NVM, BerkeleyDB). This needs to be abstracted out into separate implementations (e.g., NVMAntiCacheDB) that override the base AntiCacheDB class (see the first sketch after this list). We should also consider adding a compressed in-memory storage layer. Note that you will have to add any new C++ files to the EE's build script.
  2. We then need a way to configure the EE at start-up to tell it which AntiCacheDB implementations we want to use. It is better to do this using HStoreConf parameters instead of compiler pre-processor flags so that it is easier for us to run experiments and write test cases. See these instructions on how to add new parameters. We can pass the new ones down when the Java front-end initializes the anti-caching component in ExecutionEngineJNI.antiCacheInitialize() (see the second sketch after this list).
  3. The next step is to extend the internal tracking meta-data for the anti-cache to allow us to determine which AntiCacheDB instance is currently storing a particular tuple. Right now all we do is set a one-bit flag in each in-memory tuple's header that tells us whether the tuple is in the real table storage (i.e., PersistentTable) or in the EvictedTupleTable. I propose that we keep this one-bit flag in the tuple and add a new look-up table inside of the AntiCacheEvictionManager that maps the EvictedTupleTable's BlockId to a particular AntiCacheDB instance (see the third sketch after this list). Then at runtime, when the EE sees that the tuple we're examining has this one-bit flag set to true, we grab the BlockId and get the corresponding handle to the instance that we need.
  4. The last step of the project (and the hardest part) is to extend the usage-tracking component of the EE's anti-cache manager so that we can determine when to move tuples between layers. We currently maintain an LRU chain embedded in each tuple's header to keep track of tuples as txns use them. This is wasteful and will be replaced in summer 2014. We are then going to need a way to determine when to move something from a higher layer to the layer below it (e.g., NVMAntiCacheDB -> BerkeleyAntiCacheDB). I propose that each AntiCacheDB implementation maintains its own LIFO ordering of its AntiCacheBlocks and has a limit on the number of blocks it is allowed to store (see the last sketch after this list). Then when we evict data from in-memory storage, we push it into the top-level storage. If that storage is full (i.e., the number of blocks it is allowed to hold is at its limit), then it spills blocks into the next level below it. This avoids having to keep track of timestamps or use a background thread to decide when to move data. We will want to run experiments to measure the impact of these cascading spills for a variety of workloads.
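
For step 1, here is a minimal sketch of what the refactored hierarchy could look like. Everything except the AntiCacheDB name is a placeholder (including the BlockId typedef and the method signatures), and the real interface will need to match whatever the eviction manager already expects:

```cpp
// Sketch only -- class/method names besides AntiCacheDB are illustrative.
#include <stdint.h>
#include <string>
#include <vector>

typedef uint32_t BlockId;  // stand-in for the EvictedTupleTable's block id type

// A block of evicted tuples, as handed back by a storage tier.
class AntiCacheBlock {
  public:
    AntiCacheBlock(BlockId id, const char *data, long size)
        : m_id(id), m_data(data, data + size) { }
    BlockId getId() const { return m_id; }
    const std::vector<char>& getData() const { return m_data; }
  private:
    BlockId m_id;
    std::vector<char> m_data;
};

// Generic tier API: every back-end overrides these virtuals instead of the
// current approach of back-end-specific private functions inside AntiCacheDB.
class AntiCacheDB {
  public:
    virtual ~AntiCacheDB() { }
    virtual void writeBlock(BlockId id, const char *data, long size) = 0;
    virtual AntiCacheBlock readBlock(BlockId id) = 0;
    virtual void flushBlocks() = 0;
};

// One concrete subclass per storage technology, e.g.:
class NVMAntiCacheDB : public AntiCacheDB {
  public:
    NVMAntiCacheDB(const std::string &dir) : m_dir(dir) { }
    void writeBlock(BlockId id, const char *data, long size) { /* mmap + memcpy */ }
    AntiCacheBlock readBlock(BlockId id) { /* map the block file */ return AntiCacheBlock(id, "", 0); }
    void flushBlocks() { /* msync */ }
  private:
    std::string m_dir;
};
// BerkeleyAntiCacheDB, a compressed in-memory tier, etc. would follow the same pattern.
```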
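For step 2, the EE side of the configuration could be as simple as parsing a tier list passed down through antiCacheInitialize(). The parameter value format and the factory function here are both hypothetical, and BerkeleyAntiCacheDB is assumed to be defined analogously to NVMAntiCacheDB above:

```cpp
// Sketch only: assumes a new HStoreConf parameter whose value reaches the EE
// as a comma-separated string such as "NVM,BERKELEY".
#include <sstream>
#include <string>
#include <vector>

std::vector<AntiCacheDB*> buildStorageHierarchy(const std::string &tierSpec,
                                                const std::string &baseDir) {
    std::vector<AntiCacheDB*> tiers;  // index 0 = top-most tier
    std::stringstream ss(tierSpec);
    std::string tier;
    while (std::getline(ss, tier, ',')) {
        if (tier == "NVM") {
            tiers.push_back(new NVMAntiCacheDB(baseDir + "/nvm"));
        } else if (tier == "BERKELEY") {
            tiers.push_back(new BerkeleyAntiCacheDB(baseDir + "/bdb"));
        }
        // anything else should be rejected as a start-up configuration error
    }
    return tiers;
}
```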
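For step 3, the proposed BlockId look-up table inside the AntiCacheEvictionManager might look like this (method names are illustrative; the important part is the map from block id to tier handle):

```cpp
// Sketch of the look-up table. When a tuple's evicted flag is set, the EE
// resolves its BlockId to the AntiCacheDB tier currently holding the block.
#include <cstddef>
#include <map>

class AntiCacheEvictionManager {
  public:
    // Record which tier a block was written into.
    void registerBlock(BlockId id, AntiCacheDB *db) {
        m_blockToTier[id] = db;
    }
    // Resolve a BlockId to its tier; NULL means an unknown/stale block id.
    AntiCacheDB* lookupTier(BlockId id) const {
        std::map<BlockId, AntiCacheDB*>::const_iterator it = m_blockToTier.find(id);
        return (it != m_blockToTier.end() ? it->second : NULL);
    }
    // Called when a block is un-evicted or migrates between tiers.
    void unregisterBlock(BlockId id) {
        m_blockToTier.erase(id);
    }
  private:
    std::map<BlockId, AntiCacheDB*> m_blockToTier;
};
```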
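And for step 4, a rough sketch of the cascading-spill behavior, building on the AntiCacheDB sketch above. This version keeps its blocks in a plain in-memory map so the spill logic is easy to see; a real tier would write to its backing store instead:

```cpp
// Sketch of a tier with a block limit and a handle to the level below it.
// A write into a full tier spills one block downward, cascading as needed.
#include <deque>
#include <map>
#include <vector>

class SpillingAntiCacheDB : public AntiCacheDB {
  public:
    SpillingAntiCacheDB(long maxBlocks, AntiCacheDB *nextLevel)
        : m_maxBlocks(maxBlocks), m_nextLevel(nextLevel) { }

    void writeBlock(BlockId id, const char *data, long size) {
        if ((long)m_order.size() >= m_maxBlocks && m_nextLevel != NULL) {
            // Tier is at its limit: push a block down before inserting.
            BlockId victim = m_order.back();  // LIFO: most recently pushed
            m_order.pop_back();
            std::vector<char> &buf = m_store[victim];
            m_nextLevel->writeBlock(victim, buf.empty() ? "" : &buf[0],
                                    (long)buf.size());
            m_store.erase(victim);
        }
        m_order.push_back(id);
        m_store[id] = std::vector<char>(data, data + size);
    }

    AntiCacheBlock readBlock(BlockId id) {
        std::vector<char> &buf = m_store[id];
        return AntiCacheBlock(id, buf.empty() ? "" : &buf[0], (long)buf.size());
    }

    void flushBlocks() { /* no-op for this in-memory sketch */ }

  private:
    long m_maxBlocks;
    AntiCacheDB *m_nextLevel;
    std::deque<BlockId> m_order;
    std::map<BlockId, std::vector<char> > m_store;
};
```

The nice property of this design is that each tier only needs local information (its own block limit and a handle to the level below it) to decide when to spill, which is what lets us avoid timestamps and background threads.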