pmemids_bench is a C++ version of YCSB based benchmark suite that includes seven commonly used indexing data structures implemented in four persistent modes and four parallel modes (shown in later table). You can learn more about this project in our MSST 2020 paper.
Quick Start
Benchmark Design and Implementation
Benchmarking Results and Analysis
Usage
Contribution
Credit
- Islam, Abdullah Al Raqibul and Dai, Dong. "Understand the overheads of storage data structures on persistent memory," PPoPP '20: Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020. doi:10.1145/3332466.3374509
- Islam, Abdullah Al Raqibul and Dai, Dong and Narayanan, Anirudh and York, Christopher. "A Performance Study of Optane Persistent Memory: From Indexing Data Structures’ Perspective," MSST '20: 36th International Conference on Massive Storage Systems and Technology, 2020.
- GNU Make
- C++ (we used version 11)
- PMDK (we used version 1.8)
- external libraries: pthreads, pmemobj, vmem, memkind
To build pmemids_bench
:
$ make
Run Workload A with a dram-based array implementation in single thread:
./ycsbc -db storeds -threads 1 -dbpath /pmem -type array-dram -P workloads/workloada.spec
You can run the program without any arguments (i.e. ./ycsbc
) to see the command line argument help information.
Note that we do not have load and run commands as the original YCSB. Specify how many records to load by the recordcount property. Reference properties files in the workloads dir.
Data Structure | Description | Persistent Modes | Parallel Modes | Workloads |
---|---|---|---|---|
Arraylist | Fixed items/size | DRAM PMEM-Volatile PMEM-Persist PMEM-Trans |
Single thread Parallel, Saturated Concurrent, Contention NUMA |
A (100% Read) B (100% Write) (90% Update, 10% Insert) |
Linkedlist | Appended only | |||
Hashtable | Chained Linkedlist | |||
Skiplist | Fixed height, fair probability | |||
B-Tree | Values referenced/embedded | |||
B+-Tree | Values in the leaf nodes only | |||
RB-Tree | Instant rebalancing | |||
Table 1: Summary of indexing data structures, their different running modes under various workloads included in the benchmark suite. |
To help benchmark the storage systems, typical YCSB benchmarks include some representative workloads such as the five workloads (A to E) shown in the top part of Table 2.
However, in pmemids_bench
, this is not feasible anymore, simply because unlike benchmarking storage systems, we do not know how developers will use the indexing data structures. So, in our implementation, we leverage the YCSB workload generator, but we do create different workloads for our evaluations, as placed in the bottom part of Table 2.
There are only two workloads included in pmemids_bench
by default. But developers can configure their own workloads as we have placed several sample of them in "workloads" directory (i.e. check files with ".spec" file extension). We also reserved the original YCSB benchmarks, you can check it here.
YCSB Workload | A | B | C | D | E |
---|---|---|---|---|---|
Read Update Insert Read & Update |
50 50 - - |
95 5 - - |
100 - - - |
95 - 5 - |
50 - - 50 |
pmemids_bench Workload | A (100% Read) | B (100% Write) | |||
Read Update Insert |
100 - - |
- 90 10 |
|||
Table 2: Workloads in % of differebt operations. |
We tested our benchmark suiteon a Dell R740 rack server with two sockets. Each socket installs a 2nd generation Intel Xeon Scalable Processor (Gold 6254 @ 3.10G) with 18 physical (36 virtual) cores. The machine is running Ubuntu 18.04 with a Linux kernel version 4.15.0. To enable the NUMA evaluation, we put all the DRAM and Optane DC DIMMS into one socket (node 0). This socket has 6 DRAM DIMMS with 32GB each and 1 Optane DC DIMM with 128GB. In all the evaluations, except the NUMA ones, we bind all the threads to node 0 to enable local memory accesses. The total memory capacity is 192GB DRAM and 128GB Optane DC. We used PMDK 1.8 in the implementation.
Fig. 3. The benchmark results of seven indexing data structures in a single thread case. The x-axis lists all the persistent modes and y-axis shows the throughput (KTPS or k transactions per second) after running the workloads. We tested all five workloads, and showed two of them (100% read and 100% write) here. Note that, all the experiments were repeated 10 times. We plot both the average and the variations in the figure. Fig. 5. The performance of two B-Tree implementations (value referenced vs. value embedded) in different persistent modes. The tree node size changes from 256 byts to 2K btyes. Fig. 6. The benchmark results of seven indexing data structures in the Parallel, Saturated mode. The x-axis is the thread number; the y-axis is the aggregated throughput (KTPS) across all threads. We used up to 16 threads to make sure each thread is running in a physical core. We show both 100% write and 100% read workloads here.One of the key motivations of this project is to allow others to reuse our framework and implement their specific data structures for performance comparison (using YCSB workload). Please have a look at our wiki page to learn how to add new data structures to this project. From the Workloads section, you will get some idea about workload generation.
The project is well suited for testing third-party libraries. We helped TwoMisses (an updated version of ThreeMisses
, the winner of the first SNIA NVDIMM Challenge!) to benchmark their library. The integration is done in the branch threemiss.
If you would like to contribute to pmemids_bench
, we would certainly appreciate your help! Here are some of the ways you can contribute:
- Bug fixes, whether they be for performance or correctness of the existing data structure implementation
- Improvement of the existing benchmark implementation and documentation
- Support for additional data structure implementations; we are open to adding more data structures in this repo
Our future goal is to provide a set of portable, high-performance PMEM-aware data structure baselines. For code contributions, please focus on code simplicity and readability. If you open a pull request, we will do an immediate sanity check and respond to you as soon as possible.
- We use Jinglei Ren's (Github handle: basicthinker) C++ implementation of YCSB benchmark YCSB-C as the base of our project.