Skip to content

Commit

Permalink
Use hashtable as the default type of temp set object during sunion/sd…
Browse files Browse the repository at this point in the history
…iff (valkey-io#996)

This patch try to set the temp set object as default hash table type.
And did a simple predication of the temp set object encoding when
initialize `dstset` to reduce the unnecessary conversation.

## Issue Description

According to existing code logic, when did operation like `sunion` and
`sdiff` , the temp set object could be `intset`, `listpack` and
`hashtable`, for the `listpack`, the efficiency is low when did
operation like `find` and `compare` , need to traverse all elements.
When we exploring the hotspots, found the `lpFind` and `memcmp` has been
the bottleneck when running workloads like below:

-
[memtier_benchmark-2keys-set-10-100-elements-sunion.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sunion.yml)
-
[memtier_benchmark-2keys-set-10-100-elements-sdiff.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sdiff.yml)


![image](https://github.com/user-attachments/assets/71dfc70b-2ad5-4832-a338-712deefca20e)

## Optimization 

This patch try to set the temp set object as default hash table type.
And did a simple predication of the temp set object encoding when
initialize `dstset` to reduce the unnecessary conversation.

### Test Environment

- OPERATING SYSTEM: Ubuntu 22.04.4 LTS
- Kernel: 5.15.0-116-generic
- PROCESSOR: Intel Xeon Platinum 8380
- Server and Client in same socket.

#### Server Configuration
```
taskset -c 0-3 ~/valkey/src/valkey-server /tmp/valkey.conf

port 9001
bind * -::*
daemonize no
protected-mode no
save ""
```

#### Performance Boost 

| Test Name| Perf Boost|
|-|-|

|[memtier_benchmark-2keys-set-10-100-elements-sunion.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sunion.yml)
|41%|

|[memtier_benchmark-2keys-set-10-100-elements-sdiff.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sdiff.yml)
|27%|


### More Tests
With above test set which have total 110 elements in the 2 given sets.
We also did some benchmarking by adjusting the total number of elements
in all given sets. We can still observe the performance boost.


![image](https://github.com/user-attachments/assets/b2ab420c-43e5-45de-9715-7d943df229cb)

---------

Signed-off-by: Lipeng Zhu <[email protected]>
Co-authored-by: Wangyang Guo <[email protected]>
  • Loading branch information
lipzhu and guowangy authored Sep 10, 2024
1 parent 9f0c801 commit 58fe9c0
Showing 1 changed file with 23 additions and 1 deletion.
24 changes: 23 additions & 1 deletion src/t_set.c
Original file line number Diff line number Diff line change
Expand Up @@ -1445,6 +1445,7 @@ void sunionDiffGenericCommand(client *c, robj **setkeys, int setnum, robj *dstke
robj **sets = zmalloc(sizeof(robj *) * setnum);
setTypeIterator *si;
robj *dstset = NULL;
int dstset_encoding = OBJ_ENCODING_INTSET;
char *str;
size_t len;
int64_t llval;
Expand All @@ -1463,6 +1464,23 @@ void sunionDiffGenericCommand(client *c, robj **setkeys, int setnum, robj *dstke
zfree(sets);
return;
}
/* For a SET's encoding, according to the factory method setTypeCreate(), currently have 3 types:
* 1. OBJ_ENCODING_INTSET
* 2. OBJ_ENCODING_LISTPACK
* 3. OBJ_ENCODING_HT
* 'dstset_encoding' is used to determine which kind of encoding to use when initialize 'dstset'.
*
* If all sets are all OBJ_ENCODING_INTSET encoding or 'dstkey' is not null, keep 'dstset'
* OBJ_ENCODING_INTSET encoding when initialize. Otherwise it is not efficient to create the 'dstset'
* from intset and then convert to listpack or hashtable.
*
* If one of the set is OBJ_ENCODING_LISTPACK, let's set 'dstset' to hashtable default encoding,
* the hashtable is more efficient when find and compare than the listpack. The corresponding
* time complexity are O(1) vs O(n). */
if (!dstkey && dstset_encoding == OBJ_ENCODING_INTSET &&
(setobj->encoding == OBJ_ENCODING_LISTPACK || setobj->encoding == OBJ_ENCODING_HT)) {
dstset_encoding = OBJ_ENCODING_HT;
}
sets[j] = setobj;
if (j > 0 && sets[0] == sets[j]) {
sameset = 1;
Expand Down Expand Up @@ -1504,7 +1522,11 @@ void sunionDiffGenericCommand(client *c, robj **setkeys, int setnum, robj *dstke
/* We need a temp set object to store our union/diff. If the dstkey
* is not NULL (that is, we are inside an SUNIONSTORE/SDIFFSTORE operation) then
* this set object will be the resulting object to set into the target key*/
dstset = createIntsetObject();
if (dstset_encoding == OBJ_ENCODING_INTSET) {
dstset = createIntsetObject();
} else {
dstset = createSetObject();
}

if (op == SET_OP_UNION) {
/* Union is trivial, just add every element of every set to the
Expand Down

0 comments on commit 58fe9c0

Please sign in to comment.