Commit 3686500

alwayslove2013 authored and XuanYang-cn committed
update some docs
Signed-off-by: min.tian <[email protected]>
1 parent ef7055b commit 3686500

18 files changed: +96 -32 lines changed

README.md

Lines changed: 9 additions & 9 deletions
@@ -3,17 +3,17 @@
 [![version](https://img.shields.io/pypi/v/vectordb-bench.svg?color=blue)](https://pypi.org/project/vectordb-bench/)
 [![Downloads](https://pepy.tech/badge/vectordb-bench)](https://pepy.tech/project/vectordb-bench)
 
-## What is VectorDBBench
-VectorDBBench(VDBBench) is not just an offering of benchmark results for mainstream vector databases and cloud services, it's your go-to tool for the ultimate performance and cost-effectiveness comparison. Designed with ease-of-use in mind, VectorDBBench is devised to help users, even non-professionals, reproduce results or test new systems, making the hunt for the optimal choice amongst a plethora of cloud services and open-source vector databases a breeze.
+## What is VDBBench
+VDBBench is not just an offering of benchmark results for mainstream vector databases and cloud services, it's your go-to tool for the ultimate performance and cost-effectiveness comparison. Designed with ease-of-use in mind, VDBBench is devised to help users, even non-professionals, reproduce results or test new systems, making the hunt for the optimal choice amongst a plethora of cloud services and open-source vector databases a breeze.
 
 Understanding the importance of user experience, we provide an intuitive visual interface. This not only empowers users to initiate benchmarks at ease, but also to view comparative result reports, thereby reproducing benchmark results effortlessly.
 To add more relevance and practicality, we provide cost-effectiveness reports particularly for cloud services. This allows for a more realistic and applicable benchmarking process.
 
 Closely mimicking real-world production environments, we've set up diverse testing scenarios including insertion, searching, and filtered searching. To provide you with credible and reliable data, we've included public datasets from actual production scenarios, such as [SIFT](http://corpus-texmex.irisa.fr/), [GIST](http://corpus-texmex.irisa.fr/), [Cohere](https://huggingface.co/datasets/Cohere/wikipedia-22-12/tree/main/en), and a dataset generated by OpenAI from an opensource [raw dataset](https://huggingface.co/datasets/allenai/c4). It's fascinating to discover how a relatively unknown open-source database might excel in certain circumstances!
 
-Prepare to delve into the world of VectorDBBench, and let it guide you in uncovering your perfect vector database match.
+Prepare to delve into the world of VDBBench, and let it guide you in uncovering your perfect vector database match.
 
-VectorDBBench is sponsered by Zilliz,the leading opensource vectorDB company behind Milvus. Choose smarter with VectorDBBench- start your free test on [zilliz cloud](https://zilliz.com/) today!
+VDBBench is sponsered by Zilliz,the leading opensource vectorDB company behind Milvus. Choose smarter with VDBBench - start your free test on [zilliz cloud](https://zilliz.com/) today!
 
 **Leaderboard:** https://zilliz.com/benchmark
 ## Quick Start
@@ -420,7 +420,7 @@ make format
 ## How does it work?
 ### Result Page
 ![image](https://github.com/zilliztech/VectorDBBench/assets/105927039/8a981327-c1c6-4796-8a85-c86154cb5472)
-This is the main page of VectorDBBench, which displays the standard benchmark results we provide. Additionally, results of all tests performed by users themselves will also be shown here. We also offer the ability to select and compare results from multiple tests simultaneously.
+This is the main page of VDBBench, which displays the standard benchmark results we provide. Additionally, results of all tests performed by users themselves will also be shown here. We also offer the ability to select and compare results from multiple tests simultaneously.
 
 The standard benchmark results displayed here include all 15 cases that we currently support for 6 of our clients (Milvus, Zilliz Cloud, Elastic Search, Qdrant Cloud, Weaviate Cloud and PgVector). However, as some systems may not be able to complete all the tests successfully due to issues like Out of Memory (OOM) or timeouts, not all clients are included in every case.
 
@@ -454,7 +454,7 @@ We've developed lots of comprehensive benchmark cases to test vector databases'
 - **Int-Filter Cases:** Evaluates search performance with int-based filter expression (e.g. "id >= 2,000").
 - **Label-Filter Cases:** Evaluates search performance with label-based filter expressions (e.g., "color == 'red'"). The test includes randomly generated labels to simulate real-world filtering scenarios.
 #### Streaming Cases
-- **Insertion-Under-Load Case:** Evaluates search performance while maintaining a constant insertion workload. VectorDBBench applies a steady stream of insert requests at a fixed rate to simulate real-world scenarios where search operations must perform reliably under continuous data ingestion.
+- **Insertion-Under-Load Case:** Evaluates search performance while maintaining a constant insertion workload. VDBBench applies a steady stream of insert requests at a fixed rate to simulate real-world scenarios where search operations must perform reliably under continuous data ingestion.
 
 Each case provides an in-depth examination of a vector database's abilities, providing you a comprehensive view of the database's performance.
 
@@ -480,15 +480,15 @@ We have strict requirements for the data set format, please follow them.
 
 - `Train File Count` - If the vector file is too large, you can consider splitting it into multiple files. The naming format for the split files should be `train-[index]-of-[file_count].parquet`. For example, `train-01-of-10.parquet` represents the second file (0-indexed) among 10 split files.
 
-- `Use Shuffled Data` - If you check this option, the vector data files need to be modified. VectorDBBench will load the data labeled with `shuffle`. For example, use `shuffle_train.parquet` instead of `train.parquet` and `shuffle_train-04-of-10.parquet` instead of `train-04-of-10.parquet`. The `id` column in the shuffled data can be in any order.
+- `Use Shuffled Data` - If you check this option, the vector data files need to be modified. VDBBench will load the data labeled with `shuffle`. For example, use `shuffle_train.parquet` instead of `train.parquet` and `shuffle_train-04-of-10.parquet` instead of `train-04-of-10.parquet`. The `id` column in the shuffled data can be in any order.
 
 
 ## Goals
 Our goals of this benchmark are:
 ### Reproducibility & Usability
-One of the primary goals of VectorDBBench is to enable users to reproduce benchmark results swiftly and easily, or to test their customized scenarios. We believe that lowering the barriers to entry for conducting these tests will enhance the community's understanding and improvement of vector databases. We aim to create an environment where any user, regardless of their technical expertise, can quickly set up and run benchmarks, and view and analyze results in an intuitive manner.
+One of the primary goals of VDBBench is to enable users to reproduce benchmark results swiftly and easily, or to test their customized scenarios. We believe that lowering the barriers to entry for conducting these tests will enhance the community's understanding and improvement of vector databases. We aim to create an environment where any user, regardless of their technical expertise, can quickly set up and run benchmarks, and view and analyze results in an intuitive manner.
 ### Representability & Realism
-VectorDBBench aims to provide a more comprehensive, multi-faceted testing environment that accurately represents the complexity of vector databases. By moving beyond a simple speed test for algorithms, we hope to contribute to a better understanding of vector databases in real-world scenarios. By incorporating as many complex scenarios as possible, including a variety of test cases and datasets, we aim to reflect realistic conditions and offer tangible significance to our community. Our goal is to deliver benchmarking results that can drive tangible improvements in the development and usage of vector databases.
+VDBBench aims to provide a more comprehensive, multi-faceted testing environment that accurately represents the complexity of vector databases. By moving beyond a simple speed test for algorithms, we hope to contribute to a better understanding of vector databases in real-world scenarios. By incorporating as many complex scenarios as possible, including a variety of test cases and datasets, we aim to reflect realistic conditions and offer tangible significance to our community. Our goal is to deliver benchmarking results that can drive tangible improvements in the development and usage of vector databases.
 
 ## Contribution
 ### General Guidelines

pyproject.toml

Lines changed: 6 additions & 0 deletions
@@ -220,3 +220,9 @@ builtins-ignorelist = [
 # "dict", # TODO
 # "filter",
 ]
+
+[tool.ruff.lint.per-file-ignores]
+"vectordb_bench/backend/clients/*" = ["PLC0415"]
+"vectordb_bench/cli/batch_cli.py" = ["PLC0415"]
+"vectordb_bench/backend/data_source.py" = ["PLC0415"]
+
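
Reviewer note: `PLC0415` is Ruff's `import-outside-top-level` rule, so the new `per-file-ignores` table permits imports inside function bodies in the listed paths. The commit does not say why those files need this; a plausible reason (an assumption, not stated in the source) is that optional dependencies are imported only when actually used. A minimal sketch of the pattern the rule would otherwise flag, using a hypothetical `load_backend` helper and standard-library modules as stand-ins:

```python
def load_backend(name: str):
    """Return the module backing `name`, importing it only on demand.

    Hypothetical helper for illustration; it is not part of vectordb_bench.
    """
    if name == "json":
        # Import inside the function body: this is what PLC0415 reports
        # unless the file is covered by a per-file ignore.
        import json

        return json
    if name == "csv":
        import csv

        return csv
    raise ValueError(f"unknown backend: {name}")


if __name__ == "__main__":
    backend = load_backend("json")
    print(backend.dumps({"a": 1}))
```

The trade-off is that an import error surfaces when the function is first called instead of at module import time.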

vectordb_bench/__main__.py

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ def run_streamlit():
     cmd = [
         "streamlit",
         "run",
-        f"{pathlib.Path(__file__).parent}/frontend/vdb_benchmark.py",
+        f"{pathlib.Path(__file__).parent}/frontend/vdbbench.py",
         "--logger.level",
         "info",
         "--theme.base",

vectordb_bench/backend/clients/api.py

Lines changed: 1 addition & 1 deletion
@@ -132,7 +132,7 @@ class VectorDB(ABC):
     """
 
     "The filtering types supported by the VectorDB Client, default only non-filter"
-    supported_filter_types: list[FilterOp] = [FilterOp.NonFilter, FilterOp.NumGE]
+    supported_filter_types: list[FilterOp] = [FilterOp.NonFilter]
 
     @classmethod
     def filter_supported(cls, filters: Filter) -> bool:
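
Reviewer note: with this change the base-class default matches its docstring ("default only non-filter"), so a client that supports `NumGE` filtering presumably has to declare it in its own subclass. The sketch below shows that class-attribute override pattern under simplifying assumptions: `FilterOp` is reduced to two members, `supports` is a hypothetical stand-in for the real `filter_supported` (whose body is not shown in this diff), and `ClientWithIntFilter` is an invented subclass name.

```python
from abc import ABC
from enum import Enum


class FilterOp(Enum):
    # Only the two members visible in the diff; the real enum may have more.
    NonFilter = "NonFilter"
    NumGE = "NumGE"


class VectorDB(ABC):
    # New default after this commit: only unfiltered search is assumed.
    supported_filter_types: list[FilterOp] = [FilterOp.NonFilter]

    @classmethod
    def supports(cls, op: FilterOp) -> bool:
        # Hypothetical simplification of filter_supported().
        return op in cls.supported_filter_types


class ClientWithIntFilter(VectorDB):
    # A client opts in to int filtering by overriding the class attribute.
    supported_filter_types = [FilterOp.NonFilter, FilterOp.NumGE]


if __name__ == "__main__":
    print(VectorDB.supports(FilterOp.NumGE))
    print(ClientWithIntFilter.supports(FilterOp.NumGE))
```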

vectordb_bench/backend/clients/aws_opensearch/config.py

Lines changed: 14 additions & 0 deletions
@@ -75,6 +75,20 @@ def __eq__(self, obj: any):
             and self.quantization_type == obj.quantization_type
         )
 
+    def __hash__(self) -> int:
+        return hash(
+            (
+                self.engine,
+                self.M,
+                self.efConstruction,
+                self.number_of_shards,
+                self.number_of_replicas,
+                self.number_of_segments,
+                self.use_routing,
+                self.quantization_type,
+            )
+        )
+
     def parse_metric(self) -> str:
         log.info(f"User specified metric_type: {self.metric_type_name}")
         self.metric_type = MetricType[self.metric_type_name.upper()]

vectordb_bench/backend/clients/elastic_cloud/config.py

Lines changed: 12 additions & 0 deletions
@@ -48,6 +48,18 @@ def __eq__(self, obj: any):
             and self.M == obj.M
         )
 
+    def __hash__(self) -> int:
+        return hash(
+            (
+                self.index,
+                self.number_of_shards,
+                self.number_of_replicas,
+                self.use_routing,
+                self.efConstruction,
+                self.M,
+            )
+        )
+
     def parse_metric(self) -> str:
         if self.metric_type == MetricType.L2:
             return "l2_norm"

vectordb_bench/backend/clients/milvus/milvus.py

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ def __init__(
         dim: int,
         db_config: dict,
         db_case_config: MilvusIndexConfig,
-        collection_name: str = "VectorDBBenchCollection",
+        collection_name: str = "VDBBench",
         drop_old: bool = False,
         name: str = "Milvus",
         with_scalar_labels: bool = False,

vectordb_bench/backend/clients/qdrant_cloud/config.py

Lines changed: 14 additions & 0 deletions
@@ -63,6 +63,20 @@ def __eq__(self, obj: any):
             and self.default_segment_number == obj.default_segment_number
         )
 
+    def __hash__(self) -> int:
+        return hash(
+            (
+                self.m,
+                self.payload_m,
+                self.create_payload_int_index,
+                self.create_payload_keyword_index,
+                self.is_tenant,
+                self.use_scalar_quant,
+                self.sq_quantile,
+                self.default_segment_number,
+            )
+        )
+
     def parse_metric(self) -> str:
         if self.metric_type == MetricType.L2:
             return "Euclid"

vectordb_bench/backend/clients/zilliz_cloud/zilliz_cloud.py

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ def __init__(
         dim: int,
         db_config: dict,
         db_case_config: DBCaseConfig,
-        collection_name: str = "ZillizCloudVectorDBBench",
+        collection_name: str = "ZillizCloudVDBBench",
         drop_old: bool = False,
         name: str = "ZillizCloud",
         **kwargs,

vectordb_bench/backend/dataset.py

Lines changed: 3 additions & 0 deletions
@@ -242,6 +242,9 @@ def __eq__(self, obj: any):
             return self.data.name == obj.data.name and self.data.label == obj.data.label
         return False
 
+    def __hash__(self) -> int:
+        return hash((self.data.name, self.data.label))
+
     def set_reader(self, reader: DatasetReader):
         self.reader = reader
 
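
Reviewer note: several hunks in this commit add a `__hash__` next to an existing `__eq__`. In Python, a class that defines `__eq__` without `__hash__` has its `__hash__` implicitly set to `None`, so instances cannot be used in sets or as dict keys. Hashing the same fields that `__eq__` compares keeps equal objects hashing equally. The sketch below illustrates this with plain classes; the class and field names are hypothetical, and the real config classes in the repository may be built differently.

```python
class ConfigEqOnly:
    """Defines __eq__ only, so Python sets __hash__ to None."""

    def __init__(self, m: int, ef_construction: int):
        self.m = m
        self.ef_construction = ef_construction

    def __eq__(self, other: object) -> bool:
        if isinstance(other, ConfigEqOnly):
            return (self.m, self.ef_construction) == (other.m, other.ef_construction)
        return False


class ConfigEqAndHash:
    """Defines __eq__ and a __hash__ over the same fields."""

    def __init__(self, m: int, ef_construction: int):
        self.m = m
        self.ef_construction = ef_construction

    def __eq__(self, other: object) -> bool:
        if isinstance(other, ConfigEqAndHash):
            return (self.m, self.ef_construction) == (other.m, other.ef_construction)
        return False

    def __hash__(self) -> int:
        # Hash exactly the tuple of fields that __eq__ compares.
        return hash((self.m, self.ef_construction))


def is_hashable(obj: object) -> bool:
    """Return True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    print(is_hashable(ConfigEqOnly(16, 200)))
    print(is_hashable(ConfigEqAndHash(16, 200)))
    print(len({ConfigEqAndHash(16, 200), ConfigEqAndHash(16, 200)}))
```

Because the hash depends on mutable attributes, changing a field after the object has been placed in a set or dict would make it unfindable; this only works safely if such objects are treated as immutable once hashed.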
