[Feature]: Clustering optimization #28410

wayblink · 2023-11-14T03:31:47Z

Is there an existing issue for this?

I have searched the existing issues

Is your feature request related to a problem? Please describe.

Umbrella issue for clustering key optimization in Milvus.

Efficient data storage and retrieval is central to database management. A clustering key is an important element of database design: it determines how data is physically laid out, based on the distribution of values within a table. In conventional database systems, that distribution is usually described by the minimum and maximum values of scalar fields. In a vector database, however, vectors are the primary entities. Milvus therefore aims to support both scalar clustering keys and vector clustering keys.

Key changes:
1. Support designating a scalar or vector field as the clustering key for a collection.
2. Support bulk insert of data with specific clustering information. Milvus will organize the data based on the provided clustering information.
3. Filter out irrelevant data during searches based on clustering information.
4. Add a Milvus feature to compact collections that have a clustering key, which rearranges the storage.
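
To make item 3 concrete for a scalar clustering key, here is a minimal sketch of range-based pruning. It is not Milvus code: `SegmentStats`, `prune_segments`, and the idea that each segment records the minimum and maximum clustering key value are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    """Hypothetical per-segment metadata: value range of the clustering key."""
    segment_id: int
    min_key: int
    max_key: int


def prune_segments(segments, lo, hi):
    """Keep only segments whose [min_key, max_key] range overlaps [lo, hi]."""
    return [s for s in segments if s.max_key >= lo and s.min_key <= hi]


# Three segments whose clustering key ranges do not overlap.
segments = [
    SegmentStats(1, 0, 99),
    SegmentStats(2, 100, 199),
    SegmentStats(3, 200, 299),
]

# A filter such as 120 <= key <= 180 only needs to touch segment 2.
candidates = prune_segments(segments, 120, 180)
print([s.segment_id for s in candidates])  # [2]
```

The pruning only pays off when segment ranges overlap little, which is why the data has to be organized by the clustering key at bulk insert or compaction time.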

Phase 1: Support bulk insert and query data with clustering info

Tasks:

Phase 2: Clustering based compaction

Dependency:

L0 delete/compaction
Milvus-storage V2 integration
Compaction V2 refactoring (weak dependency)

Tasks:

Clustering compaction strategy
Clustering compaction schedule
Clustering compaction execution
E2E test
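
As a rough illustration of what clustering based compaction is meant to achieve for a scalar key, the sketch below merges existing segments, sorts the rows by the clustering key, and splits them into new segments with narrow key ranges. This is an assumption-laden toy, not the strategy, scheduling, or execution logic tracked by the tasks above; `clustering_compact`, the `key` field, and `max_rows` are hypothetical names.

```python
def clustering_compact(segments, max_rows):
    """Merge segments, sort rows by clustering key, and re-split into new segments."""
    rows = sorted((row for seg in segments for row in seg), key=lambda r: r["key"])
    compacted = []
    for start in range(0, len(rows), max_rows):
        chunk = rows[start:start + max_rows]
        compacted.append({
            "min_key": chunk[0]["key"],   # smallest key in the new segment
            "max_key": chunk[-1]["key"],  # largest key in the new segment
            "rows": chunk,
        })
    return compacted


# Two input segments whose key ranges overlap heavily.
old_segments = [
    [{"key": 5}, {"key": 42}, {"key": 17}],
    [{"key": 8}, {"key": 30}, {"key": 1}],
]

new_segments = clustering_compact(old_segments, max_rows=3)
ranges = [(s["min_key"], s["max_key"]) for s in new_segments]
print(ranges)  # [(1, 8), (17, 42)]
```

After compaction the two output segments cover disjoint key ranges, so a range filter can skip whole segments instead of scanning all of them.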

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

xiaofan-luan · 2023-11-14T08:21:46Z

/assign @wayblink

xiaocai2333 · 2024-12-04T08:29:25Z

/assign

wayblink added the kind/feature label (Issues related to feature request from users) Nov 14, 2023

wayblink assigned xiaofan-luan Nov 14, 2023

sre-ci-robot assigned wayblink Nov 14, 2023

chasingegg mentioned this issue Nov 22, 2023

[Enhancement]: Add efficient distance computations in Go #28656

Closed

1 task

This was referenced Dec 18, 2023

Define clustering info milvus-io/milvus-proto#227

Merged

feat: Clustering optimization part 1 #28769

Closed

feat: Support bulkinsert clustering info and search optimizing #29444

Closed

This was referenced Dec 29, 2023

Add is_clustering_key in fieldschema milvus-io/milvus-proto#235

Merged

feat: add clustering key in create/describe collection #29506

Merged

enhance: Add L2 segment level #29595

Merged

sre-ci-robot pushed a commit that referenced this issue Jan 7, 2024

feat: add clustering key in create/describe collection (#29506)

635a7f7

#28410 /kind feature Signed-off-by: wayblink <[email protected]>

sre-ci-robot pushed a commit that referenced this issue Feb 18, 2024

enhance: Add L2 segment level (#29595)

2bc212c

#28410 Signed-off-by: wayblink <[email protected]>

chasingegg mentioned this issue Mar 13, 2024

[Bug]: Code error #31243

Closed

1 task

sre-ci-robot assigned xiaocai2333 Dec 4, 2024

This was referenced Dec 4, 2024

enhance: Determine the number of buffers based on the resource limits of the DataNode #38209

Merged

enhance: [2.4]Determine the number of buffers based on the resource limits of the DataNode #38210

Merged

sre-ci-robot pushed a commit that referenced this issue Dec 8, 2024

enhance: [2.4]Determine the number of buffers based on the resource limits of the DataNode (#38210)

ddc40a7

issue: #28410 master pr: #38209 --------- Signed-off-by: Cai Zhang <[email protected]>

sre-ci-robot pushed a commit that referenced this issue Dec 8, 2024

enhance: Determine the number of buffers based on the resource limits of the DataNode (#38209)

41b19c6

issue: #28410 --------- Signed-off-by: Cai Zhang <[email protected]>

This was referenced Dec 12, 2024

fix: Fix sorting buffer in clustering compaction #38417

Merged

fix: [2.4]Fix sorting buffer in clustering compaction #38418

Merged

sre-ci-robot pushed a commit that referenced this issue Dec 13, 2024

fix: [2.4]Fix sorting buffer in clustering compaction (#38418)

85ade98

issue: #28410 master pr: #38417 Signed-off-by: Cai Zhang <[email protected]>

sre-ci-robot pushed a commit that referenced this issue Dec 13, 2024

fix: Fix sorting buffer in clustering compaction (#38417)

6ffc57c

issue: #28410 Signed-off-by: Cai Zhang <[email protected]>
