[Feature]: Clustering optimization #28410

Open
4 of 19 tasks
wayblink opened this issue Nov 14, 2023 · 2 comments
Assignees
Labels
kind/feature Issues related to feature request from users

Comments

@wayblink
Contributor

wayblink commented Nov 14, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

Umbrella issue for clustering key optimization in Milvus.

Efficient data storage and retrieval is central to database design. A clustering key determines the physical storage layout of a table according to how its data is distributed. Conventional databases usually describe that distribution with the minimum and maximum values of scalar fields. In a vector database, however, vectors are the primary entities, so Milvus aims to support both scalar clustering keys and vector clustering keys.
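
To make the two kinds of clustering key more concrete, here is a minimal Python sketch of what per-segment clustering information could look like. It assumes a scalar key is summarized by a min/max range and a vector key by a centroid (the mean vector); the issue does not specify the vector representation, so the centroid is an assumption. The names `ScalarClusterInfo`, `VectorClusterInfo`, `scalar_info`, and `vector_info` are hypothetical and are not Milvus APIs.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ScalarClusterInfo:
    """Hypothetical summary of a scalar clustering key within one segment."""
    min_value: float
    max_value: float


@dataclass
class VectorClusterInfo:
    """Hypothetical summary of a vector clustering key within one segment."""
    centroid: List[float]


def scalar_info(values: Sequence[float]) -> ScalarClusterInfo:
    # A scalar key is described by the range of values the segment holds.
    return ScalarClusterInfo(min(values), max(values))


def vector_info(vectors: Sequence[Sequence[float]]) -> VectorClusterInfo:
    # Assumption: a vector key is described by the mean of the segment's vectors.
    count = len(vectors)
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / count for i in range(dim)]
    return VectorClusterInfo(centroid)


print(scalar_info([3.0, 1.0, 2.0]))
print(vector_info([[0.0, 0.0], [2.0, 4.0]]))
```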

Key changes:
1. Support designating a scalar or vector field as the clustering key for a collection.
2. Support bulk inserting data with specific clustering information. Milvus organizes the data according to the provided clustering information.
3. Filter out irrelevant data during searches based on clustering information.
4. Support compacting collections that have a clustering key, which rearranges their storage.
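
The search-time filtering in the list above can be illustrated for the scalar case. The following is a rough sketch, not Milvus code: it assumes each segment records the min and max of its clustering key, and that a range predicate only needs to visit segments whose range overlaps the predicate. `Segment` and `prune_segments` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical segment metadata: id plus the clustering key range it holds."""
    segment_id: int
    key_min: int
    key_max: int


def prune_segments(segments: List[Segment], lo: int, hi: int) -> List[Segment]:
    """Keep only segments whose [key_min, key_max] overlaps the query range [lo, hi]."""
    return [s for s in segments if s.key_max >= lo and s.key_min <= hi]


segments = [Segment(1, 0, 99), Segment(2, 100, 199), Segment(3, 200, 299)]
hits = prune_segments(segments, 150, 210)
print([s.segment_id for s in hits])  # prints [2, 3]
```

The tighter and less overlapping the per-segment ranges are, the more segments such a check can skip, which is the motivation for organizing data by clustering key in the first place.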

Phase 1: Support bulk insert and query data with clustering info

Tasks:

Phase 2: Clustering based compaction

Dependency:

  • L0 delete/compaction
  • Milvus-storage V2 integration
  • Compaction V2 refactoring (weak dependency)

Tasks:

  • Clustering compaction strategy
  • Clustering compaction schedule
  • Clustering compaction execution
  • E2E test
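
As a rough illustration of what the execution step of a clustering compaction might do for a scalar key, the sketch below merges the rows of several segments, sorts them by the clustering key, and repacks them into segments of bounded size so that key ranges no longer overlap. This is an assumption about the general idea only; it does not model the strategy or scheduling tasks, deletes, or storage formats. `clustering_compact` and `key_ranges` are hypothetical names, not Milvus functions.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def clustering_compact(segments: List[List[Row]], key: str, max_rows: int) -> List[List[Row]]:
    """Merge all rows, sort by the clustering key, and repack into new segments."""
    rows = sorted((row for seg in segments for row in seg), key=lambda r: r[key])
    return [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]


def key_ranges(segments: List[List[Row]], key: str) -> List[Tuple[int, int]]:
    """Return the (min, max) of the clustering key for each segment."""
    return [(min(r[key] for r in seg), max(r[key] for r in seg)) for seg in segments]


before = [
    [{"id": 1, "ts": 50}, {"id": 2, "ts": 5}],
    [{"id": 3, "ts": 20}, {"id": 4, "ts": 40}],
]
after = clustering_compact(before, key="ts", max_rows=2)

print(key_ranges(before, "ts"))  # prints [(5, 50), (20, 40)]
print(key_ranges(after, "ts"))   # prints [(5, 20), (40, 50)]
```

Before compaction the two key ranges overlap, so a range filter on `ts` would have to read both segments; afterwards the ranges are disjoint and most range filters touch only one.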

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

wayblink added the kind/feature label on Nov 14, 2023

@xiaofan-luan
Collaborator

/assign @wayblink

@xiaocai2333
Contributor

/assign


3 participants