[Bug]: Milvus insertion time getting slow after 3rd loop #38768

Closed
1 task done
kksasa opened this issue Dec 26, 2024 · 2 comments
Labels
kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

kksasa commented Dec 26, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.5
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

I ran into a strange problem: from the third loop iteration onward, each Milvus insertion takes about 10 seconds, seemingly regardless of the data volume. However, if I terminate the loop and run it again, the first insertion takes only about 3 seconds.

My code is roughly as follows:

python
embedding = Embeddings('BAAI/bge-m3', use_gpu=True)
vdb = Vector(embedding, db_name) # Connect to Milvus
for file in files:
    chunks = convert_raw_2_vdb(vdb, file) # Start chunking into Milvus here

The following is the printed log:

plaintext
file_path = dataset/1.md
total_count = 18
time costs for embedding = 0.4152650833129883
time costs for insert milvus = 3.0823850631713867

file_path = dataset/2.md
total_count = 5
time costs for embedding = 0.10074329376220703
time costs for insert milvus = 7.72633171081543

file_path = dataset/3.md
total_count = 12
time costs for embedding = 0.15440130233764648
time costs for insert milvus = 10.726555109024048

file_path = dataset/4.md
total_count = 12
time costs for embedding = 0.15440130233764648
time costs for insert milvus = 10.776616109025023

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@kksasa kksasa added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 26, 2024

kksasa commented Dec 26, 2024

## main code
embedding = Embeddings('BAAI/bge-m3', use_gpu=True)
vdb = Vector(embedding, db_name)  # Connect to Milvus
for file in files:
    chunks = convert_raw_2_vdb(vdb, file)  # Start chunking into Milvus here

## insert api
def add_texts_new(self, documents: list[Document], d_embeddings: list[list[float]], s_embeddings: list[list[float]], **kwargs):
    ins = [
        [page.metadata for page in documents],
        [page.page_content for page in documents],
        d_embeddings,
    ]
    self.collection.insert(ins)
    self.collection.flush()
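
The "insert milvus" timing in the log presumably covers both calls in `add_texts_new`, so it cannot show which one is slow. Below is a rough sketch of timing `insert()` and `flush()` separately. The names `timed`, `timed_insert_and_flush`, and `SlowFlushStub` are hypothetical, and the stub only stands in for a pymilvus `Collection` so the snippet runs without a Milvus server.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, results: dict):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def timed_insert_and_flush(collection, data) -> dict:
    """Time insert() and flush() separately on a Collection-like object."""
    timings: dict = {}
    with timed("insert", timings):
        collection.insert(data)
    with timed("flush", timings):
        collection.flush()
    return timings


class SlowFlushStub:
    """Hypothetical stand-in for a collection whose flush() is the slow call."""

    def insert(self, data):
        pass  # pretend the insert returns immediately

    def flush(self):
        time.sleep(0.05)  # simulated delay, not a real Milvus measurement


timings = timed_insert_and_flush(SlowFlushStub(), [["text"], [[0.1, 0.2]]])
print(f"insert = {timings['insert']:.4f}s, flush = {timings['flush']:.4f}s")
```

With a real collection, passing `self.collection` and `ins` in place of the stub and sample data would give one timing per call.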


kksasa commented Dec 26, 2024

Found the reason: the `self.collection.flush()` call after every insert is what causes the slowdown.
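
A minimal sketch of one way to act on this, assuming the data does not have to be flushed after every file: insert inside the loop and call `flush()` once at the end. `insert_batches` and `RecordingCollection` are hypothetical names. The stub only records calls in place of a pymilvus `Collection`, so the snippet runs without a Milvus server.

```python
from typing import Any, Iterable, List


def insert_batches(collection: Any, batches: Iterable[list]) -> int:
    """Insert every batch, then flush once instead of once per batch.

    `collection` is assumed to expose insert(data) and flush(), as a
    pymilvus Collection does.
    """
    inserted = 0
    for batch in batches:
        collection.insert(batch)  # column-based data, as built in add_texts_new
        inserted += 1
    if inserted:
        collection.flush()  # single flush after the whole loop
    return inserted


class RecordingCollection:
    """Hypothetical stub that records the order of calls it receives."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def insert(self, data: list) -> None:
        self.calls.append("insert")

    def flush(self) -> None:
        self.calls.append("flush")


stub = RecordingCollection()
count = insert_batches(stub, [[["a"], [[0.1]]], [["b"], [[0.2]]], [["c"], [[0.3]]]])
print(count, stub.calls)
```

Whether a final `flush()` is needed at all depends on the application. The sketch keeps one so the behavior stays close to the original code.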

@kksasa kksasa closed this as completed Dec 26, 2024