This repository has been archived by the owner on Dec 9, 2024. It is now read-only.
Description of the Issue:
The `build_payloads` function is intended to generate a unique payload for each document chunk. Currently, every payload contains the same text (the last chunk in `doc.chunks`) despite having a unique ID. This happens because `doc.metadata` is referenced directly and updated on each iteration, so all payloads share the same modified metadata dictionary.
See the example in the attached screenshot.
Steps to Reproduce:
1. Call the `build_payloads` function with a `Document` object containing multiple chunks.
2. Observe that the payloads list contains different IDs but identical text (matching the last chunk).
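The steps above can be reproduced with a minimal sketch. The `Document` class, the `build_payloads_buggy` name, and the use of `uuid4` for IDs are all assumptions made for illustration; they are not the repository's actual code, only the aliasing pattern the issue describes.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Document:
    # Hypothetical stand-in for the project's Document model.
    chunks: List[str]
    metadata: Dict[str, Any] = field(default_factory=dict)


def build_payloads_buggy(doc: Document) -> Tuple[List[str], List[Dict[str, Any]]]:
    ids, payloads = [], []
    for chunk in doc.chunks:
        payload = doc.metadata      # binds a second name to the SAME dict, no copy is made
        payload["text"] = chunk     # overwrites "text" on the shared dict
        ids.append(str(uuid.uuid4()))  # assumed ID scheme; IDs are unique per chunk
        payloads.append(payload)    # appends another reference to the same dict
    return ids, payloads


doc = Document(chunks=["first", "second", "third"], metadata={"source": "alpaca"})
ids, payloads = build_payloads_buggy(doc)
print(len(set(ids)))                   # 3
print([p["text"] for p in payloads])   # ['third', 'third', 'third']
```

Every entry in `payloads` is the same object as `doc.metadata`, which is why only the last chunk's text survives.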
Expected Behavior: Each payload should contain the unique text for its corresponding chunk, along with the associated metadata.
Actual Behavior: All payloads contain the same text, resulting in incorrect data.
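Additional Context: This issue occurs because Python dictionaries are mutable and assignment does not copy them. `payload = doc.metadata` binds `payload` to the same dictionary object, so setting the chunk text on `payload` modifies `doc.metadata` in place, and every payload appended to the list refers to that one dictionary.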
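One possible fix, sketched under the same assumptions (a hypothetical `Document` class and `uuid4` IDs, not the repository's actual code), is to copy the metadata on each iteration so that every payload owns its own dictionary. `copy.deepcopy` is used here in case the metadata contains nested mutable values; a shallow `dict(doc.metadata)` would likely suffice for flat metadata.

```python
import copy
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Document:
    # Hypothetical stand-in for the project's Document model.
    chunks: List[str]
    metadata: Dict[str, Any] = field(default_factory=dict)


def build_payloads_fixed(doc: Document) -> Tuple[List[str], List[Dict[str, Any]]]:
    ids, payloads = [], []
    for chunk in doc.chunks:
        payload = copy.deepcopy(doc.metadata)  # independent copy per chunk
        payload["text"] = chunk                # only this payload is modified
        ids.append(str(uuid.uuid4()))          # assumed ID scheme
        payloads.append(payload)
    return ids, payloads


doc = Document(chunks=["first", "second", "third"], metadata={"source": "alpaca"})
ids, payloads = build_payloads_fixed(doc)
print([p["text"] for p in payloads])   # ['first', 'second', 'third']
print("text" in doc.metadata)          # False
```

With the copy in place, `doc.metadata` is left untouched and each payload carries the text of its own chunk.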
@iusztinpaul Hi Paul! I believe I've found a small bug that could be crucial during the final LLM prediction phase. Could you please take a look at my PR?
Example:
News Document from Alpaca
Payload Uploaded to Qdrant
Qdrant Query used to identify the issue.