Actual behavior
After adding a UTF-8 document that contains Unicode symbols to the knowledge base, I've noticed that RAGFlow always generates text fragments that corrupt the Unicode symbols (rendering them as latin-1, apparently).
Expected behavior
Steps to reproduce
Upload a document as a .txt file saved as UTF-8 containing the following:
> “sample text,”
Additional information
No response
I don't think it's an issue with the LLM integration; here is the same problem displayed in the vector search results on the "Search" page rather than the "Chat" page:
It seems you are converting to Windows-1252 (a legacy encoding) at some point and then emitting it as if it were UTF-8, because I can correct it with iconv:
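For illustration, here is a minimal Python sketch of the round trip described above. It assumes the corruption comes from UTF-8 bytes being read as Windows-1252 and written back out as UTF-8. The function names are hypothetical; this is neither RAGFlow's code nor the iconv command referred to above. One wrinkle: Python's `cp1252` codec leaves a few bytes (including 0x9D, the last byte of `”`) undefined, so the sketch falls back to Latin-1 for those.

```python
def decode_cp1252_lenient(data: bytes) -> str:
    """Decode as Windows-1252, mapping undefined bytes to Latin-1 code points."""
    # Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in Python's cp1252
    # codec; errors="ignore" yields "" for them, so fall back to chr(b).
    return "".join(
        bytes([b]).decode("cp1252", errors="ignore") or chr(b) for b in data
    )


def encode_cp1252_lenient(text: str) -> bytes:
    """Inverse of decode_cp1252_lenient."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            out += ch.encode("latin-1")  # C1 control characters such as U+009D
    return bytes(out)


def corrupt(text: str) -> str:
    """Simulate the suspected bug: UTF-8 bytes misread as Windows-1252."""
    return decode_cp1252_lenient(text.encode("utf-8"))


def repair(garbled: str) -> str:
    """Undo the misreading: recover the original bytes, then decode as UTF-8."""
    return encode_cp1252_lenient(garbled).decode("utf-8")


original = "\u201csample text,\u201d"  # the sample from the steps to reproduce
garbled = corrupt(original)
print(repr(garbled))
print(repair(garbled) == original)
```

If the fragments stored by the application match the output of `corrupt`, that would support the Windows-1252 hypothesis; `repair` only works when every garbled character survived storage unchanged.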
Is there an existing issue for the same bug?
RAGFlow workspace code commit ID
N/A
RAGFlow image version
5fb9136
Other environment information