RAG fail with TypeError: argument 'text': 'HumanMessage' object cannot be converted to 'PyString' #186
-
[This question is also asked in Discord] Tagging @fjsj for visibility Hello dear all, File "/home/shako/REPOS/EpicLaunchX/core/.venv/lib/python3.11/site-packages/langchain_openai/embeddings/base.py", line 441, in _tokenize
token = encoding.encode_ordinary(text)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/shako/REPOS/EpicLaunchX/core/.venv/lib/python3.11/site-packages/tiktoken/core.py", line 70, in encode_ordinary
return self._core_bpe.encode_ordinary(text)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: argument 'text': 'HumanMessage' object cannot be converted to 'PyString' Calling assistant as: # Calling assistant
pattern_ai_assistant = CodePatternAIAssistant(user=request.user)
output = pattern_ai_assistant.run(
f"I need a match pattern for given task with uuid {task.uuid} and code implementation section as {task.code_implementation}."
f"Please generate the match pattern for the given task code implementation",
thread_id=thread.id,
) Assistant itself: class CodePatternAIAssistant(AIAssistant):
id = "code_patterns_assistant"
name = "Code Patterns Assistant"
instructions = (
"You are a project assistant, who creates the code pattern to match the code implementation of the given task and saves it to the database. "
"Based on the code implementation please generate patterns. "
"Use the following pieces of retrieved context from Pattern Syntax doc to generate the code pattern. "
)
model = OPENAI_MODEL
has_rag = True
_user: User
def persist(self):
path = Path(__file__).parent
file_path = path / "data/pattern-syntax.md"
persist_dir = path / "data"
loader = TextLoader(str(file_path))
documents = loader.load()
text_splitter = MarkdownTextSplitter()
split_documents = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
vector_store = Chroma.from_documents(
documents=split_documents,
embedding=embeddings,
persist_directory=str(persist_dir),
collection_name="patterns",
)
vector_store.persist()
def get_retriever(self) -> BaseRetriever:
# NOTE: on a production application, you should persist or cache the retriever,
# updating it only when documents change.
path = Path(__file__).parent
persist_dir = path / "data"
self.persist()
embeddings = OpenAIEmbeddings()
vector_store = Chroma(
collection_name="patterns", embedding_function=embeddings, persist_directory=str(persist_dir)
)
return vector_store.as_retriever(
search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5}
) As I understand it tries to tokenize the user message and fails.
|
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 5 replies
-
I will test this manually and reach back to you soon! |
Beta Was this translation helpful? Give feedback.
-
@ShahriyarR We've just released version |
Beta Was this translation helpful? Give feedback.
@ShahriyarR We've just released version
0.1.1
, which includes the fix from PR #187. Could you try upgrading to the latest version and let us know if the issue is resolved? Thanks!