When using nomic-embed-text, it is required to give a prefix for the model to produce correct embeddings. The prefix differs based on the purpose of the embedding.
For example, chunks can be prefixed with `search_document: <chunk>`, and the query for retrieval from the vector database needs to be prefixed with `search_query: <query>`.
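A minimal sketch of how the prefixes are applied, based on the usage shown on the Hugging Face model card linked below (assumes `sentence-transformers` is installed; the example texts are illustrative):

```python
from sentence_transformers import SentenceTransformer

# nomic-embed-text uses a custom architecture, so trust_remote_code is required
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

# Chunks stored in the vector database get the search_document prefix
doc_embeddings = model.encode([
    "search_document: Files are split into chunks before embedding.",
])

# The user's question gets the search_query prefix at retrieval time
query_embedding = model.encode("search_query: How are files chunked?")
```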
It would also be useful to separate the query sent to the embedding model (and on to the vector database) from the prompt we want to use for the LLM.
E.g. the embedding query template could be `search_query: {{question}}`, while the prompt could be a template that wraps the result of the embedding lookup:
```
You're a helpful assistant that uses this context, and only this context with no previous knowledge, to answer the question mentioned after the context.
<context>
{{query_result}}
</context>
<question>
{{question}}
</question>
```
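A sketch of how the two templates would be used together; plain Python `str.format` stands in here for whatever templating syntax the app uses, and the retrieved text is a placeholder:

```python
EMBED_TEMPLATE = "search_query: {question}"
PROMPT_TEMPLATE = (
    "You're a helpful assistant...\n"  # the LLM prompt template shown above
    "<context>\n{query_result}\n</context>\n"
    "<question>\n{question}\n</question>"
)

question = "How are files chunked?"

# Sent to the embedding model, then to the vector database -- prefixed
embedding_input = EMBED_TEMPLATE.format(question=question)

# Sent to the LLM -- unprefixed question plus the retrieved context
retrieved = "...chunks returned by the vector database..."
llm_prompt = PROMPT_TEMPLATE.format(query_result=retrieved, question=question)
```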
As of now we can prefix manually by adding the correct prefix to each chunk and to the prompt (assuming the prompt isn't already prefixed with something else), but it would be useful to have an input field that nests the query inside the prefix automatically.
Is this behavior unique to nomic-embed-text? I have not seen this on other embedding models before.
It is certainly something we can add to both querying and chunking/splitting, but I worry that exposing these details will confuse 99% of people into thinking they need to fill them out, resulting in worse embeddings.
No, it's not unique to nomic-embed-text. There are various embedding models that use a prefix to steer embedding creation in a certain direction.
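For example, the intfloat/e5 family documents `query: ` and `passage: ` prefixes for the same purpose. A hedged sketch of how a per-model prefix map could back the proposed input field (the mapping itself is the idea; the exact model IDs and defaults are illustrative):

```python
# Task prefixes documented by each model family; an input field in the UI
# could populate this per configured embedder.
PREFIXES = {
    "nomic-ai/nomic-embed-text-v1": {"query": "search_query: ", "document": "search_document: "},
    "intfloat/e5-base-v2":          {"query": "query: ",        "document": "passage: "},
}

def with_prefix(model_id: str, task: str, text: str) -> str:
    # Fall back to no prefix for models that don't use one
    return PREFIXES.get(model_id, {}).get(task, "") + text

print(with_prefix("nomic-ai/nomic-embed-text-v1", "query", "How are files chunked?"))
# -> search_query: How are files chunked?
```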
What would you like to see?
See also: https://huggingface.co/nomic-ai/nomic-embed-text-v1#usage