What pipeline best fits the below use case for similarity scoring of two texts? #2328

asharm0662 · 2022-03-18T05:56:30Z

Hi Team,

I want to use Haystack to compare the similarity between two sentences, paragraphs, or documents, rather than run the semantic searches against FAQ documents that I have been seeing in the examples.

For example, is there a pipeline that lets me feed in two sentences and get back only a similarity score?

I go to store.
I am going to store.

Similarity: .9082

Thank you in advance.

TuanaCelik · 2022-03-22T22:01:24Z

Hi @asharm0662, and first of all, sorry that it’s taken this long to answer this. To better understand what might be the best solution for you, could you give me some more detail about your use case? Do you want a pipeline that finds text similar to a given sentence in a set of documents? The reason I’m asking is that I’m trying to understand whether you need a document store at all.
Or do you just want to provide 2 sentences/paragraphs/docs and get the similarity score between them?

The SentenceTransformersRanker might be what you're looking for. For example, it can be used on top of a retriever to rank the most relevant documents. But again, this depends on what you're trying to achieve.

If you can fill me in a bit further I'll be happy to help out!

1 reply

asharm0662 Mar 26, 2022
Author

Hey @TuanaCelik, thank you for the reply! My use case is plagiarism detection: how can I tell whether a student has plagiarized, based on sentence similarity? For example, if an article says "the ball is green" and a student copies the text as "green is the ball", how similar are these sentences according to some sort of score? Would SentenceTransformersRanker still fit this use case?

TuanaCelik · 2022-04-19T09:28:29Z

Hi @asharm0662, first of all, so sorry for the late reply; it seems I'm not notified about these. From what I'm seeing, if you only want to compare two documents, the SentenceTransformersRanker in a pipeline would indeed make sense, but there's no straightforward way to provide two documents as input to the pipeline. Instead, you can have a DocumentStore with some of the documents already in it and provide the second one as input; you then get to see which documents in the DocumentStore it is most similar to. If, however, you just want to compare 2 bits of text and see whether they are similar, I would say a pipeline makes less sense, and I'd pick a sentence similarity model from HuggingFace and use just that as is.
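
A minimal sketch of that last option, for illustration only. It assumes the `sentence-transformers` package and the `sentence-transformers/all-MiniLM-L6-v2` model, neither of which is named in this thread; any sentence similarity model from HuggingFace could be swapped in. `cosine_similarity` and `text_similarity` are hypothetical helper names, not part of Haystack.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as "not similar".
        return 0.0
    return dot / (norm_a * norm_b)


def text_similarity(
    text_a: str,
    text_b: str,
    model_name: str = "sentence-transformers/all-MiniLM-L6-v2",  # assumed model
) -> float:
    """Embed both texts with a sentence-transformers model and compare them."""
    # Imported lazily so the helper above works without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)  # downloads the model on first use
    emb_a, emb_b = model.encode([text_a, text_b])  # one embedding per text
    return cosine_similarity(emb_a.tolist(), emb_b.tolist())


if __name__ == "__main__":
    # Prints a single similarity score for the two sentences from the question.
    print(text_similarity("I go to store.", "I am going to store."))
```

Note that a score like this measures how close the two texts are in meaning under the chosen model, so the value you get, and the threshold at which you would flag a pair as too similar, depends on the model and should be checked on your own examples.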

1 reply

asharm0662 Apr 19, 2022
Author

Excellent, thank you @TuanaCelik!
