Indicate for Queries that they are not valid without user added tokens #217
I used the new message-search for some time now and it works as expected. However, the extracted tokens are often too meaningless (e.g. only adverbs/adjectives like "gerne"). Often, no tokens at all are extracted, rendering the query useless. At the same time, the more-like-this-based query gave quite decent results and I have to say I miss it. I came to the conclusion that both query providers have different constraints that make them useful. WDYT?
I agree with you. The challenge is that MLT returned a Conversation as its result, while the ConversationSearch now returns individual Messages. For the MLT provider to fit together with the token-based ConversationSearch, it has to be rebuilt so that it also returns Messages as results. For that, we need to test whether the MLT approach works with the potentially short texts of Messages. If not, we will have to come up with something else there as well.
@westei what about analyzing similarities of a "window" of messages? I don't know how to implement this inside Solr without permuting too much and blowing up the index.
For sure this would increase the index size, but I do not think this is a problem. The problem is more that you will get overlapping segments of conversations as results. For the Conversation Search we use Solr Grouping to get the results grouped by conversation, and the response merges overlapping sections (based on the context configuration). For Solr MLT one cannot use grouping, so implementing this would be much harder. In addition, without grouping one cannot tell Solr to include at most 3 results per conversation (otherwise, if I request e.g. 10 results, I could get all 10 from the same conversation). Maybe one can combine MLT with FieldCollapsing to get the desired behaviour.
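The grouping-vs-collapsing trade-off above can be sketched as Solr request parameters. This is only an illustration: the field names `conversation_id` and `message_text` are assumptions, not Smarti's actual schema.

```python
# Sketch of the Solr parameters discussed above (field names such as
# "conversation_id" and "message_text" are assumptions, not Smarti's schema).

def conversation_search_params(query, max_per_conversation=3):
    """Regular conversation search: group hits by conversation and limit
    the number of messages returned per group."""
    return {
        "q": query,
        "group": "true",
        "group.field": "conversation_id",
        "group.limit": str(max_per_conversation),
    }

def mlt_collapse_params(doc_id):
    """MLT cannot use grouping, but the collapse query parser (applied as a
    filter query) can reduce each conversation to a single representative
    message -- the FieldCollapsing workaround mentioned above."""
    return {
        "q": "id:%s" % doc_id,
        "mlt": "true",
        "mlt.fl": "message_text",
        "fq": "{!collapse field=conversation_id}",
    }
```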
jup, understood. I imagine that a window of the messages issued by the author could be a good base for the more-like-this analysis |
@westei Could you please provide a suggestion for how to solve this problem, so we can discuss how to proceed and implement a better user experience?
Challenge: Combining the Suggestion
@janrudolph Do you agree?
After some experiments and testing I come to the conclusion that the best technical solution is to:
This has the huge advantage that the MLT query is only used to retrieve interesting terms, while a normal Solr query is used for retrieving the results. All the special functionality for correctly retrieving related conversations, incl. contextual messages, already works for normal Solr queries, so there is no need to duplicate this functionality for Solr MLT queries.

In UI terms: this would make the similarity-based search a feature of the related conversation search (the exact thing requested by @mrsimpson). It also allows combining queries for tokens with similarity-based constraints. That could be useful if one wants to search for a custom token that is relatively common in the dataset, as the context would rank results containing the custom token in a similar context to the top of the result list.

For that, "Similarity" would need to be a switch that can be activated/deactivated by the user (similar to the filters discussed in #228). I would suggest enabling "Similarity" if no user-added token is present and deactivating it as soon as the user adds a custom token or pins an extracted token.
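The proposed toggle behaviour can be stated as a minimal sketch (function and parameter names are purely illustrative, not Smarti's API):

```python
def similarity_enabled(user_tokens, user_override=None):
    """Similarity is on by default only while the user has not added or
    pinned any token; an explicit user toggle wins over the default."""
    if user_override is not None:  # user flipped the switch manually
        return user_override
    return len(user_tokens) == 0   # default: on only without user tokens
```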
@westei I didn’t fully get the Solr implementation details, but the gist. And it sounds as if this made best use of the technology involved 👍 |
…onversation Search Query Builder

* Indexing now stores an MLT Context for every Message. This includes the text of the surrounding messages based on content length, time difference, and min/max message counts
* Implementation of the similarity feature
    * The Related Conversation QueryBuilder performs a Solr MLT query on the conversation to get interesting terms
    * Those are normalised so that the maximum boost is `1.0`
    * Query params for the similarity search are built and added to the Query (field: `similarityQuery`)
    * The widget can send this as `q.alt`: in this case the query is used if no query is present
    * The widget can also combine this with real tokens. In this case the query params need to be appended to the other query parameters.
        * In this case the real query params should use an additional boost factor (in the range of `5 - 10`)

NOTE: this increases the conversation index version from `5` to `6`, so Smarti should trigger a full reindex on startup (for embedded setups). If this does not work for some reason, the re-indexing needs to be triggered manually by deleting the current conversation index. For remote Solr servers the schema needs to be updated and the index data needs to be deleted to force a re-index on startup.
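The boost handling described in the commit message (normalising MLT term scores so the maximum is `1.0`, then boosting real tokens by a factor in the 5-10 range) could be sketched like this; the function names are illustrative, not Smarti's API:

```python
def normalise_boosts(interesting_terms):
    """Scale the MLT 'interesting terms' scores so the highest boost is 1.0."""
    if not interesting_terms:
        return {}
    top = max(interesting_terms.values())
    return {term: score / top for term, score in interesting_terms.items()}

def to_boosted_query(terms, extra_boost=1.0):
    """Render terms as a Solr query string with per-term boosts; real (user)
    tokens would pass an extra_boost in the 5-10 range."""
    return " ".join("%s^%g" % (term, score * extra_boost)
                    for term, score in terms.items())
```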
Server-side implementation is ready in the #217-similarity-feature-for-related-conversation-search branch. As the implementation changes the same files as #228, I used the branch of that issue as a starting point. So the pull request #234 should be merged with
@Peym4n the Related Conversation Query now has a new field. The widget should send those parameters as the value for that field. In addition, the widget should only consider custom and maybe pinned tokens for the conversation search ("Expertengespräche"). This will make similarity the default behaviour. As an alternative we could add an
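The widget behaviour described above could be assembled like this. A sketch under the assumption that the server's similarity terms arrive as a ready-made query string; the boost of `10` is just one value from the suggested 5-10 range:

```python
def widget_query_params(similarity_query, user_query=None):
    """No user query: send the similarity query as q.alt (Solr's DisMax uses
    q.alt only when no q is present). With a user query: append the
    similarity part and boost the user's terms so they dominate ranking."""
    if not user_query:
        return {"q.alt": similarity_query}
    # assumed boost factor 10, from the 5-10 range suggested above
    return {"q": "(%s)^10 %s" % (user_query, similarity_query)}
```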
…nto #217-similarity-feature-for-related-conversation-search
The widget part is also implemented. There is an escaping bug on the server side which @westei will fix. |
* WordDelimiter: the original token is now kept also at query time, so that searches like `c++` actually match the indexed token `c++`
* Added a PatternReplaceFilterFactory to remove quotes, `,` and `;` on both sides, and other trailing punctuation marks on terms
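In a Solr schema, the two analyzer changes above might look roughly like the following. This is an illustrative fragment, not the actual Smarti schema; the tokenizer choice, filter attributes, and the replace pattern are assumptions:

```xml
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- preserveOriginal keeps "c++" intact so it matches the indexed token -->
  <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"/>
  <!-- strip quotes on both sides and trailing punctuation such as , ; . ! -->
  <filter class="solr.PatternReplaceFilterFactory"
          pattern="^[&quot;']+|[&quot;',;.!]+$" replacement=""/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```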
…' of github.com:redlink-gmbh/smarti into #217-similarity-feature-for-related-conversation-search
…-related-conversation-search Similarity feature for related conversation search (#217)
With #200 all queries are added, so that the client is notified about queries that can be configured. Therefore we need a new way of telling the client that a query is currently not useful (e.g. no tokens are assigned). User-generated tokens can still make those queries active.