This repository has been archived by the owner on Aug 30, 2022. It is now read-only.

Indicate for Queries that they are not valid without user added tokens #217

Closed
westei opened this issue Feb 26, 2018 · 13 comments

@westei
Member

westei commented Feb 26, 2018

With #200, all queries are added so that the client is notified about every query that can be configured.

So we need a new way of telling the client that a query is currently not useful (e.g. because no tokens are assigned).

User-generated tokens can still make those queries active.

@westei westei self-assigned this Feb 26, 2018
@ruKurz
Collaborator

ruKurz commented Mar 5, 2018

#213

@mrsimpson
Collaborator

I have used the new message-search for some time now and it works as expected. However, the extracted tokens are often too meaningless (e.g. only adverbs/adjectives like "gerne"). Often, no tokens at all are extracted, rendering the query useless. At the same time, the more-like-this-based query gave quite decent results, and I have to say I miss it.

I came to the conclusion that both query providers have different constraints that make them useful. In this case, the message-search is useful once an interesting term, a noun or a user-defined token has been provided. The MLT-search is useful as long as no user-defined token is available.
Applying those constraints and respecting them in the UI by hiding useless queries could be a reasonable approach: as long as there are no keywords, related conversations are shown. As soon as the user specifies a keyword to search for, the related conversations are hidden and the search is visible.

WDYT?

@westei
Member Author

westei commented Mar 9, 2018

I agree with you. The challenge is that MLT returned a conversation as its result, whereas the ConversationSearch now returns individual messages.

For the MLT provider to fit together with the token-based ConversationSearch, it has to be reworked so that it also returns messages as results. For that, one has to test whether the MLT approach works with the potentially short texts of messages. If not, we will have to come up with something else there as well.

@mrsimpson
Collaborator

@westei what about analyzing the similarity of a "window" of messages? I don't know how to implement this inside Solr, though, without permuting too much and blowing up the index.

@westei
Member Author

westei commented Mar 9, 2018

For sure this would increase the index size, but I do not think this is a problem. The bigger problem is that you will get overlapping segments of conversations as results (e.g. a result c1#m[3-8] and another result c1#m[5-10]). One needs to collect those and generate the response accordingly.

For the Conversation Search we use Solr Grouping to get the Results grouped by conversation and the Response merges overlapping sections (based on the context configuration).
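
To illustrate the merging step, here is a minimal sketch in Python. It is not Smarti's actual implementation: the function name and the `(conversation_id, first_message, last_message)` tuple layout are assumptions made for this example, and the context configuration is ignored.

```python
from collections import defaultdict


def merge_segments(hits):
    """Merge overlapping or directly adjacent message ranges per conversation.

    `hits` is a list of (conversation_id, first_msg, last_msg) tuples
    (hypothetical layout, chosen for this sketch only).
    """
    by_conversation = defaultdict(list)
    for conv_id, start, end in hits:
        by_conversation[conv_id].append((start, end))

    merged = {}
    for conv_id, ranges in by_conversation.items():
        ranges.sort()
        out = [list(ranges[0])]
        for start, end in ranges[1:]:
            if start <= out[-1][1] + 1:
                # Overlaps or directly follows the previous range: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[conv_id] = [tuple(r) for r in out]
    return merged
```

With the two results from the example above, `merge_segments([("c1", 3, 8), ("c1", 5, 10)])` collapses them into a single section covering messages 3 to 10 of `c1`.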

For Solr MLT one cannot use grouping, so implementing this would be much harder. In addition, without grouping one cannot tell Solr to include at most 3 results per conversation (otherwise, if I request e.g. 10 results, I could get all 10 from the same conversation).

Maybe one can combine MLT with FieldCollapsing to get the desired behaviour
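
For comparison, the per-conversation limit could also be enforced outside of Solr. The following Python sketch is neither Solr grouping nor field collapsing; it is a plain post-filter over an already ranked result list, and the `conversation_id` key is an assumed field name. It only works if more hits are requested from Solr than are finally returned.

```python
def cap_per_conversation(ranked_hits, max_per_conversation=3, limit=10):
    """Keep at most `max_per_conversation` hits per conversation, in rank order."""
    counts = {}
    kept = []
    for hit in ranked_hits:
        conv_id = hit["conversation_id"]  # hypothetical field name
        if counts.get(conv_id, 0) >= max_per_conversation:
            continue  # this conversation already filled its quota
        counts[conv_id] = counts.get(conv_id, 0) + 1
        kept.append(hit)
        if len(kept) == limit:
            break
    return kept
```

The drawback compared to a server-side solution is the over-fetching: if one conversation dominates the ranking, many hits are transferred just to be dropped.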

@mrsimpson
Collaborator

jup, understood. I imagine that a window of the messages issued by the author could be a good base for the more-like-this analysis

@ruKurz
Collaborator

ruKurz commented Mar 12, 2018

@westei Could you please suggest how to solve this problem, so that we can discuss how to proceed and implement a better user experience?

@ruKurz
Collaborator

ruKurz commented Mar 12, 2018

Challenge: Combining the conversation-search query builder with the conversation-mlt query builder raises the question of how to create a comprehensible user experience.

Suggestion

  • Server-side: Only use the conversation-mlt when no tokens have been extracted and no user tokens have been added
  • UI-side: Adjust the user interface to present the conversation-mlt results in the same way as the conversation-search results. (Do not make any UX difference, and hide from the user which query builder was used. The user gets related conversations, independent of the query builder.)

@janrudolph Do you agree?

@westei westei added the ready label Mar 19, 2018
@westei westei added in progress and removed ready labels Apr 2, 2018
@westei
Member Author

westei commented Apr 2, 2018

After some experiments and testing I have come to the conclusion that the best technical solution is to:

  • calculate a textual context for every message
    • this context will be indexed in a field configured to be used by Solr MLT
  • on every related Conversation request I will make a Solr MLT request (with all the filters) with interestingTerms=details but without selecting any terms.
    • this makes it possible to reconstruct the query that Solr /mlt would use internally to select related conversations (e.g. for the context Java und Solr wozu das ganze? the interesting terms would be "interestingTerms":["text:wozu",1.0, "text:test",1.1805785,"text:solr",1.2813209]).
  • In the Related Query Response I will provide this information, so the widget can decide whether to do a similar conversation query (by using those parameters) or not (by excluding them).
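
As a rough illustration of the reconstruction step (a Python sketch, not the actual Smarti code): with interestingTerms=details Solr returns the terms and their boosts as one flat list, which can be paired up and rendered as a boosted query string. The function name is made up for this example, and escaping of special characters in the terms is left out.

```python
def terms_to_query(interesting_terms):
    """Turn a flat [term, boost, term, boost, ...] list into a boosted query string."""
    # Even positions hold "field:term", odd positions hold the boost.
    pairs = zip(interesting_terms[0::2], interesting_terms[1::2])
    return " ".join(f"{term}^{boost}" for term, boost in pairs)


example = ["text:wozu", 1.0, "text:test", 1.1805785, "text:solr", 1.2813209]
print(terms_to_query(example))
# prints: text:wozu^1.0 text:test^1.1805785 text:solr^1.2813209
```

The resulting string can then be sent as part of a normal Solr query, so grouping and context handling keep working as they do for token-based queries.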

This has the huge advantage that the MLT query is only used to retrieve interesting terms and a normal Solr query is used for retrieving the results. All the special functionality for correctly retrieving related conversations, including contextual messages, already works for normal Solr queries, so there is no need to duplicate this functionality for Solr MLT queries.

In UI Terms:

This would make the Similarity based search a feature of the related conversation search (the exact thing requested by @mrsimpson).

This also allows combining queries for tokens with similarity-based constraints. This could be useful if one wants to search for a custom token that is relatively common in the dataset, as the context would rank results containing the custom token in a similar context at the top of the result list.

For that, "Similarity" would need to be a switch that can be activated/deactivated by the user (similar to filters, as discussed in #228). I would suggest enabling "Similarity" if no user-added token is present and deactivating it as soon as the user adds a custom token or pins an extracted token.

@mrsimpson
Collaborator

@westei I didn’t fully get the Solr implementation details, but the gist. And it sounds as if this made best use of the technology involved 👍

westei added a commit that referenced this issue Apr 4, 2018
…onversation Search Query Builder

* Indexing now stores a MLT Context for every Message. This includes the text of the surrounding messages based on content length, time difference, min/max message counts
* Implementation of the similarity feature
    * The Related Conversation QueryBuilder performs a Solr MLT query on the conversation to get interesting terms
    * those are normalised so that the maximum boost is `1.0`
    * Query params for similarity search are built and added to the Query (field: `similarityQuery`)
    * The widget can send this as `q.alt`: In this case this query is used if no query is present
    * The widget can also combine this with real tokens. In this case the query params need to be appended to the other query parameters.
        * in this case the real query params should use an additional boost factor (in the range of `5 - 10`)

NOTE: this increases the conversation index version from `5` to `6` so Smarti should trigger a full reindex on startup (for embedded setups). If this does not work for some reason the re-indexing needs to be manually triggered by deleting the current conversation index. For remote Solr Servers the schema needs to be updated and the index data need to be deleted to force a re-index on startup
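
The normalisation and the combination with real tokens described in this commit message could look roughly like the following Python sketch. It is an illustration under assumptions, not the committed code: the function names, the dict layout and the exact rendering of the query string are made up here; only the maximum boost of `1.0` and the token boost range of `5 - 10` come from the commit message.

```python
def normalise_boosts(term_boosts):
    """Scale boosts so that the largest one becomes 1.0."""
    if not term_boosts:
        return {}
    top = max(term_boosts.values())
    return {term: boost / top for term, boost in term_boosts.items()}


def combine(token_terms, similarity_boosts, token_boost=5.0):
    """Append similarity terms to real token terms; tokens get an extra boost."""
    parts = [f"{term}^{token_boost}" for term in token_terms]
    parts += [f"{term}^{boost:.3f}" for term, boost in similarity_boosts.items()]
    return " ".join(parts)


boosts = normalise_boosts({"text:solr": 2.0, "text:wozu": 1.0})
print(combine(["text:java"], boosts))
# prints: text:java^5.0 text:solr^1.000 text:wozu^0.500
```

Because the similarity boosts never exceed `1.0` after normalisation, a token boost in the suggested range keeps the real tokens dominant in the ranking.
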
@westei
Member Author

westei commented Apr 4, 2018

Server-Side implementation is ready in the #217-similarity-feature-for-related-conversation-search branch.

As the implementation has changes in the same files as #228 I used the branch of this Issue as a starting point. So the pull request #234 should be merged with 0.7.0 before.

@westei
Member Author

westei commented Apr 4, 2018

@Peym4n the Related Conversation Query now has a new field similarityQuery that contains the parameters for similarity queries.

The widget should send those parameters as value for the q.alt parameter. This has the effect that similarity search is used in cases where no tokens are present

In addition the Widget should only consider custom and maybe pinned Tokens for the conversation search ("Expertengespräche"). This will make similarity the default behaviour.

As an alternative we could add a [Similarity] button that can be enabled/disabled (similar to optional filters). If enabled, the similarity query would be used together with pinned and custom tokens. If disabled, all shown tokens would be used without similarity. Similarity would still be used as the default if no token is extracted.

@Peym4n
Contributor

Peym4n commented Apr 4, 2018

The widget part is also implemented.
Now only user tokens and pinned tokens will be used for related conversation search and when none of them exists (even when unpinned tokens exist) the similarity query is used.
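
The behaviour described above could be sketched like this in Python (the widget itself is not written in Python, and the token fields `value`, `origin` and `pinned` are assumed names for this example). The similarity query is treated as an opaque string that is always sent as `q.alt`, so Solr only falls back to it when no `q` is present.

```python
def build_search_params(tokens, similarity_query):
    """Build request parameters for the related conversation search."""
    # Only user-added and pinned tokens take part in the query.
    active = [
        t["value"]
        for t in tokens
        if t.get("origin") == "user" or t.get("pinned")
    ]
    params = {"q.alt": similarity_query}
    if active:
        params["q"] = " ".join(active)
    return params
```

With only unpinned extracted tokens, the result contains no `q`, so the similarity query in `q.alt` applies. As soon as a user token or pinned token exists, `q` is set and takes precedence.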

There is an escaping bug on the server side which @westei will fix.

westei added a commit that referenced this issue Apr 5, 2018
-similarity-feature-for-related-conversation-search
westei added a commit that referenced this issue Apr 5, 2018
* WordDelimiter: the original is now also kept at query time so that searches like `c++` actually match the indexed token `c++`
* added a PatternReplaceFilterFactory to remove quotes `,` and `;` on both sides and other trailing punctuation marks on terms
westei added a commit that referenced this issue Apr 5, 2018
…' of github.com:redlink-gmbh/smarti into #217-similarity-feature-for-related-conversation-search
westei added a commit that referenced this issue Apr 5, 2018
…-related-conversation-search

 Similarity feature for related conversation search (#217)
@ja-fra ja-fra closed this as completed Apr 20, 2018
@ghost ghost removed the in review label Apr 20, 2018