Conversation Index Updates for Indexing public channels #302

westei · 2019-03-18T09:08:39Z

The Design of the Conversation Index had small conversations (like question-answering threads) in mind.

This assumption breaks down when public channels are indexed as a single conversation.

As this is now a use case, we need to change the conversation index so that it can index channels with a high number of messages.

* Messages are now indexed as separate top-level documents
* For conversations, the text of the last 50 (configurable) messages is stored as context
* Indexing now uses the Java Streams API to reduce the memory footprint

These changes also required a lot of changes to the ConversationSearch, as it depended on the old index structure where messages were indexed as sub-documents of conversations.

Normalised the 3 different InterestingTerm implementations from the InterestingTermExtractor, ConversationSearch- and RCSearchQueryBuilder into a single one based on the ChatpalSearchQueryBuilder. This was the most advanced implementation: it compensates for different Solr TextField configurations by making additional Analysis Requests for fields suggested by the initial MLT request. This change will allow us to get rid of several copyField configurations (e.g. items), which will further improve the memory footprint.

The current state is that the old tests work with the new index structure, but given the changed use cases the queries should be adapted accordingly.

TODOs:

* As Conversations now only store the text of the last 50 messages, the MLT queries should be adapted to use the content of Messages instead
* Special field configurations used in combination with the old InterestingTerms implementations are no longer needed, as the new one works fine with any field configuration
* Further tests of the InterestingTerms components after the above adaptations
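To illustrate the indexing changes described above (messages as top-level documents, a bounded context per conversation, stream-based processing), here is a minimal sketch. It is not the project's actual code: `ConversationIndexSketch`, `Message`, `IndexDoc`, `indexConversation` and the `sink` callback are all hypothetical names chosen for illustration, and a real implementation would presumably build Solr documents rather than plain objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import java.util.stream.Stream;

final class ConversationIndexSketch {

    /** Hypothetical message model; not the project's real class. */
    static final class Message {
        final String id;
        final String text;

        Message(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    /** Hypothetical flat index document standing in for a Solr document. */
    static final class IndexDoc {
        final String id;
        final String type;
        final String conversationId;
        final String text;

        IndexDoc(String id, String type, String conversationId, String text) {
            this.id = id;
            this.type = type;
            this.conversationId = conversationId;
            this.text = text;
        }
    }

    /** Default taken from the issue text: the last 50 messages form the context. */
    static final int DEFAULT_CONTEXT_SIZE = 50;

    /**
     * Consumes the message stream once. Every message is emitted as its own
     * top-level document; only the last contextSize message texts are kept
     * in memory and stored on the conversation document.
     */
    static void indexConversation(String conversationId,
                                  Stream<Message> messages,
                                  int contextSize,
                                  Consumer<IndexDoc> sink) {
        Deque<String> context = new ArrayDeque<>();
        messages.forEachOrdered(m -> {
            // Each message becomes a separate top-level document.
            sink.accept(new IndexDoc(m.id, "message", conversationId, m.text));
            if (contextSize > 0) {
                // Drop the oldest text once the window is full.
                if (context.size() == contextSize) {
                    context.removeFirst();
                }
                context.addLast(m.text);
            }
        });
        // The conversation document carries only the bounded context.
        sink.accept(new IndexDoc(conversationId, "conversation", conversationId,
                String.join("\n", context)));
    }
}
```

Because the stream is consumed element by element and only the bounded window is retained, memory use in this sketch depends on the context size rather than on the total number of messages in the channel.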

westei self-assigned this Mar 18, 2019

westei added the enhancement label Mar 18, 2019

westei added this to the v0.9.0 milestone Mar 18, 2019

This was referenced Mar 18, 2019

Conversation Indexing Improvements #296

Closed

High Memory Consumption during Conversation Indexing #299

Closed

Limit Messages per Conversation #281

Closed

westei mentioned this issue Mar 21, 2019

Update ConversationSearch and RelatedConversation for for the new Indexing Structure #303

Open

westei mentioned this issue Mar 29, 2019

Conversation search endpoint returns inconsistent number of results #256

Open
