This repository has been archived by the owner on Aug 30, 2022. It is now read-only.

Conversation Index Updates for Indexing public channels #302

Open
westei opened this issue Mar 18, 2019 · 0 comments

westei commented Mar 18, 2019

The Conversation Index was designed with small conversations in mind, such as question-answering threads.

This assumption breaks down when a public channel is indexed as a single conversation.

Since this is now a use case, we need to change the Conversation Index so that it can index channels with a high number of messages.

@westei westei self-assigned this Mar 18, 2019
@westei westei added this to the v0.9.0 milestone Mar 18, 2019
westei added a commit that referenced this issue Mar 20, 2019
* Messages are now indexed as separate top-level documents
* For conversations, the text of the last 50 (configurable) messages is stored as context
* Indexing now uses the Java Streams API to reduce the memory footprint
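
The structure described in the list above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the commit: the `Message` class, the `index` method, and the field names (`id`, `type`, `conversation_id`, `text`, `context`) are hypothetical, and documents are modelled as plain maps rather than Solr documents. It shows each message being emitted as its own flat document while the conversation keeps only a bounded window of the most recent message texts, so memory use does not grow with the size of the channel.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class ConversationIndexSketch {

    /** Hypothetical message type; the real model class is not shown in the issue. */
    static final class Message {
        final String id;
        final String conversationId;
        final String text;

        Message(String id, String conversationId, String text) {
            this.id = id;
            this.conversationId = conversationId;
            this.text = text;
        }
    }

    /**
     * Consumes the messages one at a time, hands one flat document per message
     * to the sink, and returns a conversation document that holds only the
     * texts of the last contextSize messages.
     */
    static Map<String, Object> index(String conversationId, Stream<Message> messages,
                                     int contextSize, Consumer<Map<String, Object>> sink) {
        if (contextSize <= 0) {
            throw new IllegalArgumentException("contextSize must be positive");
        }
        Deque<String> context = new ArrayDeque<>(contextSize);
        messages.forEachOrdered(m -> {
            // Each message becomes its own top-level document (field names are assumptions).
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("id", m.id);
            doc.put("type", "message");
            doc.put("conversation_id", m.conversationId);
            doc.put("text", m.text);
            sink.accept(doc);

            // Keep a sliding window: drop the oldest text once the window is full.
            if (context.size() == contextSize) {
                context.removeFirst();
            }
            context.addLast(m.text);
        });

        Map<String, Object> conversation = new LinkedHashMap<>();
        conversation.put("id", conversationId);
        conversation.put("type", "conversation");
        conversation.put("context", new ArrayList<>(context));
        return conversation;
    }

    public static void main(String[] args) {
        // Generate messages lazily instead of building the full list up front.
        Stream<Message> messages = IntStream.rangeClosed(1, 120)
                .mapToObj(i -> new Message("m" + i, "c1", "text " + i));
        List<Map<String, Object>> indexed = new ArrayList<>();
        Map<String, Object> conversation = index("c1", messages, 50, indexed::add);
        System.out.println("message documents: " + indexed.size());
        System.out.println("context entries: " + ((List<?>) conversation.get("context")).size());
    }
}
```

In this sketch the sink collects documents into a list only for demonstration; with a sink that forwards each document to the index directly, only the bounded context window is held in memory at any time.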

This change also required many changes to the ConversationSearch, as it depended on the old index structure where messages were indexed as sub-documents of conversations.

Normalised the 3 different InterestingTerm implementations from the InterestingTermExtractor, ConversationSearch- and RCSearchQueryBuilder into a single one based on the ChatpalSearchQueryBuilder. This one was the most advanced implementation: it compensates for different Solr TextField configurations by making additional analysis requests for the fields suggested by the initial MLT request. This change will make it possible to get rid of several copyField configurations (e.g. items), which will further improve the memory footprint.

The current state is that the old tests pass with the new index structure. Given the changed use cases, however, the queries should be adapted accordingly.

TODOs:

* As conversations now only store the text of the last 50 messages, the MLT queries should be adapted to use the content of messages instead
* Special field configurations used in combination with the old InterestingTerms implementations are no longer needed, as the new one works with any field configuration
* Further tests of the InterestingTerms components after the above adaptations