This repository has been archived by the owner on Aug 30, 2022. It is now read-only.

High Memory Consumption during Conversation Indexing #299

Closed
westei opened this issue Feb 18, 2019 · 2 comments

Comments

westei commented Feb 18, 2019

In settings where public channels with a lot of messages are indexed, Smarti suffers from very high memory usage, resulting in extreme slowdowns due to high GC load. In settings with lower heap space limits we even see OOM errors such as

Exception in thread "conversation-indexing-thread-1" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
[..]
    at java.io.StringWriter.write(StringWriter.java:77)
    at org.apache.solr.common.util.XML.escape(XML.java:203)
    at org.apache.solr.common.util.XML.escapeAttributeValue(XML.java:80)
    at org.apache.solr.common.util.XML.writeXML(XML.java:138)
    at org.apache.solr.client.solrj.util.ClientUtils.writeVal(ClientUtils.java:125)
[..]
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:190)
    at io.redlink.smarti.query.conversation.ConversationIndexer.updateConversation(ConversationIndexer.java:253)

Investigation showed that this is caused by the combination of the following:

  1. Cloud Sync loads conversations in batches of 10
  2. Indexing conversations requires copying the text of messages
    1. to merge messages sent by the same user within a configured time limit
    2. to collect the text of all messages on the conversation (done for MLT searches)
  3. The use of an embedded Solr server means that the server-side processing of large documents also happens in the same JVM
    1. Messages are stored as sub-documents of the conversation, resulting in huge Solr documents for conversations
    2. Copy field configurations with different analysis require keeping multiple copies of the text in memory
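
To illustrate why merging messages (point 2.1) copies text, here is a minimal sketch. It is not Smarti's actual code: the `Message` type, the `merge` method and the `maxGapMillis` parameter are hypothetical names chosen for this example. Every merged section builds a new string, so the message text exists at least twice while a conversation is being indexed.

```java
import java.util.ArrayList;
import java.util.List;

public class MessageMerger {

    /** Hypothetical, simplified message type (not Smarti's data model). */
    public static final class Message {
        public final String user;
        public final long timestampMillis;
        public final String text;

        public Message(String user, long timestampMillis, String text) {
            this.user = user;
            this.timestampMillis = timestampMillis;
            this.text = text;
        }
    }

    /**
     * Merges consecutive messages of the same user when each one follows
     * the previous message within maxGapMillis.
     */
    public static List<Message> merge(List<Message> messages, long maxGapMillis) {
        List<Message> merged = new ArrayList<>();
        StringBuilder buffer = null; // text of the section currently being built
        String user = null;
        long first = 0;              // timestamp of the first message in the section
        long last = 0;               // timestamp of the previous message
        for (Message m : messages) {
            boolean sameSection = buffer != null
                    && m.user.equals(user)
                    && m.timestampMillis - last <= maxGapMillis;
            if (sameSection) {
                buffer.append('\n').append(m.text); // copies the message text
            } else {
                if (buffer != null) {
                    merged.add(new Message(user, first, buffer.toString()));
                }
                buffer = new StringBuilder(m.text);
                user = m.user;
                first = m.timestampMillis;
            }
            last = m.timestampMillis;
        }
        if (buffer != null) {
            merged.add(new Message(user, first, buffer.toString()));
        }
        return merged;
    }
}
```

For a channel with a very large number of messages, these copies add to the copies made for the conversation-level text and by the Solr copy fields.
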
westei self-assigned this Feb 18, 2019
westei added the bug label Feb 18, 2019
westei added this to the v0.8.0 milestone Feb 18, 2019

westei commented Feb 18, 2019

As a first step I will make the batch size configurable (see #300). This should decrease memory pressure, since with a batch size of 1 only a single conversation is loaded into memory at a time. The other points require further analysis, as changes to them could affect the functionality of the conversation search and the similar conversation search.
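
A rough sketch of what a configurable batch size means for memory, assuming conversations can be read lazily from an iterator. `BatchedIndexing`, `forEachBatch` and the `indexer` callback are hypothetical names for illustration, not the change made in #300.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BatchedIndexing {

    /**
     * Reads items lazily from source and hands them to indexer in batches of
     * at most batchSize. Returns the number of batches processed.
     */
    public static <T> int forEachBatch(Iterator<T> source, int batchSize,
                                       Consumer<List<T>> indexer) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                indexer.accept(batch);
                batches++;
                // Start a fresh list so the processed batch can be garbage collected.
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) { // trailing, partially filled batch
            indexer.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

With `batchSize` set to 1, at most one item is held by the loop at any time, at the cost of more calls to the indexer.
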

westei commented Mar 18, 2019

NOTE: the cause of this is that public channels with a high number of messages cannot be indexed with the current index layout. The adaptation of the index layout to this new requirement will be implemented by #302 in Smarti 0.9.0.
