This repository has been archived by the owner on Aug 30, 2022. It is now read-only.
In settings where public channels with a lot of messages are indexed, Smarti suffers from very high memory usage, resulting in extreme slowdowns due to high GC load. In settings with lower heap space limits we even see OOM errors such as
Exception in thread "conversation-indexing-thread-1" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
[..]
at java.io.StringWriter.write(StringWriter.java:77)
at org.apache.solr.common.util.XML.escape(XML.java:203)
at org.apache.solr.common.util.XML.escapeAttributeValue(XML.java:80)
at org.apache.solr.common.util.XML.writeXML(XML.java:138)
at org.apache.solr.client.solrj.util.ClientUtils.writeVal(ClientUtils.java:125)
[..]
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:190)
at io.redlink.smarti.query.conversation.ConversationIndexer.updateConversation(ConversationIndexer.java:253)
Investigation showed that this is caused by a combination of the following factors:
- Cloud Sync loads conversations in batches of 10.
- Indexing a conversation requires copying the text of its messages in order to
  - merge messages sent by the same user within a configured time limit, and
  - collect the text of all messages of the conversation (done for MLT (MoreLikeThis) searches).
- Using an embedded Solr server means that the server-side processing of large documents also happens in the same JVM.
- Messages are stored as sub-documents of the conversation, resulting in huge Solr documents for conversations with many messages.
- Copy field configurations with different analysis chains require multiple copies of the text to be kept in memory.
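The message merging step can be sketched as follows to show why indexing produces extra copies of the text: every merged group is rebuilt into a new String on top of the original message texts. This is a minimal illustration, not Smarti's implementation. The `Msg` and `MessageMerger` classes are hypothetical, messages are assumed to be sorted by time, and the configured time limit is assumed to apply to the gap between consecutive messages of the same user.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical message type for illustration only; not Smarti's model class.
final class Msg {
    final String user;
    final Instant time;
    final String text;

    Msg(String user, Instant time, String text) {
        this.user = user;
        this.time = time;
        this.text = text;
    }
}

final class MessageMerger {
    /**
     * Merges consecutive messages of the same user when each one follows the
     * previous one within maxGap (assumed semantics of the time limit).
     * Input is assumed to be sorted by time. Each merged group is copied into
     * a new String, so the text is held in memory at least twice.
     */
    static List<Msg> merge(List<Msg> messages, Duration maxGap) {
        List<Msg> merged = new ArrayList<>();
        String user = null;        // sender of the group being built
        Instant start = null;      // time of the first message in the group
        Instant last = null;       // time of the latest message in the group
        StringBuilder text = null; // accumulated text of the group
        for (Msg m : messages) {
            boolean sameGroup = user != null
                    && user.equals(m.user)
                    && !m.time.isAfter(last.plus(maxGap));
            if (sameGroup) {
                text.append('\n').append(m.text);
            } else {
                if (user != null) {
                    merged.add(new Msg(user, start, text.toString()));
                }
                user = m.user;
                start = m.time;
                text = new StringBuilder(m.text);
            }
            last = m.time;
        }
        if (user != null) {
            merged.add(new Msg(user, start, text.toString()));
        }
        return merged;
    }
}
```

With a one-minute limit, two messages from the same user 30 seconds apart become one entry, while a reply from another user or a long pause starts a new group.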
As a first step I will make the batch size configurable (see #300). This should reduce memory pressure, as a batch size of 1 means only a single conversation is loaded into memory at a time. The other points require further analysis, as changing them could affect the functionality of the conversation search and the similar-conversation search.
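A configurable batch size bounds how many conversations the indexing loop holds at once. The following is a minimal sketch of such a paging loop; it is not the actual #300 change. `ConversationSource`, `loadPage` and `BatchedIndexer` are hypothetical names, and offset/limit paging over the conversation store is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical paging source standing in for the conversation store;
// not Smarti's actual repository API.
interface ConversationSource<T> {
    /** Returns up to {@code limit} conversations starting at {@code offset}. */
    List<T> loadPage(int offset, int limit);
}

final class BatchedIndexer {
    private final int batchSize;

    BatchedIndexer(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    /**
     * Loads and indexes conversations one batch at a time and returns the
     * number indexed. Only the current batch is referenced by the loop, so a
     * batch size of 1 keeps a single conversation in memory; the previous
     * batch becomes eligible for GC once the next one is loaded.
     */
    <T> int indexAll(ConversationSource<T> source, Consumer<T> index) {
        int offset = 0;
        int total = 0;
        while (true) {
            List<T> batch = source.loadPage(offset, batchSize);
            if (batch.isEmpty()) {
                return total;
            }
            for (T conversation : batch) {
                index.accept(conversation);
            }
            total += batch.size();
            offset += batch.size();
        }
    }
}
```

The trade-off is more round trips to the store for smaller batches, in exchange for a lower peak heap when individual conversations are large.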
NOTE: the underlying cause is that public channels with a large number of messages cannot be indexed with the current index layout. Adapting the index layout to this new requirement will be implemented in #302 for Smarti 0.9.0.
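Because the current layout turns a whole conversation into one Solr document, the text held in memory while indexing it grows with the number of messages multiplied by the number of text copies listed above. The back-of-the-envelope sketch below makes that scaling explicit. It is an assumption-laden estimate, not a measurement: it assumes 2 bytes per char (UTF-16 `char[]` as on Java 8; compact strings on Java 9+ may use 1 byte for Latin-1 text) and ignores object headers, Solr field overhead and XML serialization buffers. `IndexMemoryEstimate` is a hypothetical helper.

```java
// Hypothetical estimate of raw text bytes held for one conversation document.
// Assumes 2 bytes per char and ignores all object and Solr overhead.
final class IndexMemoryEstimate {
    static long estimateBytes(long messages, long avgCharsPerMessage, int textCopies) {
        return messages * avgCharsPerMessage * 2L * textCopies;
    }
}
```

For example, a hypothetical channel with 100,000 messages of about 200 characters each, held in 4 copies, gives `estimateBytes(100_000, 200, 4)` = 160,000,000 bytes (roughly 150 MiB) for the text of a single document alone.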