Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reverse order query causes performance bottleneck of cassandra cluster #240

Open
ZhiXingHeYiApple opened this issue Oct 28, 2024 · 4 comments

Comments

@ZhiXingHeYiApple
Copy link

CassandraJournalStatements.scala

  def selectHighestSequenceNr =
    s"""
     SELECT sequence_nr FROM $tableName WHERE
       persistence_id = ? AND
       partition_nr = ?
       ORDER BY sequence_nr
       DESC LIMIT 1
   """

journal messages table of cassandra below:

CREATE TABLE IF NOT EXISTS pekko.messages (
  persistence_id text,
  partition_nr bigint,
  sequence_nr bigint,
  timestamp timeuuid,
  timebucket text,
  writer_uuid text,
  ser_id int,
  ser_manifest text,
  event_manifest text,
  event blob,
  meta_ser_id int,
  meta_ser_manifest text,
  meta blob,
  tags set<text>,
  PRIMARY KEY ((persistence_id, partition_nr), sequence_nr, timestamp))     --- sequence_nr default is asc order
  WITH gc_grace_seconds =864000
  AND compaction = {
    'class' : 'SizeTieredCompactionStrategy',
    'enabled' : true,
    'tombstone_compaction_interval' : 86400,
    'tombstone_threshold' : 0.2,
    'unchecked_tombstone_compaction' : false,
    'bucket_high' : 1.5,
    'bucket_low' : 0.5,
    'max_threshold' : 32,
    'min_threshold' : 4,
    'min_sstable_size' : 50
    };

When business application actors are restored from cassandra disks, there will be a large number of slow queries. Is there any way to eliminate this reverse order query? Thank you!

@Roiocam
Copy link
Member

Roiocam commented Oct 28, 2024

I am not sure why the Cassandra plugin needs this at recovering.

but we could optimize the execution plan by cutting the result set. i.e., WHERE sequence_nr > ? it will reduce the ordering on RDBMS.

@pjfanning
Copy link
Contributor

Can we just use max?

SELECT max(sequence_nr) FROM $tableName WHERE
       persistence_id = ? AND
       partition_nr = ?`

This stackoverflow answer says that Cassandra supports it.
https://stackoverflow.com/questions/41717163/query-to-select-max-value-of-a-column-using-max-using-cassandra

@pjfanning
Copy link
Contributor

@ZhiXingHeYiApple as a workaround until we can fix the code, is it possible for you to add an index to pekko.messages?
You could have one that indexes based on (persistence_id, partition_nr), sequence_nr DESC. I'm not a Cassandra expert. I know more about RDBMS.

@Roiocam
Copy link
Member

Roiocam commented Oct 29, 2024

Can we just use max?

SELECT max(sequence_nr) FROM $tableName WHERE
       persistence_id = ? AND
       partition_nr = ?`

This stackoverflow answer says that Cassandra supports it. stackoverflow.com/questions/41717163/query-to-select-max-value-of-a-column-using-max-using-cassandra

It will ignore the null value, but still sorting a large set. https://stackoverflow.com/questions/34246403/performance-of-max-vs-order-by-desc-limit-1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants