Try to exploit Change Streams instead of observe/observeChanges #999

Open
make-github-pseudonymous-again opened this issue May 25, 2024 · 5 comments
Labels
drop-meteor: This has to do with replacing the Meteor framework with something else
feature/search: This issue is about searching the data in general
mongodb: This has to do with MongoDB.
research: This needs some research.
vague: This issue is too vaguely stated at the moment. Needs more investigation.
Comments

@make-github-pseudonymous-again
Contributor

observeChangesAsync can have extreme lag when combined with raw collection operations (e.g. transactions). We should try to exploit Change Streams instead. Might also help with migrating away from Meteor.

See:

@make-github-pseudonymous-again added the drop-meteor, research, and mongodb labels on May 25, 2024
@make-github-pseudonymous-again added the vague and feature/search labels on May 25, 2024
@make-github-pseudonymous-again
Contributor Author

One problem with point 3 is that watch does not seem to support every query that find supports. A possible solution is to re-run the query with find whenever watch emits an event for a reasonable superset of the input query.
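
A minimal sketch of that re-polling loop, assuming nothing about the driver: `createPoller`, `poll`, and `onResult` are hypothetical names, not part of Meteor or MongoDB. The returned function is meant to be called on every change stream event. Events that arrive while a poll is in flight are coalesced into a single follow-up poll, so a burst of writes triggers at most two queries.

```typescript
// Hypothetical helper: re-runs `poll` whenever the returned trigger is called.
// `poll` would typically wrap something like find(...).toArray() (assumption).
const createPoller = <T>(
  poll: () => Promise<T[]>,
  onResult: (previous: T[], next: T[]) => void,
): (() => Promise<void>) => {
  let previous: T[] = [];
  let running = false; // a poll loop is currently in progress
  let dirty = false; // an event arrived while polling

  return async (): Promise<void> => {
    if (running) {
      // Coalesce: remember that one more poll is needed, then return.
      dirty = true;
      return;
    }
    running = true;
    try {
      do {
        dirty = false;
        const next = await poll();
        // The caller diffs the two responses and fires callbacks.
        onResult(previous, next);
        previous = next;
      } while (dirty);
    } finally {
      running = false;
    }
  };
};
```

With the Node.js driver this could presumably be wired up as `collection.watch(pipeline).on("change", () => void trigger())`, where `pipeline` matches a superset of the query and `poll` calls `collection.find(filter).toArray()`. Error handling for a rejected `poll` is left out of the sketch.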

For ordered callbacks (addedBefore, removedBefore), successive query responses can be compared using a longest increasing subsequence (LIS) algorithm in order to trigger the corresponding callbacks:

  1. map old response items to their position in the new response (remove any item that is not in the new response)
  2. compute the LIS of these positions; this is the sequence of items that stayed in place, for which we need to call changed callbacks at most
  3. items not part of the LIS need to be removedBefore from their previous position and addedBefore to their new position
  4. items part of the old response but not part of the new response need to be removedBefore
  5. items part of the new response but not part of the old response need to be addedBefore

This is implemented by Meteor's diff-sequence package:

https://github.com/meteor/meteor/blob/7e2abe9464c215aace47d20f737e59243a819871/packages/diff-sequence/diff.js#L64
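
For illustration, the five steps above can be sketched on plain id arrays. This is not the diff-sequence API: `diffOrdered`, `longestIncreasingSubsequence`, and the `Op` shapes are made-up names, and field-level changed detection is left out. A moved item is emitted as a removal followed by an `addedBefore`, as in step 3.

```typescript
type Op =
  | { type: "removed"; id: string }
  | { type: "addedBefore"; id: string; before: string | null };

// Indices (into seq) of one longest strictly increasing subsequence, O(n log n).
const longestIncreasingSubsequence = (seq: number[]): number[] => {
  const tails: number[] = []; // tails[k]: index of the smallest tail of a run of length k + 1
  const prev: number[] = new Array(seq.length).fill(-1);
  for (let i = 0; i < seq.length; i++) {
    let lo = 0;
    let hi = tails.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (seq[tails[mid]] < seq[i]) lo = mid + 1;
      else hi = mid;
    }
    if (lo > 0) prev[i] = tails[lo - 1];
    tails[lo] = i;
  }
  const result: number[] = [];
  let i = tails.length > 0 ? tails[tails.length - 1] : -1;
  for (; i !== -1; i = prev[i]) result.push(i);
  return result.reverse();
};

const diffOrdered = (oldIds: string[], newIds: string[]): { kept: string[]; ops: Op[] } => {
  const newPos = new Map(newIds.map((id, i) => [id, i] as const));
  // Step 1: old items that survive, mapped to their position in the new response.
  const survivors = oldIds.filter((id) => newPos.has(id));
  const positions = survivors.map((id) => newPos.get(id)!);
  // Step 2: items on the LIS stay in place.
  const kept = new Set(longestIncreasingSubsequence(positions).map((i) => survivors[i]));
  const ops: Op[] = [];
  // Steps 3 (first half) and 4: remove everything that moved or disappeared.
  for (const id of oldIds) {
    if (!kept.has(id)) ops.push({ type: "removed", id });
  }
  // Steps 3 (second half) and 5: walk the new response right to left, so the
  // `before` neighbour is always in place when an item is inserted.
  let before: string | null = null;
  for (let i = newIds.length - 1; i >= 0; i--) {
    const id = newIds[i];
    if (!kept.has(id)) ops.push({ type: "addedBefore", id, before });
    before = id;
  }
  return { kept: [...kept], ops };
};
```

For an old response a, b, c, d, e and a new response c, a, b, f, e, this keeps a, b, and e in place, removes c and d, then adds f before e and c before a.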

For unordered callbacks (added, removed), the solution is simpler: at most changed for what is in both the old and the new response, removed for what is in the old response but not in the new one, and added for what is in the new response but not in the old one.

This is implemented by Meteor's diff-sequence package:

https://github.com/meteor/meteor/blob/7e2abe9464c215aace47d20f737e59243a819871/packages/diff-sequence/diff.js#L28
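
A rough sketch of the unordered case, with responses keyed by `_id`. `diffUnordered`, `diffFields`, and the `Doc` type are hypothetical names, not Meteor's API. Values are compared with `JSON.stringify`, which is a simplification (it is sensitive to key order in nested objects); Meteor uses EJSON equality instead.

```typescript
type Doc = { _id: string; [key: string]: unknown };

interface UnorderedCallbacks {
  added?: (id: string, fields: Record<string, unknown>) => void;
  changed?: (id: string, fields: Record<string, unknown>) => void;
  removed?: (id: string) => void;
}

// Top-level field diff: changed fields map to their new value,
// fields that disappeared map to undefined.
const diffFields = (oldDoc: Doc, newDoc: Doc): Record<string, unknown> => {
  const fields: Record<string, unknown> = {};
  for (const key of Object.keys(newDoc)) {
    // Simplified equality check (assumption, see above).
    if (key !== "_id" && JSON.stringify(oldDoc[key]) !== JSON.stringify(newDoc[key])) {
      fields[key] = newDoc[key];
    }
  }
  for (const key of Object.keys(oldDoc)) {
    if (!(key in newDoc)) fields[key] = undefined;
  }
  return fields;
};

const diffUnordered = (
  oldDocs: Map<string, Doc>,
  newDocs: Map<string, Doc>,
  callbacks: UnorderedCallbacks,
): void => {
  for (const [id, oldDoc] of oldDocs) {
    const newDoc = newDocs.get(id);
    if (newDoc === undefined) {
      // In the old response only.
      callbacks.removed?.(id);
    } else {
      // In both responses: report a change only if some field differs.
      const fields = diffFields(oldDoc, newDoc);
      if (Object.keys(fields).length > 0) callbacks.changed?.(id, fields);
    }
  }
  for (const [id, newDoc] of newDocs) {
    if (!oldDocs.has(id)) {
      // In the new response only.
      const { _id, ...fields } = newDoc;
      callbacks.added?.(id, fields);
    }
  }
};
```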

@make-github-pseudonymous-again
Contributor Author

Note that Meteor's current implementation does not natively handle every query supported by find either: it falls back to polling if minimongo cannot create a matcher for the query.

https://github.com/meteor/meteor/blob/628dcd24d7685e8c9a5e4182aaa564736b3cc8fc/packages/mongo/mongo_driver.js#L1536-L1546

@make-github-pseudonymous-again
Contributor Author

Note also that Meteor's oplog observe driver becomes quite complicated once limit is involved:

https://github.com/meteor/meteor/blob/628dcd24d7685e8c9a5e4182aaa564736b3cc8fc/packages/mongo/oplog_observe_driver.js#L51-L68

Unless watch turns out to be free of this inherent complexity, it would be wiser to start by implementing a simple polling solution.

@make-github-pseudonymous-again
Contributor Author

If resorting to pure change stream updates instead of polling, the following ChangeStream change event handler may make sense:

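// Assumed imports: ChangeStreamDocument from the MongoDB Node.js driver,
// Subscription from Meteor's type definitions.
import type { ChangeStreamDocument } from "mongodb";
import type { Subscription } from "meteor/meteor";

// Forwards a single change stream event to a Meteor publication.
const handleChangeStreamEvents = <TSchema extends { _id: string; [key: string]: any }>(
  self: Subscription,
  collectionName: string,
  doc: ChangeStreamDocument<TSchema>
) => {
  switch (doc.operationType) {
    case "replace":
      // Note: fields dropped by the replacement are not cleared on the client,
      // since changed only touches the fields it is given.
      if (doc.fullDocument) {
        self.changed(collectionName, doc.documentKey._id, doc.fullDocument);
      }
      break;
    case "insert":
      if (doc.fullDocument) {
        self.added(collectionName, doc.documentKey._id, doc.fullDocument);
      }
      break;
    case "delete":
      self.removed(collectionName, doc.documentKey._id);
      break;
    case "update": {
      // Braces scope the declarations below to this case.
      const { removedFields, updatedFields } = doc.updateDescription;
      // Removed fields are sent as undefined so that the client unsets them.
      // Note: updatedFields may use dotted paths for nested fields.
      const fields = Object.fromEntries([
        ...(removedFields?.map((key) => [key, undefined] as const) ?? []),
        ...Object.entries(updatedFields ?? {}),
      ]) as Partial<TSchema>;
      self.changed(collectionName, doc.documentKey._id, fields);
      break;
    }
    case "drop":
    case "dropDatabase":
    case "rename":
    case "invalidate":
      // The watched collection is gone or the stream is no longer valid.
      self.stop();
      break;
    default:
      break;
  }
};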

See also: https://forums.meteor.com/t/pub-sub-with-mongodb-change-stream/57495.

Projects
Status: In progress
Development

No branches or pull requests

1 participant