Skip to content

Conversation

@jannistsiroyannis
Copy link
Contributor

This changes Elastic indexing to occur strictly in order with postgres writes.

Every change is logged (in a table) with a sequence number (assigned within the postgres transaction), and indexing occurs by reading that log in the same order. There is still a tolerance for bad data, in the sense that any 400-response from Elastic will be error-logged but ignored. Other types of errors however (be they timeouts, broken connections, or whatever else) are NOT TOLERATED, and will be retried, in order (forever).

A big change for developers in this, is that the housekeeping module must now be running for any indexing to occur at all. It is however perfectly fine to start it up late (it will catch up on its own).

update.setObject(8, doc.getShortId(), OTHER)
}

boolean saveVersion(Document doc, Connection connection, Date createdTime,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be changed to void ?

* Returns true if anything was written.
*/
boolean storeAtomicUpdate(String id, boolean minorUpdate, boolean writeIdenticalVersions, String changedIn, String changedBy, UpdateAgent updateAgent) {
Document preUpdateDoc = null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this variable can be removed?


void storeAtomicUpdate(Document doc, boolean minorUpdate, boolean writeIdenticalVersions, String changedIn, String changedBy, String oldChecksum) {
normalize(doc)
Document preUpdateDoc = storage.load(doc.shortId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable is unused now and can be removed

Copy link
Contributor

@olovy olovy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! LGTM!

Remaining possibility for small inconsistencies:
If an elastic request fails after increment/decerement reverse links they will be bumbed again when the whole document is retried. I think this is fine for now. Lots of added complexity to fix this?
The fix would be to reindex them instead of just updating the counters the second time around?

I would like to have a gauge for "seconds behind" in /metrics so that we can track it.
I can add that.

Interesting to see the transcribed (groovy -> java) indexing code. A bit more clunky but not that bad.

@olovy olovy marked this pull request as draft May 14, 2025 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants