Currently spirit works like this:
- It starts a replication client to subscribe to table changes (tracking the PK only).
- It then progressively copies from the table using `INSERT IGNORE .. SELECT` (src)
- Progressively, the replication client changes are flushed using `REPLACE INTO .. SELECT` and `DELETE` statements (src)
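The two phases above can be sketched as the statement shapes involved. All names here are illustrative (assume `t` is the source table, `_t_new` the shadow table, and `id` the PK); this is not spirit's actual code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// copyChunkSQL sketches the copy phase: a chunked
// INSERT IGNORE .. SELECT over a PK range.
func copyChunkSQL(lo, hi int) string {
	return fmt.Sprintf(
		"INSERT IGNORE INTO _t_new SELECT * FROM t WHERE id >= %d AND id < %d",
		lo, hi)
}

// flushDeltaSQL sketches the replication flush: re-copy rows that
// changed, delete rows that were removed, keyed by the tracked PKs.
func flushDeltaSQL(keys []int) (replaceSQL, deleteSQL string) {
	parts := make([]string, len(keys))
	for i, k := range keys {
		parts[i] = strconv.Itoa(k)
	}
	in := strings.Join(parts, ", ")
	replaceSQL = "REPLACE INTO _t_new SELECT * FROM t WHERE id IN (" + in + ")"
	deleteSQL = "DELETE FROM _t_new WHERE id IN (" + in + ")"
	return
}

func main() {
	fmt.Println(copyChunkSQL(1, 1000))
	r, d := flushDeltaSQL([]int{5, 9})
	fmt.Println(r)
	fmt.Println(d)
}
```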
I propose we experiment with a modified algorithm, which I am calling the Buffered algorithm. I don't know if it will be faster, but I think it's worth creating a proof of concept to try it. Here's how it works:
- The replication client starts, but rather than tracking just the PK, it buffers the whole row.
- The copy reads from the table using much larger chunks (10-20s instead of 2s) and fills a buffer.
- The copy buffer is then flushed in 1-2s writes, but at a higher parallel factor.
- The replication client flushes conditionally: it has to check that the key is below the low watermark of the copier. This ensures there is no lost update from the race where a copy is split into two parts (the race window is very brief; as an alternative we could rely on the checksum to repair issues, but I think it's better to keep the checksum as a back-stop).
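The conditional flush in the last step can be sketched as follows. All names here are hypothetical, not spirit's actual code: the replication client buffers whole rows, and a buffered change is only safe to apply once its key is below the copier's low watermark (i.e. the copier has fully passed it), so a copy chunk can never race a delta flush and lose an update:

```go
package main

import "fmt"

// bufferedRow is one change from the binlog, carrying the full row
// image rather than just the PK.
type bufferedRow struct {
	key int
	row []any
}

// replBuffer holds changes that are not yet safe to flush.
type replBuffer struct {
	pending []bufferedRow
}

// flushReady returns the rows safe to apply given the copier's low
// watermark, and keeps the rest buffered for a later pass.
func (b *replBuffer) flushReady(lowWatermark int) []bufferedRow {
	var ready, keep []bufferedRow
	for _, r := range b.pending {
		if r.key < lowWatermark {
			ready = append(ready, r) // copier has already passed this key
		} else {
			keep = append(keep, r) // still racing the copier: hold it
		}
	}
	b.pending = keep
	return ready
}

func main() {
	b := &replBuffer{pending: []bufferedRow{{key: 3}, {key: 42}, {key: 7}}}
	ready := b.flushReady(10) // copier low watermark is at key 10
	fmt.Println(len(ready), len(b.pending))
}
```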
Why I think this has merit:
- (for Aurora specifically) The longer reads trigger Aurora's read ahead which helps a lot when the data is not in cache. We can also use longer reads, because they won't be locking.
- The increased write parallelism makes good use of group commit, which Aurora also really benefits from. This is particularly beneficial because the writes are basically guaranteed to be disjoint sets.
- (for MySQL) The medium-sized batches are more likely to be optimal for multi-threaded replicas. I believe (if I am paraphrasing JFG's research correctly) that small, disjoint batches work best.
- This algorithm can also be used in other contexts, such as ShardMerge/MoveTables. These `INSERT .. SELECT` / `REPLACE .. SELECT` statements are the only thing in spirit tying it to a single server.
I call it the 'buffered' algorithm because this is the most distinctive difference from the existing algorithms. I previously considered calling this the 'Aurora' algorithm, but upon reflection I think that's not entirely true.
It's not too much of a code change to implement:
- Because we technically already have two methods to track replication changes (`deltaMap` and `deltaQueue`), we should extract this into an interface.
- There will be some work (and testing) to read in multiple chunks, fill a buffer, and write with a greater number of threads (5-10x more).
There will be a lot of work in experimenting though, to confirm it is beneficial.
Edit: I was reading back through https://netflixtechblog.com/dblog-a-generic-change-data-capture-framework-69351fb9099b and noticed that it describes the buffered algorithm exactly (lock-free on the source, high and low watermarks for replication stream processing, using chunks for copying table data).