Add CRR Cascaded capabilities#6179
Conversation
Hello sylvainsenechal,My role is to assist you with the merge of this Available options
Available commands
Status report is not available. |
| bucketName: request.bucketName, | ||
| objectKey: request.objectKey, | ||
| }); | ||
| return callback(errors.OperationAborted); |
There was a problem hiding this comment.
many 409 available in arsenal. we can pick this one, maybe some other ones, or create our own
| "@hapi/joi": "^17.1.1", | ||
| "@smithy/node-http-handler": "^3.0.0", | ||
| "arsenal": "git+https://github.com/scality/Arsenal#8.4.2", | ||
| "arsenal": "file:../Arsenal", |
There was a problem hiding this comment.
arsenal and @scality/cloudserverclient are pointing to local filesystem paths (file:../Arsenal, file:../cloudserverclient) instead of pinned git tags. This will break installs for anyone who does not have these directories locally, and CI will fail to resolve them.
| "arsenal": "file:../Arsenal", | |
| "arsenal": "git+https://github.com/scality/Arsenal#8.4.2", |
— Claude Code
| "devDependencies": { | ||
| "@eslint/compat": "^1.2.2", | ||
| "@scality/cloudserverclient": "1.0.7", | ||
| "@scality/cloudserverclient": "file:../cloudserverclient", |
There was a problem hiding this comment.
Same issue: file:../cloudserverclient should be pinned to a published version or git tag.
— Claude Code
| }); | ||
|
|
||
| describe('CRR cascade — putMetadata', () => { | ||
| it('second write with the same microVersionId returns loop-detected', async () => { |
There was a problem hiding this comment.
Test names using it() should start with should per project conventions.
— Claude Code
| }], | ||
| }, | ||
| })); | ||
| }); |
There was a problem hiding this comment.
Missing after() hook to clean up the test buckets (TEST_BUCKET, TEST_BUCKET_CRR, DEST_BUCKET). Without cleanup, leftover buckets accumulate across test runs and can cause name collisions or resource leaks.
— Claude Code
There was a problem hiding this comment.
bucket/keys names are randomized
I intentionnally didnt add any after to keep the test and file lighter.
Do we want to add after cleanup ?
Review by Claude Code |
There was a problem hiding this comment.
all these worked locally, need to wait for the bumps now
| "@hapi/joi": "^17.1.1", | ||
| "@smithy/node-http-handler": "^3.0.0", | ||
| "arsenal": "git+https://github.com/scality/Arsenal#8.4.2", | ||
| "arsenal": "file:../Arsenal", |
c15fb2b to
7912480
Compare
❌ 103 Tests Failed:
View the top 3 failed test(s) by shortest run time
View the full list of 8 ❄️ flaky test(s)
To view more test analytics, go to the Test Analytics Dashboard |
| }); | ||
|
|
||
| describe('CRR cascade — putMetadata', () => { | ||
| it('second write with the same microVersionId returns loop-detected', async () => { |
There was a problem hiding this comment.
Test names should start with "should" per project conventions (e.g. should return loop-detected on second write with the same microVersionId).
— Claude Code
| "@hapi/joi": "^17.1.1", | ||
| "@smithy/node-http-handler": "^3.0.0", | ||
| "arsenal": "git+https://github.com/scality/Arsenal#8.4.2", | ||
| "arsenal": "file:../Arsenal", |
There was a problem hiding this comment.
arsenal and @scality/cloudserverclient use file: local path references. These must be pinned to a tag or version before merging — file: references will break any install outside of your local workstation.
— Claude Code
|
7912480 to
973261a
Compare
| } | ||
|
|
||
| const encodedMicroVersionId = request.headers['x-scal-micro-version-id']; | ||
| const incomingRaw = encodedMicroVersionId ? decodeMicroVersionId(encodedMicroVersionId) : null; |
There was a problem hiding this comment.
The decode error handling here differs from putMetadata (line 574-575), which normalises the result immediately so incomingRaw is guaranteed non-Error. Here incomingRaw can hold an Error object, then is checked later with !(incomingRaw instanceof Error). Consider aligning with the putMetadata pattern for consistency.
— Claude Code
| "@hapi/joi": "^17.1.1", | ||
| "@smithy/node-http-handler": "^3.0.0", | ||
| "arsenal": "git+https://github.com/scality/Arsenal#8.4.2", | ||
| "arsenal": "file:../Arsenal", |
There was a problem hiding this comment.
arsenal and @scality/cloudserverclient point to local file:../ paths. These must be pinned to git tags before merging — anyone cloning this repo won't have these sibling directories.
— Claude Code
|
Review by Claude Code |
92f01c0 to
365213b
Compare
|
365213b to
37068a8
Compare
3e4bd1d to
9840ffa
Compare
391c544 to
3c698dc
Compare
| // as cascade targets because they use the MultiBackend S3 path which | ||
| // bypasses the putData/putMetadata routes, so loop detection cannot fire | ||
| // on those destinations. | ||
| const BLOCKED_LOCATION_TYPES = ['location-scality-ring-s3-v1', 'location-scality-artesca-s3-v1']; |
There was a problem hiding this comment.
Could it be tested on a real lab ?
There was a problem hiding this comment.
should be honestly, i think i need to double check this together with the design and the code
| replicationInfo: getReplicationInfo(config, | ||
| objectKey, bucketMD, false, size, null, null, authInfo), | ||
| overheadField, | ||
| updateMicroVersionId: true, |
There was a problem hiding this comment.
yes, actually this is important, I added this after making my tests in codespaces :
Without this :
You put an object : object isn't assigned an micro version Id
You setup the whole replication, backbeat picks up that source object, it doesn't have any microversion id so cloudserver just doesn't try to process it as a potential cascade replication
There was a problem hiding this comment.
- should "new" objects not keep having an empty microVersionId (to avoid wasting metadata space)?
and just consider "no micro version id" is the oldest micro versionId (the only thing it cannot compare to is another no micro version id - but that would not happen on new object, each get their own versionId) - in case of un-versioned buckets (i.e. only master objects), we may need to update the micro version id (because we have no version id) ; however there is no replication in that case, so no problem not to do it
There was a problem hiding this comment.
- Similar thread here: https://github.com/scality/citadel/pull/349#discussion_r3137680994
Should be done in objectPut instead:7a63acf(#6178)
There was a problem hiding this comment.
We discussed this a bit with Maël.
I think we could try to not do it all the time, and do it only for putObject (but not even sure).
But we were wondering what happens if we don't write microVersionId on putObject : how do we compare two objects that don't have any micro version ID, it may be possible on the first replication to consider no micro version id => source is newer, but then we must write the microVersionID anyways to block loop and make later write work.
Also in the design we said microVersionId would necessarily be used for this feature only but more generally to be able to track and compare update time, I'm very worried we're gonna have issues later if we start writing microVersionID only on certain conditiion. Actually the code in arsenal, backbeat and cloudserver is already filled with quite a lot of defensive and smelly if (microVersionID) because we never bothered adding this from the start
| request.resume(); | ||
| return _respondWithHeaders( | ||
| response, | ||
| { code: MicroVersionIdAlreadyStoredException.name, | ||
| message: 'incoming microVersionId already at destination' }, | ||
| {}, | ||
| log, | ||
| callback, | ||
| 409, | ||
| ); |
There was a problem hiding this comment.
could you remove duplication to simplify here ?
There was a problem hiding this comment.
Ok I did something, you take a look but honestly it's really not that great 🤔
There was a problem hiding this comment.
the code is completely duplicated, the only difference is really name of exception and "text" of message.
as discussed in cloudserverclient, do we really need cloudserver to make that distinction? just returning the same error (micro version id conflict) AND the micro version id of the object stored allows the caller to make the distinction, if they need - without duplication.
There was a problem hiding this comment.
ahhh, I missed the extra type MicroVersionIdAlreadyStoredException, c.f. https://github.com/scality/cloudserverclient/pull/24/changes#r3496549728
we had the discussion and ended adding microVersionId in StaleMicroVersionIdException, but it is useless if we generate 2 kinds of errors...
→ IMHO we should drop MicroVersionIdAlreadyStoredException, which allows deduplicating here and simplifying backbeat?
otherwise we shoud still dedup, something like:
| request.resume(); | |
| return _respondWithHeaders( | |
| response, | |
| { code: MicroVersionIdAlreadyStoredException.name, | |
| message: 'incoming microVersionId already at destination' }, | |
| {}, | |
| log, | |
| callback, | |
| 409, | |
| ); | |
| if (cmp !=== Ordering.NEWER) { | |
| request.resume(); | |
| return _respondWithHeaderCrrConflict( | |
| response, log, callback, | |
| cmp === Ordering.EQUAL ? MicroVersionIdAlreadyStoredException.name : StaleMicroVersionIdException.name, | |
| 'incoming microVersionId already at destination', | |
| ); | |
| } |
There was a problem hiding this comment.
Sending back the microVersionID is indeed not super useful, I added it because it was suggested it could become useful or nice to have the info for the caller. I'm not really gonna focus on this to much, i can remove it or keep it, imo I think it's fine that an api returns it.
But for the double error, well tbh i don't really mind either as I just wanna put an end to this feature, but for me it's better to have backbeat check the errors like this :
Than having it just receive a microVersion id, then does another loop if microversion == null, if microversion < > xxx, then we don't do exactly the same thing (cbOnce(cascadeLoopDetected) vs cbOnce(cascadeDataStale)).
In the screenshot, we directly have errors that we can nicely catch and that's pretty much it.
I just see calling the client like calling any functions, and if the function I call return an error, I want the error to be as detailed as possible and not redo the checks that was already done, because anyways cloudserver has to do those comparisons here.
There was a problem hiding this comment.
I will recheck the suggested simpification after we settle the discussion, not fully sure it should be done though because compare can return 4 values
older
younger
equal
not comparable
so doing this cmp !=== Ordering.NEWER
still leaves room for NOT_COMPARABLE, I think it's safer to assert on what we want that exclude only value
3c698dc to
c8acd08
Compare
c8acd08 to
51638dd
Compare
maeldonn
left a comment
There was a problem hiding this comment.
LGTM. Please bump package.json and create a new release. Could you just confirm that the location guard is working ?
Waiting for approvalThe following approvals are needed before I can proceed with the merge:
|
| const incomingVersionIdEncoded = request.headers['x-scal-version-id']; | ||
| const decoded = incomingVersionIdEncoded ? decode(incomingVersionIdEncoded) : null; | ||
| const incomingVersionIdDecoded = decoded instanceof Error ? null : decoded; | ||
| if (incomingVersionIdDecoded && objMd && objMd.versionId === incomingVersionIdDecoded) { |
There was a problem hiding this comment.
objMd.versionId === incomingVersionIdDecoded
- Can this condition actually fail?
- Is it really part of the condition?
AFAIU, versionId is used (or should be?) to find the existing objMD:
- so if not found, we would get objMD=null ;
- and if found, it should have the versionID...
- Unless there is a some logic here which gets the matching version if it exist, else the latest one : in that case objMd.versionId !== incomingVersionIdDecoded means we actually did not an object already
→ should clarify how we get there, and nice if we can drop this condition
(remember that with CRR with cannot work in non-versioned buckets -so no "overwrites"-, and that versions may be replicated out of order)
There was a problem hiding this comment.
I rewrote some part of the if statement into 2 stages, but your comment still stands.
I am not sure i understand it though, I think you are saying that if we were able to get objMd, then necessarily objMd will be equal to incomingVersionIdDecoded ?
- objMd is obtained through a middleware, from the provided bucket + the key, and it's the latest available version of that bucket
- the x-scal-micro-version-id is anything people using the client decide to put inside. I don't see a situation where we would skip checking objMd.version === incomingVersionId
| replicationInfo: getReplicationInfo(config, | ||
| objectKey, bucketMD, false, size, null, null, authInfo), | ||
| overheadField, | ||
| updateMicroVersionId: true, |
There was a problem hiding this comment.
- should "new" objects not keep having an empty microVersionId (to avoid wasting metadata space)?
and just consider "no micro version id" is the oldest micro versionId (the only thing it cannot compare to is another no micro version id - but that would not happen on new object, each get their own versionId) - in case of un-versioned buckets (i.e. only master objects), we may need to update the micro version id (because we have no version id) ; however there is no replication in that case, so no problem not to do it
|
|
||
| const incomingVersionIdEncoded = request.headers['x-scal-version-id']; | ||
| const decoded = incomingVersionIdEncoded ? decode(incomingVersionIdEncoded) : null; | ||
| const incomingVersionIdDecoded = decoded instanceof Error ? null : decoded; |
There was a problem hiding this comment.
should we not actually fail if the version id cannot be decoded? (this would be a bad request)
There was a problem hiding this comment.
yes actually i changed the single if into 2 ifs :
If the versionId is provided, but we can't decode it, it means something is wrong and we shouldn't continue anything
| function putMetadata(request, response, bucketInfo, objMd, log, callback) { | ||
| const { bucketName, objectKey } = request; | ||
|
|
||
| const encodedMicroVersionId = request.headers['x-scal-micro-version-id']; |
There was a problem hiding this comment.
we probably need to differentiate between
- the absence of microVersionId (ie. x-scal-micro-version-id is not present in the request) → no extra check
- microVersionId is empty (something like
x-scal-micro-version-id: "", when referring to the first metadata revision when object was created) → need the extra check, and check if the object already has newer micro version id (i.e. either it does not have one → "equal", or it does → "newer")
There was a problem hiding this comment.
I change the code a bit to do what you mention in the first point, and also return an error when the header is present but fail to be decoded like in putData.
For the other point,
The current implementation is : new source will always have a microVersionId, it's talked about in another comment at the top of this review, but I don't see how we can properly deal with empty microversion ids on new objects written after this feature is added, without having issues later. It means we would have situation like
A->B->C
I put on A, A has no microVersionID, but replicates to B and C which now have a microVersionID.
But A still doesn't have a microVersionID. What happens if there is a loop configuration and C goes back to A
What happens on the second write on A when a first write has already replicated on B and C ? B and C will have a microVersionID but A won't ?
It seems to me that deciding to not assign a microVersionID on all new putObject writes is gonna get real tricky in the long run
| request.resume(); | ||
| return _respondWithHeaders( | ||
| response, | ||
| { code: MicroVersionIdAlreadyStoredException.name, | ||
| message: 'incoming microVersionId already at destination' }, | ||
| {}, | ||
| log, | ||
| callback, | ||
| 409, | ||
| ); |
There was a problem hiding this comment.
the code is completely duplicated, the only difference is really name of exception and "text" of message.
as discussed in cloudserverclient, do we really need cloudserver to make that distinction? just returning the same error (micro version id conflict) AND the micro version id of the object stored allows the caller to make the distinction, if they need - without duplication.
| request.resume(); | ||
| return _respondWithHeaders( | ||
| response, | ||
| { code: MicroVersionIdAlreadyStoredException.name, | ||
| message: 'incoming microVersionId already at destination' }, | ||
| {}, | ||
| log, | ||
| callback, | ||
| 409, | ||
| ); |
There was a problem hiding this comment.
ahhh, I missed the extra type MicroVersionIdAlreadyStoredException, c.f. https://github.com/scality/cloudserverclient/pull/24/changes#r3496549728
we had the discussion and ended adding microVersionId in StaleMicroVersionIdException, but it is useless if we generate 2 kinds of errors...
→ IMHO we should drop MicroVersionIdAlreadyStoredException, which allows deduplicating here and simplifying backbeat?
otherwise we shoud still dedup, something like:
| request.resume(); | |
| return _respondWithHeaders( | |
| response, | |
| { code: MicroVersionIdAlreadyStoredException.name, | |
| message: 'incoming microVersionId already at destination' }, | |
| {}, | |
| log, | |
| callback, | |
| 409, | |
| ); | |
| if (cmp !=== Ordering.NEWER) { | |
| request.resume(); | |
| return _respondWithHeaderCrrConflict( | |
| response, log, callback, | |
| cmp === Ordering.EQUAL ? MicroVersionIdAlreadyStoredException.name : StaleMicroVersionIdException.name, | |
| 'incoming microVersionId already at destination', | |
| ); | |
| } |
| nextReplInfo.backends = nextReplInfo.backends.filter(b => { | ||
| const loc = config.locationConstraints[b.site]; | ||
| return !loc || !BLOCKED_LOCATION_TYPES.includes(loc.type); | ||
| }); |
There was a problem hiding this comment.
this logic should be in getReplicationInfo, this is the function responsible to build the list of backend/destinations. Doing it there will also avoid create a backend entry only to remove it.
There was a problem hiding this comment.
Are you sure you wanna do this ? It means I have to add some kind of flag to getReplicationInfo to do the current logic in getReplicationInfo 🤔
There was a problem hiding this comment.
Ok I see, maybe it can be done cleanly if, in getReplicationInfo.js I add a helper function, leave the existing function almost as it is and just call that extra helper at the end of getReplicationInfo if the flag is on ?
| // These S3-compatible Scality locations are excluded | ||
| // as cascade targets because they use the MultiBackend S3 path which | ||
| // bypasses the putData/putMetadata routes, so loop detection cannot fire | ||
| // on those destinations. |
There was a problem hiding this comment.
is this part of the design? I may be wrong, but I thought we wanted (needed maybe, to be check with product team) to trigger multi-backend replication after CRR ?
→ please double check
There was a problem hiding this comment.
but yeah i need to triple check it because yes i don't really like this part with these hardcoded locations
51638dd to
afa7f2e
Compare
| // status to PENDING so the queue populator here picks it up for the | ||
| // next hop. If not, clear the source-side replicationInfo fields | ||
| // Always mark isReplica=true. | ||
| if (incomingMicroVersionId) { |
There was a problem hiding this comment.
incomingMicroVersionId is declared with const inside the if (encodedMicroVersionId) block (line 596), which makes it block-scoped to that if block. This reference at line 826 is outside that scope (it's inside the _getRequestPayload callback), so it will throw ReferenceError: incomingMicroVersionId is not defined at runtime.
This affects ALL putMetadata requests through the backbeat route — both with and without cascade headers — and will crash the server process.
Simplest fix: use the function-scoped encodedMicroVersionId here since only a truthiness check is needed.
| if (incomingMicroVersionId) { | |
| if (encodedMicroVersionId) { |
ISSUE : CLDSRV-897
Crr cascaded design : https://github.com/scality/citadel/pull/349
Related PRs :
Arsenal : scality/Arsenal#2628
CloudserverClient : scality/cloudserverclient#24
Backbeat : scality/backbeat#2747
S3utils : scality/s3utils#395