-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Merge retry logic #2719
fix: Merge retry logic #2719
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #2719 +/- ##
===========================================
+ Coverage 78.00% 78.13% +0.13%
===========================================
Files 310 310
Lines 23236 23188 -48
===========================================
- Hits 18124 18116 -8
+ Misses 3725 3693 -32
+ Partials 1387 1379 -8
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 4 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
pushLogEmitter event.Emitter | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: If you remove the docQueue
, it opens the door for transaction conflicts on the dag sync process. One of the test matrix run failed for this exact reason. I suggest you keep it in :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still there but in the db
package instead. I think it makes more sense to have the blocking occur there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that if the conflict happens in exactly the wrong way, blocks will be missing from the blockstore. I wanted to remove it when I refactored the dag sync process and I found out that we have to keep it in. Alternatively we need to change the dags sync logic to deal with conflicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right I remember now. Would a simple fix be to ignore transaction conflict errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetBlocks
doesn't handle errors very well. Even GetBlock
doesn't return the block on put
conflict so we would need to retry. It would be an efficien retry because the block would be found locally on the next try. I'll let you decide what to do from here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The updated sync process should now handle conflicts.
|
||
var err error | ||
// retry merge up to max txn retries | ||
for i := 0; i < db.MaxTxnRetries(); i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: If you use the merge queue, I expect that a retry will never be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a user updates a doc while a merge is in progress it could still happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😅 very true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a user updates a doc while a merge is in progress it could still happen.
I'd be super nice / reminder to document that here.
@@ -43,28 +45,37 @@ func (db *db) handleMerges(ctx context.Context, merges events.Subscription[event | |||
return | |||
} | |||
go func() { | |||
err := db.executeMerge(ctx, merge) | |||
// ensure only one merge per docID | |||
queue.add(merge.DocID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: If the DocID is now part of the message, we can get rid of the getDocIDFromBlock
function. The reason I hadn't included DocID before is that I wanted to guarantee that there was no mistake with it. Like the wrong DocID for the provided CID. That was probably a bad reason when I think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Too bad we are reintroducing ipld-format after trying to get rid of it. We can switch over to an ipld-prime solution in the future.
Relevant issue(s)
Resolves #2718
Resolves #2721
Description
This PR fixes issues with merge retry and DAG sync processes. It also moves the
docQueue
from thenet
package into thedb
package.Tasks
How has this been tested?
make test
Specify the platform(s) on which this was tested: