Integrate TaskManager into NodeGraph and Discovery #445
d969836 to 4e3da17
4e3da17 to c3bc60e
Looking at the …
What does this function do again?
That function does the initial network entry procedure for …
@tegefaulkes during integration into node graph. I'd like to ensure that we have understood all the problems that the nodegraph has such as: …
Then we redeploy testnet and focus on #441.
Discovery has a bug according to vscode: …
It has never had a proper review, so its architecture should be reviewed and probably refactored to align with the model we have developed in the Tasks system, since it's also something that has background tasks. In fact... now that the tasks system centralises background task processing, it's possible that we can have discovery and nodegraph both delegate repeat-processing to the tasks system and remove their own internal loops, thus simplifying how discovery and nodegraph work!!
24e2cb4 to 63fde43
…eparate from the overall timer other fixes have been applied.
…y idempotent the @ready decorator caused them to throw if ran while `taskManager` was not running. They needed to be called during incomplete startup, so I removed the decorator.
65b8c6f to b29ba9e
Ok, this is good to merge now. There are 3 test domains that need an eyeball still.
2c04717 to 806bb09
7073654 to 23f470b
Description
This PR focuses on updating the `Nodes` and `Discovery` domains to use the new `Tasks` system.

Generally there are 3 places where background queues are being used. Each of these needs to be updated to use the new tasks system from #438.

- `NodeConnectionManager`: does the network entry procedure with `syncNodeGraph`.

SetNode details
In `NodeConnectionManager`, when adding a node to the `NodeGraph` with `nodeManager.setNode`, we can end up with the case where a bucket is full. When this happens we need to ping the nodes within the bucket to determine if they're still alive, and remove any nodes that don't respond so we can add the new one. We need to convert this to use the scheduler/queue.

By default `nodeManager.setNode` doesn't ping a node to check if it's online before adding it. It is expected that you ping the node before using `setNode` to add it. We add nodes whenever we interact with or discover a node. This can happen in the following cases:

- We learn about a node from other nodes. A connection to the learned node hasn't been made, so it needs to be pinged.
- A node has connected to us, so it just needs to be added.
- We connected to a node, so its details need to be updated.

Given that, we need the ability to either ping and `setNode`, or just `setNode`.
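The full-bucket case and the optional ping can be sketched as follows. This is only a minimal illustration: the in-memory `Bucket` class, the injected `ping` function, and `setNodeWithEviction` with its `pingFirst` flag are all hypothetical names invented for this example, not the actual `NodeManager` API.

```typescript
// Hypothetical types for illustration only; not the real NodeGraph types.
type NodeId = string;
type Ping = (nodeId: NodeId) => Promise<boolean>;

class Bucket {
  private nodes = new Map<NodeId, number>();
  constructor(private readonly limit: number) {}

  has(nodeId: NodeId): boolean {
    return this.nodes.has(nodeId);
  }

  ids(): NodeId[] {
    return Array.from(this.nodes.keys());
  }

  /** Adds or updates a node. Returns false if the bucket is full. */
  set(nodeId: NodeId, lastUpdated: number): boolean {
    if (!this.nodes.has(nodeId) && this.nodes.size >= this.limit) return false;
    this.nodes.set(nodeId, lastUpdated);
    return true;
  }

  delete(nodeId: NodeId): void {
    this.nodes.delete(nodeId);
  }
}

/**
 * Adds `nodeId` to `bucket`. With `pingFirst` the new node is pinged before
 * it is added (the "ping and setNode" case); without it the node is trusted
 * (the "just setNode" case). If the bucket is full, the existing nodes are
 * pinged and the ones that do not respond are removed to make room.
 */
async function setNodeWithEviction(
  bucket: Bucket,
  nodeId: NodeId,
  ping: Ping,
  pingFirst: boolean = false,
  now: number = Date.now(),
): Promise<boolean> {
  if (pingFirst && !(await ping(nodeId))) return false;
  if (bucket.set(nodeId, now)) return true;
  // Bucket is full: check which existing nodes are still alive.
  const existing = bucket.ids();
  const alive = await Promise.all(existing.map((id) => ping(id)));
  existing.forEach((id, i) => {
    if (!alive[i]) bucket.delete(id);
  });
  // Succeeds only if at least one dead node was evicted.
  return bucket.set(nodeId, now);
}
```

In the PR's design the pinging of a full bucket would be handed to the scheduler/queue rather than awaited inline; the sketch only shows the decision logic.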
RefreshBuckets details

`nodeManager.refreshBucket` needs to be updated to use the new scheduler/queue system. In this case we will be making use of the `Scheduler` features. A refresh bucket operation needs to be run on a bucket if the bucket hasn't seen activity for an hour. Given this, we need to schedule each bucket with an hour delay. If a bucket is updated we need to reset the timer for the bucket. To do this we need to make use of the `taskPath` and timer updating features of the `Scheduler`. These `refreshBucket` tasks should be the lowest priority.
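The watchdog-style timer reset can be sketched with a small path-keyed scheduler. Everything here is an assumption made for illustration: `SketchScheduler`, `touchBucket`, the numeric priority value, and the explicit `now` argument are hypothetical and are not the `Tasks` system API.

```typescript
// Hypothetical stand-in for a scheduler; not the real Tasks system.
type TaskPath = readonly [string, number];

interface ScheduledTask {
  path: TaskPath;
  runAt: number; // absolute time in ms at which the task becomes due
  priority: number; // assumption: 0 is the lowest priority in this sketch
}

const REFRESH_DELAY_MS = 60 * 60 * 1000; // one hour without bucket activity
const LOWEST_PRIORITY = 0;

class SketchScheduler {
  private tasks = new Map<string, ScheduledTask>();

  private key(path: TaskPath): string {
    return path.join('/');
  }

  /** Creates the task at `path`, or resets its timer if it already exists. */
  upsert(path: TaskPath, delayMs: number, priority: number, now: number): ScheduledTask {
    const task: ScheduledTask = { path, runAt: now + delayMs, priority };
    this.tasks.set(this.key(path), task);
    return task;
  }

  get(path: TaskPath): ScheduledTask | undefined {
    return this.tasks.get(this.key(path));
  }

  /** Removes and returns every task that is due at `now`. */
  takeDue(now: number): ScheduledTask[] {
    const due = Array.from(this.tasks.values()).filter((t) => t.runAt <= now);
    for (const t of due) this.tasks.delete(this.key(t.path));
    return due;
  }
}

/** Called whenever a bucket sees activity: pushes its refresh an hour out. */
function touchBucket(scheduler: SketchScheduler, bucketIndex: number, now: number): ScheduledTask {
  return scheduler.upsert(['refreshBucket', bucketIndex], REFRESH_DELAY_MS, LOWEST_PRIORITY, now);
}
```

Keying tasks by a path such as `['refreshBucket', bucketIndex]` keeps at most one pending refresh per bucket, so a bucket that stays active never becomes due.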
Moving the `refreshBuckets` system onto the `Tasks` system is a little complex since it works as a kind of watchdog. Here are the requirements of the `refreshBucket` system:

- … `refreshBucket` operation selects a random `NodeId` within the target bucket's range of nodes and performs a search for that node.
- … `refreshBucket` operation every hour.

Here are some relevant constraints of the tasks system:

- … `scheduled`, `queued`, `active`, `success` or `failure`.
- … `scheduled` state.
- … `path`. We can track a task for a single bucket using a path such as `['refreshBucket', bucketIndex]`. This enables us to find existing tasks for buckets.

Given these constraints I think we need to do the following:

- … `refreshBucket` tasks using `tasks.getTasksByPath`, reset the delay on existing tasks and create ones for buckets missing them.
- … `nodeManager.setNode` then we need to update the delay of that refresh bucket task. This can be done by getting the task for the bucket using `tasks.getTasksByPath` and updating the scheduled delay if the task is in the `scheduled` state. If it is `queued` or `active` then ignore it; if no task exists, create one. The task's delay can be updated either by cancelling it and re-creating it, or by using a provided `updateDelay` method.
- … `refreshBuckets` tasks during `nodeManager.stop()`. Minor detail, won't really make a difference to operation.

NodeConnectionManager details
In the `NodeConnectionManager` we have a method `syncNodeGraph` that does the following procedure. If `syncNodeGraph` is run with blocking = false then we need to run it in the background.

- … `NodeId`. Then go through this list, pinging the nodes and adding their details to our nodeGraph.
- … `refreshBucket` operation.

Looking over this, I think it should be part of `NodeManager`, making calls to `NodeConnectionManager` for pinging.

As for how to implement this using `Tasks`:

- … `pingAndAdd` tasks for each of the closest nodes.
- … `refreshBucket` operation for every bucket above the closest node's bucket.

Other details
Other aspects to consider. The Kademlia `findNode` operation in the `NodeConnectionManager` is considered a single operation but is in essence a priority queue search for the target node. We can consider splitting this up into a compound task where each step in the search can be its own task within the process. This would apply to the `refreshBucket` operation as well, since that does a `findNode`.

Handlers will need to support cancel-ability. They must take an abort signal and quickly end operation.
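A step-wise, cancellable search might look roughly like the sketch below. It assumes node IDs are small integers compared by XOR distance, and that `queryNode` asks a contact for nodes it knows near the target; `findNodeSteps`, `QueryNode`, and `AbortLike` are hypothetical names for this example. `AbortLike` only requires an `aborted` flag, so a standard `AbortSignal` would satisfy it.

```typescript
// Hypothetical types for illustration; real node IDs are not small integers.
type NodeId = number;
type QueryNode = (contact: NodeId, target: NodeId) => Promise<NodeId[]>;
type AbortLike = { readonly aborted: boolean };

/**
 * Priority-queue search for `target`. Each loop iteration is one step of the
 * search, and the abort flag is checked before every step so that a
 * cancelled handler ends without contacting further nodes.
 */
async function findNodeSteps(
  target: NodeId,
  seeds: NodeId[],
  queryNode: QueryNode,
  signal: AbortLike,
): Promise<NodeId | undefined> {
  const distance = (id: NodeId): number => id ^ target;
  const contacted = new Set<NodeId>();
  const queue: NodeId[] = [...seeds];
  while (queue.length > 0) {
    if (signal.aborted) throw new Error('findNode aborted');
    // Closest known node first.
    queue.sort((a, b) => distance(a) - distance(b));
    const next = queue.shift()!;
    if (next === target) return next;
    if (contacted.has(next)) continue;
    contacted.add(next);
    const learned = await queryNode(next, target);
    for (const id of learned) {
      if (!contacted.has(id)) queue.push(id);
    }
  }
  return undefined; // ran out of contacts without finding the target
}
```

Here cancellation takes effect between steps. Ending an in-flight query sooner would mean also passing the signal into `queryNode`, which this sketch leaves out.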
Issues Fixed
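- `Starting Connection Forward` infinite loop #413
- Reduce the timeout for establishing a Node Connection within the Discovery domain (by adding timer override to `NodeConnectionManager`) #353 - This will be a separate PR after this one, once we have updated `Discovery` to use `TaskManager`.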
.Tasks
… `TaskManager` to `PolykeyAgent`
4. Update `Discovery` domain to use the new tasks system.
6. Update and check `Discovery` tests.
7. Address issue Reduce the timeout for establishing a Node Connection within the Discovery domain (by adding timer override to `NodeConnectionManager`) #353
10. Address issue Connection dropped/timed out when connecting to deployed agent #414 - Needs more testing, can't address this now. I've added Process exit handler for tracking unresolved asynchronous work ("promise deadlocks"/"why is node terminating for no reason?") #307 to help with part of this.
… `Starting Connection Forward` infinite loop #413
… `setNode` garbage collection Integrate TaskManager into NodeGraph and Discovery #445 (comment)

Final checklist