Yeah good luck, not reviewable.
This PR is primarily to separate the worker (`DefaultWorker`) from the state (`NePSState`).

- […] `runtime.py`.
- […] `NePSState` and deciding when to stop.
- […] `SeedSnapshot`, `ErrDump` (containing all global worker errors) and `OptimizerInfo`.
- […] `Synced` protocol.
- […] `Synced[Resource, Location]`.
- `Synced[Trial, Path]` indicates a synced `Trial` which is identified by a `Path`.
- `Synced` provides operations for unique access to resources, ensuring proper locking behaviour, and uses a concept of versioning for each individual resource such that a `NePSState` does not need to reload everything from scratch each time if it already has the latest version.
- The `Synced` protocol is agnostic of the underlying way in which the syncing is performed. This would allow, for example, a fully in-memory NePS or a server hosting `NePS` (@vladislavalerievich, this might be something we consider once you're available).

For trusting this PR, all tests pass, mypy passes and I've added tests on:
- […] `Synced`, essentially letting them be accessed and mutated in an atomic way across workers.
- `NePSState` is atomic and correct, with every optimizer that is currently indexed in `SearcherMapping`.
- `DefaultWorker` respects all stopping criteria and does the loop it's meant to.

Additional notes:
- […] `Trial` at a time and that there is only ever one `NePSState` they are associated with, there are now two functions:
  - `neps.runtime.get_in_progress_trial()`: This gives you the trial which the worker is currently evaluating. This can be useful to inject functionality into the `run_pipeline()` of the user, such as `load_checkpoint`, `tblogger`, etc. The `Trial` object has much richer meta information than what is available through just the arguments to `run_pipeline(...)`. Calling this outside of a `run_pipeline` context is considered a developer error, and this is not something we directly advertise to users. For example, calling this while sampling a new trial from an optimizer is considered an error, as there is no trial currently being evaluated.
  - `neps.runtime.get_workers_neps_state()`: This gives you the entire `NePSState` object the worker is operating from. This is set as soon as a worker starts and is available for the duration of the program. Calling this before a worker starts is considered a developer error.
- […] `tblogger` and `status`, I've adapted them to use the new `get_workers_neps_state()` such that they can easily get a reference to the `NePSState` the worker is operating under. This reduces the need for them to be explicitly aware of where anything is located; instead they ask `NePSState` for what they need.
- […] `NePSState`, where proper locking can be done.

I'm sure there's plenty more things but they'll come up as they need to.