Upkeep service for aggregator & signer #1791
base: main
Conversation
Test Results 4 files ±0 51 suites ±0 8m 36s ⏱️ -14s Results for commit c990901. ± Comparison against base commit 4409ebb. This pull request removes 1 and adds 10 tests. Note that renamed tests count towards both.
Force-pushed fd6fc4e to bcb059a
I'm wondering what will happen during the cleanup for the REST API of the aggregator: will it be locked? Can we handle that situation with a 503 response?
pub fn get_sqlite_dir(&self) -> PathBuf {
    let store_dir = &self.data_stores_directory;

-   if !store_dir.exists() {
+   if !store_dir.exists() && self.environment == ExecutionEnvironment::Production {
Why did you restrict the creation to production only?
To match previous behavior. Previous code from the `build_sqlite_connection` function in `dependency_injection/builder.rs`:
let connection_builder = match self.configuration.environment {
ExecutionEnvironment::Production => ConnectionBuilder::open_file(
&self.configuration.get_sqlite_dir().join(sqlite_file_name),
),
_ if self.configuration.data_stores_directory.to_string_lossy() == ":memory:" => {
ConnectionBuilder::open_memory()
}
_ => ConnectionBuilder::open_file(
&self
.configuration
.data_stores_directory
.join(sqlite_file_name),
),
};
There's a subtle difference between the first (Production) and last ("Test with file") branch:
- The Production branch calls `get_sqlite_dir`, which uses `configuration.data_stores_directory` and creates the directory if it doesn't exist.
- The "Test with file" branch uses the configuration parameter directly, meaning it doesn't create the directory.
My memory of why we did that is blurry; I think it was related to integration tests.
I can remove this condition and check whether it's still relevant by seeing if all tests pass.
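The three branches of the Rust `match` above can be sketched as follows. This is a minimal Python transliteration using the standard-library `sqlite3` module, purely to illustrate the branching logic; the function and parameter names are hypothetical, not the Mithril code.

```python
import sqlite3
from pathlib import Path


def open_connection(environment: str, data_stores_directory: str,
                    sqlite_file_name: str = "aggregator.sqlite3") -> sqlite3.Connection:
    """Sketch of the three branches of `build_sqlite_connection`."""
    if environment == "Production":
        # Production branch: `get_sqlite_dir` creates the directory if missing.
        store_dir = Path(data_stores_directory)
        store_dir.mkdir(parents=True, exist_ok=True)
        return sqlite3.connect(store_dir / sqlite_file_name)
    if data_stores_directory == ":memory:":
        # Test branch: purely in-memory database.
        return sqlite3.connect(":memory:")
    # "Test with file" branch: uses the configured path directly,
    # without creating the directory first.
    return sqlite3.connect(Path(data_stores_directory) / sqlite_file_name)
```

This makes the asymmetry discussed above visible: only the Production branch creates the missing directory, so the "Test with file" branch fails if the directory does not already exist.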
@@ -460,6 +460,8 @@ impl AggregatorRunnerTrait for AggregatorRunner {
    }

    async fn inform_new_epoch(&self, epoch: Epoch) -> StdResult<()> {
        self.dependencies.upkeep_service.run().await?;
Maybe it's a better option to clean just after reopening the signer registration, in the `try_transition_from_idle_to_ready` function? It would also be more readable not to embed it in the `inform_new_epoch` function.
Since the upkeep service can lock the main database for a minute (for the vacuum), I thought the best moment to run it was when the aggregator is at its lowest load.
Also, since it runs once per epoch, it made sense to me to put it in `inform_new_epoch`, but I can create a new method in the runner if you prefer.
LGTM
@jpraynaud Currently, if the repository layer fails because of an error (most likely a …), returning another error code is possible: to do so we need to rework the …. What do you think?
I was thinking about a warp middleware that would return a …
As of now the only upkeep task is a cleanup of the databases by running a vacuum and/or a wal checkpoint to reduce their sizes.
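The two cleanup operations can be illustrated with a small sketch using Python's built-in `sqlite3` module (not the Mithril code; the `cleanup_database` helper is hypothetical):

```python
import sqlite3


def cleanup_database(conn: sqlite3.Connection, vacuum: bool = True) -> None:
    """Shrink the database: optional VACUUM plus a WAL truncate checkpoint."""
    conn.commit()  # VACUUM cannot run inside an open transaction
    if vacuum:
        # Rebuild the whole file, reclaiming pages freed by deleted rows.
        conn.execute("VACUUM")
    # Flush the write-ahead log into the main file and truncate the wal file.
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
```

VACUUM rewrites the entire database file, which is why it can hold a lock for a long time on large databases, while the checkpoint only folds the wal file back into the main file.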
* Allow to open in memory database only in test. * Avoid getting the same path from two different ways, with the only difference being the creation of the dir if it doesn't exist
And when the registration round is closed.
To not block the tokio runtime since tasks such as the vacuum can take some time.
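The same idea as tokio's `spawn_blocking`, sketched with Python's `asyncio` for illustration (the `run_upkeep` helper is hypothetical, not the Mithril code): the blocking database work is moved off the event loop onto a worker thread.

```python
import asyncio
import sqlite3


async def run_upkeep(db_path: str) -> None:
    """Run blocking database maintenance without stalling the event loop."""
    def blocking_upkeep() -> None:
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("VACUUM")
            conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
        finally:
            conn.close()

    # Offload to a worker thread so other async tasks keep running,
    # analogous to tokio's spawn_blocking.
    await asyncio.to_thread(blocking_upkeep)
```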
By checking that the tasks are done using the logs. The actual check that the tasks do what they claim is done in the `SqliteCleaner` tests.
To make it use the same conventions as the aggregator:
- Rename the signer `database::test_utils` to `database::test_helper` and move it to a dedicated file.
- Copy the same `TestLogger`.
- Sync the aggregator & signer `CardanoTransactionImporter`.
Like in the aggregator, for now the only upkeep task is to clean up the databases by running a vacuum and/or a wal checkpoint to reduce their sizes.
Executed at the start of the `unregistered > registered` state machine transition. Note: in the integration test, the stores have been changed from independent `MemoryAdapter`s (which use hashmaps) to a `SQLiteAdapter` with a shared in-memory connection, to make it closer to the production environment.
- Since the `UpkeepService` now handles this cleanup task.
- Remove the `Vacuum` option from the `ConnectionBuilder`, as this was its only use.
- Remove the `vacuum_database` function and use `SqliteCleaner` everywhere instead, to streamline usage.
Since that would probably mean that some writes are currently occurring on the database (for example, if the cardano transactions preloader is running).
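SQLite itself reports this situation: `PRAGMA wal_checkpoint(TRUNCATE)` returns a `busy` flag that is set when a concurrent reader or writer prevented the checkpoint from completing. A minimal sketch in Python's built-in `sqlite3` module (the `truncate_wal` helper is hypothetical, not the Mithril code):

```python
import sqlite3


def truncate_wal(conn: sqlite3.Connection) -> bool:
    """Attempt a TRUNCATE checkpoint; return False when a writer blocked it."""
    busy, _log_frames, _frames_checkpointed = conn.execute(
        "PRAGMA wal_checkpoint(TRUNCATE)"
    ).fetchone()
    # busy == 1 means the checkpoint could not finish because the database
    # was in use, e.g. another task was writing at the same time.
    return busy == 0
```

A caller can log a warning instead of failing when this returns `False`, since the wal file will simply be truncated on a later attempt.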
Force-pushed bcb059a to c990901
Content

This PR adds a new service to `mithril-aggregator` and `mithril-signer`, the `UpkeepService`:
- In the aggregator, it runs during the `Idle → Ready` state machine transition, while the registration round is closed.
- In the signer, it runs during the `Unregistered → Registered` state machine transition.
- It runs a `pragma wal_checkpoint(TRUNCATE)` on both the main and the cardano transaction database to avoid infinite growth of the wal files.

Note: the vacuum is not run on the cardano transaction database, as it can be so large that it takes more than an hour on the mainnet.
Pre-submit checklist
Issue(s)
Closes #1707