Bump VALIDATION_CODE_BOMB_LIMIT #7710
base: master
Conversation
Signed-off-by: Andrei Sandu <[email protected]>
/cmd prdoc --bump patch --audience node_dev
Can this lead to disputes? When 3/4 + 1 of validators have upgraded and someone uploads such a big runtime, will the other nodes that do not support it start disputing?
The same concern has prevented me from merging it yet. However, it should not. I have looked at the code, and this error is handled as a deterministic internal error. At worst, the PVF pre-check would fail if not enough validators are upgraded. Otherwise, if it passed pre-checking, un-upgraded validators will not issue a dispute and will just log an error. @s0me0ne-unkn0wn can you confirm the above?
But they would still be no-shows and increase the finality lag for everyone. With 1/4 of validators being no-shows and 500 validators, my back-of-envelope math tells me that the finality lag will be around 15 blocks on average. Rough explanation:
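A minimal toy model that lands in the same ballpark (the approvals-per-candidate, no-show probability, and timeout numbers below are illustrative assumptions, not figures from the thread):

```rust
// Toy model: every candidate needs `needed` approvals; each assigned
// checker independently no-shows with probability `p`; each no-show is
// only replaced by a fresh assignment after a timeout of `t` blocks.
fn main() {
    let needed: f64 = 30.0; // assumed approvals needed per candidate
    let p: f64 = 0.25;      // 1/4 of validators have not upgraded
    let t: f64 = 5.0;       // assumed no-show timeout, in blocks

    // After k escalation rounds roughly `needed * p^k` checks are still
    // outstanding; count the rounds until that drops below one.
    let mut outstanding = needed;
    let mut rounds = 0u32;
    while outstanding >= 1.0 {
        outstanding *= p;
        rounds += 1;
    }
    // 3 rounds at 5 blocks each gives ~15 blocks of expected lag.
    println!("~{} blocks of finality lag", rounds as f64 * t);
}
```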
Apart from that it should be 2/3rds + 1, this concern sounds valid. I think the best we can do here is roll this out in two phases to be safe:
1. Ship (and backport) node releases that support the bigger limit.
2. Start relying on the bigger limit only once enough validators have upgraded.
Assuming validators are at least following minor version bumps, the time between (1) and (2) can hopefully be rather small if we backport to all running releases. I guess validators at least following point releases is just wishful thinking, though?
Alternative hack: temporarily increase the pre-checking threshold to something that makes the worst-case finality lag acceptable. This requires a runtime upgrade, so it will also take a while, and the problem stays roughly the same: the bigger limit won't be usable until enough validators have upgraded. But it is a bit safer, as "enough validators upgraded" is then enforced by the protocol.
We never dispute on the basis of pre-checking results. The reasoning is that the pre-checking results are themselves the basis for future disputes. If pre-checking has passed successfully, and a later preparation (with the lenient timeout) fails, we dispute. If pre-checking fails, we never enact the PVF in the first place, so there's no reason to dispute. So, the pre-checking output is strictly binary. Also, FWIW, see the note on rejecting PVFs in the docs.
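A small sketch of the rule described above (the type and function names are illustrative, not the actual polkadot-sdk API):

```rust
// Pre-checking output is strictly binary and never disputed by itself.
// After a PVF passed pre-checking, only a deterministic preparation
// failure leads to a dispute; errors the node classifies as internal
// (per the thread, the bomb-limit error on an un-upgraded node is
// handled this way) are only logged.
enum Precheck { Accepted, Rejected }
enum Preparation { Ok, InternalError, DeterministicFailure }

fn should_dispute(precheck: Precheck, prep: Preparation) -> bool {
    match (precheck, prep) {
        // A rejected PVF is never enacted, so there is nothing to dispute.
        (Precheck::Rejected, _) => false,
        // Passed pre-checking, then failed deterministically: dispute.
        (Precheck::Accepted, Preparation::DeterministicFailure) => true,
        // Everything else, including internal errors, is just logged.
        _ => false,
    }
}

fn main() {
    assert!(!should_dispute(Precheck::Rejected, Preparation::DeterministicFailure));
    assert!(should_dispute(Precheck::Accepted, Preparation::DeterministicFailure));
    assert!(!should_dispute(Precheck::Accepted, Preparation::InternalError));
}
```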
Or we directly do it via the
If the concern is finality: yes, I agree that the concern is increased finality lag, which we can mitigate by pinging validators to upgrade.
This approach is faster than that one. So, the options are, ordered by how fast we fix the problem:
What we choose depends on how soon we need this fixed in production, which depends on the impact. Are teams already affected, or do we know that they soon will be?
I don't think this will happen fast because it requires a stable release, as it is a major bump due to the
Then just implement this in a backwards-compatible way.
I cannot find any way to change that struct without adding more code bloat to handle a separate storage value plus active/pending per-session tracking of the parameter change. So I'm proposing a 4th option: derive the bomb limit from the configured maximum code size.
Doing this will result in a limit of 30MB based on the current configuration, and it is also future-proof: if we increase the code size, the bomb limit also increases. This will be a minor semver bump, so we won't break anything and we can do a quick release. On the node side the semver could be trickier, but we can manage that, as only our node uses the affected crates.
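A sketch of what that could look like (the multiplier and the 3 MiB max code size are assumptions chosen to reproduce the 30MB figure above, not necessarily the values in the patch):

```rust
// Hypothetical sketch: derive the bomb limit from the configured maximum
// validation code size instead of hard-coding a constant, so the limit
// scales automatically whenever the code size limit is raised.
const CODE_BOMB_MULTIPLIER: u32 = 10; // assumed expansion factor

fn validation_code_bomb_limit(max_code_size: u32) -> u32 {
    max_code_size.saturating_mul(CODE_BOMB_MULTIPLIER)
}

fn main() {
    let max_code_size = 3 * 1024 * 1024; // assuming a 3 MiB max code size
    // 3 MiB * 10 = ~30MB, matching the figure quoted above.
    println!("bomb limit: {} bytes", validation_code_bomb_limit(max_code_size));
}
```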
It used to be 16MiB some time ago. Bumping to 32MiB sounds sensible to me, as we've seen some recent runtimes expanding close to the previous 16MiB.
This is just a short-term fix for #7664 - it needs to be backported and deployed ASAP.
While this unblocks AssetHub, I will create another PR to promote the limit to a HostConfiguration parameter with an associated runtime API.