Introduction of a Validator Score #21
Replies: 9 comments 73 replies
-
Fundamentally, we advocate some form of automated performance evaluation, e.g. covering security/maintenance practices and bug handling.
-
I agree too, but I think this system introduces additional risk for honest validators, which could be penalized for a simple system failure. To minimize this risk, I suggest taking this improvement as an opportunity to introduce automatic validator failover. As a high-level overview, this feature would allow a node to run as a "backup validator". Ideally, two validators would participate in the committee controlling the same stake, but only one of them would take part in consensus. If the active one fails, the backup becomes active. This feature doubles the validator's costs but cuts the failure risk, which would make the automated penalty system acceptable.
-
Agreed that an automated system is required. The system described here, if implemented, will need to be self-checked, meaning that the partial scores will need to agree to some degree: if one validator gives a poor score to another while all the rest give it a high score, then something went wrong with the reporter. What if consensus cannot be reached on the partial scoring mechanism? It will also need to be tamper-proof on the reporter's side, or this might give an incentive to "lie" in the report. In general, an automated system should probably rely on on-chain data more than on active monitoring by several peers. How feasible that is in IOTA I am not sure yet, but you can see a good example of this on Ethereum with Rated, even though their scoring does not affect rewards. Can't we rely on on-chain data with a delayed effect? Meaning: if your score calculated from on-chain data is low, then you get fewer rewards in the following epoch, not the current one.
-
Thanks for the proposal. Automated scoring is a natural next step, but it must be carefully aligned with IOTA's validator cooperative model and decentralization goals.

Let's avoid solutions looking for a problem. Without data, this remains speculative.

We support performance visibility. But penalties must come last, not first. Let's build trust with data, not assumptions.
-
This is a fantastic discussion with many valid points. We strongly agree with the core sentiment expressed by @starfish-one and the practical, operator-focused considerations from @dlt-green. A clear consensus seems to be forming: radical transparency and empowering the community should be the priority, and protocol-level penalties are a serious measure that should be a last resort. Building on this, we'd like to propose a concrete framework that puts this philosophy into action.

A Proposal for a Community-Driven Performance Framework

The next logical step is to standardize the rich data from community tools (like the excellent dashboards from @dlt-green) into a simple, public-facing Health Score. This would be displayed as an intuitive letter grade (A-F) with color-coding in the official IOTA Explorer. The score should represent a monthly average, with a daily drill-down for full transparency. To show our commitment to this idea, we at Tokenlabs have already started building a proof of concept for this "Quality Certificate" dashboard. We intend for this to be a true community effort from day one.

To further foster collaboration over competition, we propose that a dedicated section be added to the official IOTA documentation to act as a shared Knowledge Base, detailing solutions for common validator issues and helping everyone achieve an 'A' rating. The community, armed with these tools, becomes the ultimate regulator. Let's build this framework of transparency and support first.

Best regards,
The Tokenlabs Team
-
@oliviasaa We like your approach to this issue and would support it in principle. However, we would like to emphasize again that we should also consider eliminating the validator's commission rather than the staker rewards, at least at first. It will depend on the details, but we would fundamentally support this approach. In our case, this will also be decided via a governance vote in the future, but initial feedback on this is positive. We also find the possibility of deciding on thresholds together very positive. Initially, this factor, which we will also integrate into our analytics, should run as a dry run on the mainnet, so that any necessary adjustments can be made before activation.
-
Great discussion. Let me briefly summarize the direction that seems to emerge:
-
Comment on reducing the voting power of faulty nodes.

Currently, the protocol assumes static stake availability: if more than 1/3 of the stake goes offline, the network halts. This is a well-known limitation of BFT-style protocols, and a significant body of research explores dynamic availability, where voting power is adjusted on the fly based on live participation. However, dynamically adjusting voting power always comes at the cost of safety: in a partitioned network, each partition might adjust independently and resume consensus, leading to forks.

Still, the fact remains: if less than 2/3 of the stake is online, consensus halts. It is therefore crucial to have a path for removing voting power from persistently offline validators: not immediately, but based on observable and stable conditions.

We propose a simple rule: if a validator has been offline for two consecutive epochs, and more than 87.5% of the total stake participated during these epochs (the same threshold used for protocol upgrades), then its voting power is removed. This strikes a balance:
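To make the rule concrete, here is a minimal sketch of the check, in Rust; the `EpochRecord` and `ValidatorId` types and the field names are hypothetical and do not correspond to the actual IOTA node code.

```rust
use std::collections::HashSet;

/// Hypothetical identifier for a validator.
type ValidatorId = [u8; 32];

/// Hypothetical per-epoch summary used by the rule.
struct EpochRecord {
    /// Fraction of total stake that participated in the epoch (0.0..=1.0).
    participating_stake_fraction: f64,
    /// Validators observed offline for the whole epoch.
    offline_validators: HashSet<ValidatorId>,
}

/// Same 87.5% threshold as used for protocol upgrades.
const PARTICIPATION_THRESHOLD: f64 = 0.875;

/// Returns true if `validator` should have its voting power removed,
/// based on the last two finished epochs.
fn should_remove_voting_power(
    validator: &ValidatorId,
    previous_epoch: &EpochRecord,
    current_epoch: &EpochRecord,
) -> bool {
    let offline_both_epochs = previous_epoch.offline_validators.contains(validator)
        && current_epoch.offline_validators.contains(validator);
    let enough_participation = previous_epoch.participating_stake_fraction >= PARTICIPATION_THRESHOLD
        && current_epoch.participating_stake_fraction >= PARTICIPATION_THRESHOLD;
    offline_both_epochs && enough_participation
}
```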
-
Encapsulate is in favor of an automated, transparent performance score tied to rewards. Other ecosystems (e.g., Cosmos) already automate penalties for provable faults like double-signing and sustained downtime, which is a good precedent. The same should apply here: weight provable signals highest and keep "soft"/unprovable metrics low-impact to avoid gaming.

What we'd like to see
-
Motivation
Validators are the backbone of the IOTA network. The performance of each validator directly influences the network’s overall efficiency and usability. Therefore, it is essential to incentivize validators to operate reliably and effectively by tying their rewards to their observed performance.
Currently, the only mechanism for penalizing underperforming validators is a manual reporting system. This system requires a quorum of validators to explicitly report a misbehaving peer in order to reduce their rewards. Such an approach is impractical, as it demands continuous manual monitoring, which is an unreasonable burden for most operators. Moreover, no standard criteria exist to guide reporting decisions, resulting in inconsistent and arbitrary thresholds set independently by each validator.
We propose an automated and standardized system for monitoring validator behavior, culminating in a commonly agreed score that reflects each validator’s performance during an epoch. These scores would directly influence the rewards distributed at the epoch’s end.
Specification
Performance Metrics, Proofs, And Partial Scores
Each validator will monitor its peers throughout an epoch, collecting performance metrics and computing a local partial score for every other validator. Regardless of the exact set of metrics used, they are divided into two categories: provable metrics, which can be backed by proofs visible to all validators, and unprovable metrics, which are observed locally and cannot be proven to others.

For provable metrics, validators include the corresponding proofs in the misbehavior_reports fields of their newly created blocks. Whenever a block containing a MisbehaviorReport is committed, validators update their count for the corresponding misbehaviour.
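As a rough illustration of the counting step, the sketch below (Rust, with hypothetical type names; the actual MisbehaviorReport type is defined in the iota repository) keeps per-validator counters keyed by the misbehaviour kind and updates them when a block containing reports is committed:

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a validator within the committee.
type ValidatorIndex = u32;

/// Kinds of provable misbehaviour; placeholder variants for illustration.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Misbehaviour {
    InvalidBlock,
    Equivocation,
}

/// A misbehaviour report carried in a block's `misbehavior_reports` field.
struct MisbehaviorReport {
    accused: ValidatorIndex,
    kind: Misbehaviour,
}

/// Local counters of committed, provable misbehaviours per validator.
#[derive(Default)]
struct ProvableCounters {
    counts: HashMap<(ValidatorIndex, Misbehaviour), u64>,
}

impl ProvableCounters {
    /// Called when a block containing misbehaviour reports is committed.
    /// Every validator observes the same committed blocks, so these counters
    /// stay consistent across honest nodes without extra messages.
    fn on_block_committed(&mut self, reports: &[MisbehaviorReport]) {
        for report in reports {
            *self.counts.entry((report.accused, report.kind)).or_insert(0) += 1;
        }
    }
}
```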
Aggregating Scores
At the end of each epoch, validators broadcast their locally computed partial scores for all other validators. This is done by replacing the current EndOfPublish transaction with a new EndOfPublishV2 format that contains the computed vector of partial scores.

Once a quorum of EndOfPublishV2 transactions has been collected, the epoch can advance. The stake-weighted median of the partial scores submitted via the EndOfPublishV2 transactions is then taken and combined with the provable metric counts (based on the proofs), resulting in a deterministic aggregated score.
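A minimal sketch of the stake-weighted median step (hypothetical signature, not the actual iota implementation), applied per target validator to the partial scores collected from the quorum:

```rust
/// `submissions` holds, for one target validator, the (stake, partial_score)
/// pairs reported by the submitting validators.
fn stake_weighted_median(mut submissions: Vec<(u64 /* stake */, u64 /* score */)>) -> u64 {
    assert!(!submissions.is_empty());
    // Sort by score, then walk the cumulative stake until half of the
    // total submitting stake is covered.
    submissions.sort_by_key(|&(_, score)| score);
    let total_stake: u64 = submissions.iter().map(|&(stake, _)| stake).sum();
    let mut cumulative = 0u64;
    for (stake, score) in submissions {
        cumulative += stake;
        if 2 * cumulative >= total_stake {
            return score;
        }
    }
    unreachable!("cumulative stake always reaches the total");
}
```

Because each submitter's influence is proportional to its stake, splitting stake across identities does not change the result.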
Adjusting Rewards
The aggregated scores are passed to the advance_epoch function, where they are used to adjust validator rewards using the following formula:
adjusted_rewards[i] = unadjusted_rewards[i] * aggregated_scores[i] / max_score
where max_score is the maximum achievable score. As in the current protocol, the difference between adjusted and unadjusted rewards is burned.
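For concreteness, a sketch of that adjustment (a hypothetical helper, not the actual advance_epoch implementation):

```rust
/// Scales each validator's rewards by its aggregated score, mirroring
/// adjusted_rewards[i] = unadjusted_rewards[i] * aggregated_scores[i] / max_score.
/// The difference between unadjusted and adjusted rewards would be burned.
fn adjust_rewards(
    unadjusted_rewards: &[u64],
    aggregated_scores: &[u64],
    max_score: u64,
) -> Vec<u64> {
    assert!(max_score > 0, "max_score must be positive");
    unadjusted_rewards
        .iter()
        .zip(aggregated_scores)
        .map(|(&reward, &score)| {
            // 128-bit intermediate avoids overflow for large reward values.
            ((reward as u128) * (score as u128) / (max_score as u128)) as u64
        })
        .collect()
}
```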
Rationale
Performance Metrics
The categorization of metrics as provable or unprovable allows them to be treated differently. Unprovable metrics are highly gameable and should not lead to severe penalties. Provable metrics, if correctly designed, offer a reliable estimation of validator performance and potential malicious behaviour, and can therefore be used as part of a strong incentivization mechanism.
Since proofs are embedded in committed blocks, validators already have a common view of all provable metrics. Thus, there is no need to report any count or score relative to provable metrics at the epoch end.
Unprovable metrics, on the other hand, remain entirely local to each validator. Therefore, they must be shared and agreed upon through the consensus mechanism.
Aggregating Scores
The use of the EndOfPublish transaction to collect local scores is ideal, as it is the final message sent by each validator before closing the epoch. This timing ensures that the latest score data is captured for reward adjustment at the epoch's end, without delaying epoch advancement.

Additionally, by building on an already existing mechanism that requires a quorum of validators, we ensure that the final score used for reward adjustment is based on a quorum's local scores.
The choice of a stake-weighted median to aggregate local scores is both natural and necessary. It ensures that validators cannot manipulate the outcome by splitting their stake across multiple identities, as each validator's influence is proportional to its stake. Additionally, the median is a robust statistical measure that resists distortion from extreme values, unlike the mean, which can be skewed by outliers or intentionally biased scores.
Adjusting Rewards
Once consensus is reached on the aggregated score, adjusting rewards becomes straightforward. The aggregated score incorporates all locally computed metrics and is designed to ensure incentive compatibility. As such, it can be directly applied as a multiplicative factor to the unadjusted rewards.
Reference Implementation
An initial set of metrics has already been implemented in the iota repository, along with a simple scoring function that serves as a placeholder for a more complete version. This reference implementation is available in the (already merged) PR#7604 and PR#7921. The remaining components required to achieve consensus on an aggregated score and to adjust rewards do not yet have a reference implementation.
Backwards Compatibility
The version change in the EndOfPublish message is not backward compatible and must be implemented as a protocol upgrade, with a feature flag to enable the new functionality. All other changes are either local to the node (such as storing and counting metrics) or use already existing types and fields (such as the misbehavior_reports field in the block or the MisbehaviorReport type). These local changes should not cause any node behaviour or agreement problems.
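As a sketch of how the feature flag might gate the new message format (the type and flag names here are hypothetical and do not correspond to the actual iota protocol config):

```rust
/// Hypothetical message versions; the real types live in the iota repository.
struct EndOfPublish;
struct EndOfPublishV2 {
    partial_scores: Vec<u64>,
}

enum EndOfPublishMessage {
    V1(EndOfPublish),
    V2(EndOfPublishV2),
}

/// Hypothetical protocol configuration carrying the feature flag.
struct ProtocolConfig {
    validator_score_enabled: bool,
}

fn build_end_of_publish(config: &ProtocolConfig, partial_scores: Vec<u64>) -> EndOfPublishMessage {
    if config.validator_score_enabled {
        // New behaviour once the upgrade activates: include the partial scores.
        EndOfPublishMessage::V2(EndOfPublishV2 { partial_scores })
    } else {
        // Old behaviour until the protocol upgrade is enabled.
        EndOfPublishMessage::V1(EndOfPublish)
    }
}
```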