Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(miner): ignore lastWork when selecting the best mining candidate + tests #12659

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

Stebalien
Copy link
Member

@Stebalien Stebalien commented Oct 30, 2024


Related Issues

fixes filecoin-project/go-f3#730

Proposed Changes

Ignore lastWork when selecting the best mining candidate.

Previously, we only took the new head if it's heavier than the last head. Unfortunately, this meant that F3 finalization wasn't properly propagated to the miner.

In terms of impact:

  1. It seems likely that this check was simply defensive as, prior to F3, the new head should never have a lower weight (unless you're talking to multiple lotus nodes, I guess...).
  2. The lastWork field is mostly used to track null blocks. Worst-case scenario, if we switch heads, we'll attempt to re-mine previous heights. However, that should be relatively fast and, due to the slash filter, we
    won't attempt to re-broadcast any of those blocks.

Checklist

Before you mark the PR ready for review, please make sure that:

@Stebalien
Copy link
Member Author

NOTE: This is the smallest possible fix for this bug. An ideal fix would have some other properties:

  1. Ideally we'd keep the heaviest head unless the new head was finalized by F3, but that's a larger fix and is accounting for a case where something is already going wrong.
  2. Ideally we wouldn't track "nulls" but would instead track the last height we mined on irrespective of the last base we worked on. However, that's a much larger change and not something I feel comfortable shipping in a late RC. Worst-case scenario, we try to re-mine heights when catch-up mining, but that's fine because we already do that.

In other words, this fix should be no worse than the current code.

Previously, we only took the new head if it's heavier than the last
head. Unfortunately, this meant that F3 finalization wasn't properly
propagated to the miner.

In terms of impact:

1. It seems likely that this check was simply defensive as, prior to F3,
the new head should never have a lower weight (unless you're talking to
multiple lotus nodes, I guess...).
2. The `lastWork` field is mostly used to track null blocks. Worst-case
scenario, if we switch heads, we'll attempt to re-mine previous heights.
However, that should be relatively fast and, due to the slash filter, we
won't attempt to re-broadcast any of those blocks.
@Stebalien Stebalien force-pushed the steb/fix-best-mining-candidate branch from 704855e to e1a0572 Compare October 30, 2024 15:44
Copy link
Member

@masih masih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM based on standup discussion. As far as I can tell the previous implementation is due to a whole bunch of historical reasons that no longer apply.

@Stebalien
Copy link
Member Author

We looked into history a bit more, it looks like this check was a part of the very first WIP commit and just stuck the entire way.

@Stebalien
Copy link
Member Author

Well, it kind of works. But our slash-filter handling is bonkers. If we hit the slash filter, we'll keep trying to re-mine the same height until someone wins which obviously won't work. So now I'm trying to fix that.

@Stebalien
Copy link
Member Author

Hm. So, I don't think that'll work because mining on a "different fork" would be slashable, IIRC. In practice this isn't an issue because, while we can't mine until someone else does, someone else will eventually mine on a different height.

So I'm just going to setup a test with an additional miner.

@Stebalien
Copy link
Member Author

Ok, the handling of null blocks with respect to slashed blocks broke everything (that and our syncing logic has some issues). I've pushed a WIP commit with a test and hacky fixes that make the test "pass", but I don't think they're 100% correct.

Issues:

  1. We don't update our base after submitting a block (will cause catch-up issues).
  2. We don't update the null blocks counter on the base after skipping a block due to the slash filter (prevents progress in the test).
  3. We don't wait after handling the slash filter (my current patch doesn't wait either... not sure what we should do here? does it matter?).
  4. The checkpoint sync logic can't handle the specific type of fork I'm using in the test (fork to a tipset with fewer blocks). I think my hacky fix is correct but we'll need verification on that.

This test still doesn't pass and I'm not sure why:

1. Sync is a bit broken because it can't figure out that we already have
all the data locally.
2. But with 2 nodes, it should work. Except that, if I add some logging,
I see that sync works until the libp2p nodes just flat-out disconnect
from eachother.
@Stebalien Stebalien force-pushed the steb/fix-best-mining-candidate branch from 0c0b158 to 4404614 Compare November 1, 2024 11:54
@Stebalien
Copy link
Member Author

Ok, I think my fix for the mining loop might be ok, but someone should look at it.

However, we still need to fix the sync issue to get the tests to pass:

  1. I tried some simple fixes but, the real issue is that syncs that require forks always download 900 blocks then check them against the current head.
  2. But... it should work anyways. As written, with 2 nodes, sync should "just work". And it does until suddenly a request fails because the nodes disconnect (in libp2p) for some reason.

So I think the sync issue is ultimately some libp2p strangeness.

miner/miner.go Dismissed Show dismissed Hide dismissed
We also perform this check inside `SyncSubmitBlock` so we did have an
effective filter, but this was still wrong.
We have some cases where we submit a tipset to ourselves, from
ourselves and end up calling `AddPeer` with our own ID. Ignore this case.
A side-effect of InformNewHead is to record the peer for future
chain-sync sessions. If we don't pass blocks to InformNewHead here, we
can have some difficulty bootstrapping networks.
log.Debugf("Got new tipset through Hello: %s from %s", ts.Cids(), s.Conn().RemotePeer())
hs.syncer.InformNewHead(s.Conn().RemotePeer(), ts)
}
// don't bother informing about genesis
Copy link
Contributor

@Kubuxu Kubuxu Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// don't bother informing about genesis

@Kubuxu Kubuxu changed the title fix(miner): ignore lastWork when selecting the best mining candidate fix(miner): ignore lastWork when selecting the best mining candidate + tests Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✔️ Approved by reviewer
Development

Successfully merging this pull request may close these issues.

GetBestMiningCandidate & F3
4 participants