genesis: allow tests to pass GenesisConfig to eth.New() - avoid double read of large genesis #21470
Open
AskAlexSharov wants to merge 14 commits into
Giulio2002
approved these changes
May 28, 2026
Contributor
Giulio2002
left a comment
LGTM: obviously small/trivial change (22 lines).
taratorio
approved these changes
May 28, 2026
pull Bot
pushed a commit
to Dustin4444/erigon
that referenced
this pull request
May 28, 2026
…ckly (erigontech#21483)

## Problem

When a merge-queue run has a hive-eest shard fail, the failing job calls `gh run cancel ${{ github.run_id }}` (added in erigontech#21445). That sends SIGTERM to all in-flight matrix siblings, but the Docker-bound hive simulators take ~20 minutes to actually drain. `ci-gate` is `if: always()` and waits for every `needs` job to reach a terminal state, so the broken PR sits at `AWAITING_CHECKS` for the full drain time, blocking the head of the merge queue.

Concrete example from today (PR erigontech#21470 at position #1):

- 08:29:57: `hive-eest / test-hive-eest (paris+shanghai, serial)` fails, calls `gh run cancel 26562610423`, emits the "Merge-queue root-cause failure" annotation from erigontech#21445.
- 08:48 (~19 min later): paris+shanghai-parallel, prague-serial/parallel, cancun-serial/parallel, osaka-parallel, rlp-serial/parallel, and glamsterdam-devnet-parallel were all still `in_progress`. Every other ci-gate child (tests, race-tests, eest-spec-tests, kurtosis, hive, lint, bench, repro, sonar, caplin) had already completed. The bottleneck was specifically the hive-eest matrix siblings.

## Fix

```yaml
strategy:
  fail-fast: ${{ github.event_name == 'merge_group' }}
```

- **In `merge_group`**: first failed shard immediately cancels all siblings at the GitHub API layer, much faster than the `gh run cancel` → SIGTERM → runner-drain path. ci-gate's `needs` reach terminal state in seconds, ci-gate fails, the broken PR is evicted.
- **In PR runs**: stays `false`, so authors still see the full failure breakdown across every shard. No regression in PR feedback.

## What's left in place and why

The per-job `gh run cancel` step (test-hive-eest.yml lines 311-317) stays. Two reasons:

- Matrix `fail-fast` only cancels siblings **within the same matrix**; it doesn't cancel sibling reusable workflows. If a future failure pattern leaks across workflows, `gh run cancel` still covers it.
- ci-gate.yml's root-cause annotator (line 188) keys off "the leaf that ran `gh run cancel` successfully" to single out the true root cause among collateral cancellations. Removing the step would silently regress erigontech#21445's attribution.

## Scope choice

Only `test-hive-eest.yml` is changed. Other matrix-bearing reusable workflows (`test-all-erigon.yml`, `test-all-erigon-race.yml`, `test-eest-spec.yml`, `test-kurtosis-assertoor.yml`, `test-hive.yml`, `test-bench.yml`) all use `fail-fast: false` too, but none of them were the queue-blocking long pole in this incident. Keeping the patch minimal; we can generalize if another workflow becomes the bottleneck.

## Tradeoff to be aware of

Queue runs will now show siblings as `cancelled` instead of `failed` whenever any one shard fails. That's the correct tradeoff in `merge_group`: the goal is fast eviction, not detailed diagnostics; full per-shard breakdown remains available on the PR run.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
taratorio
requested changes
May 29, 2026
Member
taratorio
left a comment
this PR fails in the merge queue because it introduces a regression
https://github.com/erigontech/erigon/actions/runs/26588728733/job/78341615734
Change `NewEngineXTestRunner` to allow passing the `config.Genesis` object (because it already has it), to avoid a 2nd unmarshal of the `200 MB` genesis.

`8.86s -> 6.34s`
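
For readers skimming the thread, here is a minimal sketch of the general pattern the description refers to: a constructor that accepts an already-parsed genesis object and decodes the raw JSON only when none is supplied. Every name below (`Genesis`, `parseGenesis`, `Node`, `newNode`, `unmarshalCalls`) is hypothetical and chosen for illustration. This is not erigon's actual API or the code in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Genesis is a hypothetical, heavily simplified stand-in for a parsed genesis config.
type Genesis struct {
	ChainID uint64            `json:"chainId"`
	Alloc   map[string]string `json:"alloc"`
}

// unmarshalCalls counts how often the raw genesis JSON is decoded,
// so the saving from reusing a parsed object is observable.
var unmarshalCalls int

// parseGenesis decodes raw genesis JSON. For a very large file this is the costly step.
func parseGenesis(raw []byte) (*Genesis, error) {
	unmarshalCalls++
	var g Genesis
	if err := json.Unmarshal(raw, &g); err != nil {
		return nil, err
	}
	return &g, nil
}

// Node is a hypothetical stand-in for the backend that a test runner constructs.
type Node struct {
	genesis *Genesis
}

// newNode reuses a pre-parsed genesis when the caller has one,
// and falls back to decoding raw only when g is nil.
func newNode(raw []byte, g *Genesis) (*Node, error) {
	if g == nil {
		var err error
		if g, err = parseGenesis(raw); err != nil {
			return nil, err
		}
	}
	return &Node{genesis: g}, nil
}

func main() {
	raw := []byte(`{"chainId": 1, "alloc": {"0x00": "1"}}`)

	// The caller (e.g. a test runner) has already parsed the genesis once.
	g, err := parseGenesis(raw)
	if err != nil {
		panic(err)
	}

	// Handing the parsed object over avoids a second decode inside newNode.
	node, err := newNode(raw, g)
	if err != nil {
		panic(err)
	}
	fmt.Println("chain id:", node.genesis.ChainID, "decodes:", unmarshalCalls)
}
```

In this sketch, passing `nil` keeps the old behaviour (the constructor decodes the raw bytes itself), so callers that do not hold a parsed object are unaffected.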