Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test!: adds the ability to set or unset bbr in e2e tests #3817

Merged
merged 38 commits into from
Sep 11, 2024

Conversation

staheri14
Copy link
Contributor

@staheri14 staheri14 commented Aug 23, 2024

Closes #3815 by disabling BBR in e2e tests.

@staheri14 staheri14 self-assigned this Aug 23, 2024
@staheri14 staheri14 added bug Something isn't working testing items that are strictly related to adding or extending test coverage labels Aug 23, 2024
@staheri14
Copy link
Contributor Author

staheri14 commented Aug 23, 2024

Both E2ESimple and MajorUpgradeToV2 tests work fine with this new change, however, MinorVersionCompatibility still fails, and I am investigating it. Feel free to comment if you see an immediate fix.

Can confirm that all e2e tests work fine.

evan-forbes
evan-forbes previously approved these changes Aug 25, 2024
test/e2e/testnet/node.go Show resolved Hide resolved
@staheri14
Copy link
Contributor Author

staheri14 commented Aug 28, 2024

UPDATE [This issue is resolved, please skip the comment]

After extensive debugging on this PR, I identified the reason behind the failure of the MinorVersionCompatibility test. The issue is unrelated to the changes in this PR. Instead, it stems from certain versions of celestia-app not being available on ghcr.io (or failing to be pulled).

Context and Reproducibility:

To reproduce the issue, run the MinorVersionCompatibility test from this branch. You will notice that the second validator, node 1, running version 1.33.0 fails:

--> Running end to end tests
go run ./test/e2e MinorVersionCompatibility
test-e2e2024/08/28 15:03:42 === RUN MinorVersionCompatibility
test-e2e2024/08/28 15:03:42 Running minor version compatibility test versions v1.0.0    v1.1.0  v1.10.0 v1.10.1 v1.11.0 v1.12.0 v1.13.0 v1.14.0 v1.2.0  v1.20.0 v1.21.0 v1.22.0 v1.23.0 v1.24.0 v1.25.0 v1.26.0 v1.27.0 v1.28.0 v1.29.0 v1.3.0  v1.30.0 v1.31.0 v1.32.0 v1.33.0 v1.34.0 v1.4.0  v1.5.0  v1.6.0  v1.7.0  v1.8.0  v1.9.0  v2.0.0  v2.0.0-rc0      v2.0.0-rc1      v2.0.0-rc2      v2.0.0-rc3      v2.0.0-rc4      v2.1.0v2.1.1  v2.1.2  v2.1.2-rc0      v2.1.2-rc1
INFO[0000] Initializing knuu with scope: runMinorVersionCompatibility_20240828_150342 
INFO[0000] The .env file does not exist, continuing without loading environment variables. 
INFO[2024-08-28T15:03:42-07:00]knuu/knuu.go:264 LOG_LEVEL: info                              
test-e2e2024/08/28 15:04:18 Starting node node 0 version v1.12.0
test-e2e2024/08/28 15:04:18 Starting node node 1 version v1.33.0
test-e2e2024/08/28 15:04:18 Starting node node 2 version v2.0.0-rc0
test-e2e2024/08/28 15:04:18 Starting node node 3 version v2.0.0-rc4
test-e2e2024/08/28 15:04:18 Creating txsim
{"level":"info","name":"txsim","directory":"/var/folders/gl/8f7g64t91rxc6hsz_gj0mqfh0000gn/T/txsim","time":"2024-08-28T15:04:20-07:00","message":"txsim keyring directory created"}
{"level":"info","name":"txsim","pk":"PubKeySecp256k1{038C900FC60525F5365DDD2AC8803C9AEC1E7D76FF5797A2730A7ED8C394461C01}","time":"2024-08-28T15:04:20-07:00","message":"txsim account created and added to genesis"}
{"level":"info","name":"txsim","image":"ghcr.io/celestiaorg/txsim:pr-3541","time":"2024-08-28T15:04:20-07:00","message":"setting image for tx client"}
test-e2e2024/08/28 15:04:20 Setting up testnet
{"level":"info","name":"val0","directory":"/var/folders/gl/8f7g64t91rxc6hsz_gj0mqfh0000gn/T/val0","time":"2024-08-28T15:04:20-07:00","message":"Creating validator's config and data directories"}
{"level":"info","name":"val1","directory":"/var/folders/gl/8f7g64t91rxc6hsz_gj0mqfh0000gn/T/val1","time":"2024-08-28T15:04:20-07:00","message":"Creating validator's config and data directories"}
{"level":"info","name":"val2","directory":"/var/folders/gl/8f7g64t91rxc6hsz_gj0mqfh0000gn/T/val2","time":"2024-08-28T15:04:20-07:00","message":"Creating validator's config and data directories"}
{"level":"info","name":"val3","directory":"/var/folders/gl/8f7g64t91rxc6hsz_gj0mqfh0000gn/T/val3","time":"2024-08-28T15:04:20-07:00","message":"Creating validator's config and data directories"}
test-e2e2024/08/28 15:04:20 Starting testnet
{"level":"info","time":"2024-08-28T15:04:26-07:00","message":"forwarding ports for genesis nodes"}
{"level":"info","name":"val0","version":"v1.12.0","time":"2024-08-28T15:04:34-07:00","message":"started and ports forwarded"}
{"level":"error","error":"timeout while waiting for instance 'val1-ccd8d54e' to be running","name":"val1","version":"v1.33.0","time":"2024-08-28T15:05:34-07:00","message":"failed to start and forward ports"}
2024/08/28 15:05:34 Failed to start testnet: node val1 failed to start: timeout while waiting for instance 'val1-ccd8d54e' to be running

The logs of the validator indicate pod initialization failed which may not provide much insight, but using Lens revealed more details about the cause: Error: ImagePullBackOff meaning that the image for that version seems unavailable.

You can also reproduce the issue by running E2ESimple with the version set to v1.33.0 and will get the similar results, i.e., failure in pulling the image.

You may change the seed in the MinorVersionCompatibility test, for example, to 45. This change results in different versions being assigned to the validators, but some versions, such as v1.34.0, still fail to be pulled.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (3)
test/e2e/simple.go (1)

29-31: Clarify the purpose of the boolean parameter in CreateGenesisNodes.

The addition of the boolean parameter to CreateGenesisNodes is crucial for meeting the PR's objectives. However, it would be beneficial to add a comment explaining what this parameter does, especially since it's set to true. This will improve code readability and maintainability.

The code changes are approved.

Consider adding a comment to explain the purpose of the boolean parameter:

+		// Set the last parameter to true to disable BBR for genesis nodes
		testNet.CreateGenesisNodes(4, latestVersion, 10000000, 0,
			testnet.DefaultResources, true))
test/e2e/benchmark/manifest.go (1)

108-117: Approve the changes in the summary method and suggest adding explanatory comments.

The updated summary method correctly incorporates the DisableBBR field to adjust the output based on the BBR settings. Adding comments to explain the conditional logic would enhance code readability.

The code changes are approved.

Consider adding comments to explain the conditional logic:

	if m.DisableBBR {
+		// Disable BBR: set latency to 0
		latency = 0
	}
test/e2e/minor_version_compatibility.go (1)

58-60: Clarify the purpose of the boolean parameter in CreateGenesisNode.

The addition of the boolean parameter to CreateGenesisNode is crucial for meeting the PR's objectives. However, it would be beneficial to add a comment explaining what this parameter does, especially since it's set to false. This will improve code readability and maintainability.

The code changes are approved.

Consider adding a comment to explain the purpose of the boolean parameter:

+		// Set the last parameter to false to disable BBR for genesis nodes
		testNet.CreateGenesisNode(v, 10000000, 0,
			testnet.DefaultResources, false))

evan-forbes
evan-forbes previously approved these changes Aug 30, 2024
@staheri14
Copy link
Contributor Author

I ran the following tests once more after merging the main branch into this PR and can confirm they all work fine:

  • E2ESimple
  • MajorUpgradeToV2
  • MinorVersionCompatibility
  • TwoNodeSimple
  • TwoNodeBigBlock8MBLatency

evan-forbes
evan-forbes previously approved these changes Sep 10, 2024
Copy link
Member

@evan-forbes evan-forbes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking, but I think we can simply always use the no bbr flag here

@ninabarbakadze
Copy link
Member

Not blocking, but I think we can simply always use the no bbr flag here

Agreed! Might just be easier to simplify.

testnet.NoError("failed to create genesis node", testNet.CreateGenesisNode(v, 10000000, 0, testnet.DefaultResources))
testnet.NoError("failed to create genesis node",
testNet.CreateGenesisNode(v, 10000000, 0,
testnet.DefaultResources, false))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking, but I think we can simply always use the no bbr flag here

We cannot default to true i.e., disabling bbr since in the versions used in MinorVersionCompatibility, this flag is not present yet hence causing failure in the test. cc: @evan-forbes @ninabarbakadze

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To elaborate on my previous comment: my perspective is that the bbr flag must be set according to the version of the celestia-app used for the validators. Since some versions support this flag while others do not, it is being made a parameter in the CreateGenesisNode function. This allows the caller, based on the version specified (the first parameter), to determine whether to enable or disable bbr.

@celestia-bot celestia-bot requested a review from a team September 10, 2024 18:45
@staheri14 staheri14 enabled auto-merge (squash) September 11, 2024 16:56
@staheri14
Copy link
Contributor Author

staheri14 commented Sep 11, 2024

The linter complaint is not caused by this PR.

@staheri14 staheri14 merged commit 2c53ddb into main Sep 11, 2024
31 of 32 checks passed
@staheri14 staheri14 deleted the sanaz/disable-bbr-in-e2e-tests branch September 11, 2024 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working testing items that are strictly related to adding or extending test coverage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

E2e tests fail since bbr is not enabled
4 participants