-
Notifications
You must be signed in to change notification settings - Fork 289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: use proxy instead of port forwards #3505
feat: use proxy instead of port forwards #3505
Conversation
Signed-off-by: Smuu <[email protected]>
test/e2e/check_upgrades.go
Outdated
func waitForHeight(testnet *testnet.Testnet, node *testnet.Node, height int64, period time.Duration) error { | ||
func waitForHeight(ctx context.Context, client *http.HTTP, height int64, period time.Duration) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes it back to how it was done before.
if !errors.Is(err, context.DeadlineExceeded) { | ||
return fmt.Errorf("expected context.DeadlineExceeded, got %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this. It failed me, as the error returned is not context.DeadlineExceeded
. Commented it out for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on not understanding this. I created #3380 which we can close if we end up deleting these lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What error did you get? Was it context cancelled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should always return an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @smuu (just so we don't forget this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get any error when running txsim.
When having the code to expect context.DeadlineExceeded
, the test fails with this specified error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmwaters What was the logic behind you adding this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused because the txsim should always return an error because it's lifecycle is managed by the context you provide so either you cancel it or it hits a deadline
test/e2e/testnet/node.go
Outdated
err, rpcProxyHost := n.Instance.AddHost(rpcPort) | ||
if err != nil { | ||
return err | ||
} | ||
n.rpcProxyHost = rpcProxyHost | ||
|
||
err, grpcProxyHost := n.Instance.AddHost(grpcPort) | ||
if err != nil { | ||
return err | ||
} | ||
n.grpcProxyHost = grpcProxyHost |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This returns a host to reach the RPC/gRPC endpoint via the reverse proxy without relying on port-forwards.
return n.forwardPorts() | ||
} | ||
|
||
func (n *Node) forwardPorts() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to forward ports anymore.
return nil | ||
} | ||
|
||
func (n *Node) ForwardBitTwisterPort() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed. When using instance.EnableBittwister
this is done under the hood.
return nil | ||
} | ||
|
||
func DockerImageName(version string) string { | ||
return fmt.Sprintf("%s:%s", dockerSrcURL, version) | ||
} | ||
|
||
func (n *Node) GetHeight(executor *knuu.Executor) (int64, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed anymore.
@@ -20,20 +20,12 @@ import ( | |||
type Testnet struct { | |||
seed int64 | |||
nodes []*Node | |||
executor *knuu.Executor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Executor is not needed anymore.
if i == 9 { | ||
return fmt.Errorf("node %s status response: %w", node.Name, err) | ||
} | ||
time.Sleep(time.Second) | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are scenarios where the reverse proxy still needs to be ready. This does not fail now when the status requests return a non 2xx
return code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way where the reverse proxy can tell us when it's ready? Or a way where we don't need to rely on sleeping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When Knuu
is initialized with proxy enabled, the deploy and check service for proxy is done internally and they are blocking, so once knuu is initialized, the proxy should be ready. There should be no need for sleep and wait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deployment of the proxy is blocking until deplopyed.
But when doing some testing I observed that the proxy takes some time to apply the config for a certain Instance.
When creating a forward for an Instance using the proxy, it makes a custom resource in Kubernetes, which is then picked up by the proxy.
Sometimes, we use the proxy host before the proxy applies this configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I see, so we probably can add something to the AddHost
func to check that. or have a callback func if that's needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would you mind to create an issue for it to keep tracking of it in its own issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created issue in knuu: celestiaorg/knuu#423
test/e2e/testnet/node_helpers.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a workaround and is not needed anymore.
Signed-off-by: Smuu <[email protected]>
WalkthroughThe changes primarily involve refactoring the Changes
Sequence Diagram(s)Generating sequence diagrams is not necessary for these changes as they mainly involve refactoring and configuration adjustments without introducing new features or complex control flows. Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM 🎉
Left some comments to understand more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! second some of the nits from Rachid and Nina
Signed-off-by: Smuu <[email protected]>
Signed-off-by: Smuu <[email protected]>
Signed-off-by: Smuu <[email protected]>
Signed-off-by: Smuu <[email protected]>
This PR now uses the release |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @smuu! I have a few comments but these can also be addressed in a follow up PR as to not be blocking
@cmwaters This would mean a major rewrite of your usage of knuu. We paused that for now, as the repo has so many PRs open that is touching the e2e tests. I propose to ignore the deprecation warning in this PR, get all the knuu-related PRs merged, and then create a PR to move to the new structure. |
Okay, I will disable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
Outside diff range and nitpick comments (2)
test/e2e/minor_version_compatibility.go (1)
Line range hint
38-90
: Consider refactoring to reduce the complexity of theMinorVersionCompatibility
function.- for i := 0; i < len(versions)*2; i++ { - if i%numNodes == 0 { - continue - } - client, err := testNet.Node(i % numNodes).Client() - testnet.NoError("failed to get client", err) - - heightBefore, err := getHeight(ctx, client, time.Second) - testnet.NoError("failed to get height", err) - - newVersion := versions.Random(r).String() - logger.Println("Upgrading node", "node", i%numNodes+1, "version", newVersion) - testnet.NoError("failed to upgrade node", testNet.Node(i%numNodes).Upgrade(newVersion)) - time.Sleep(10 * time.Second) - testnet.NoError("failed to wait for height", waitForHeight(ctx, client, heightBefore+2, 30*time.Second)) - } + for i := 1; i < len(versions)*2; i++ { + if i%numNodes != 0 { + client, err := testNet.Node(i % numNodes).Client() + testnet.NoError("failed to get client", err) + + heightBefore, err := getHeight(ctx, client, time.Second) + testnet.NoError("failed to get height", err) + + newVersion := versions.Random(r).String() + logger.Println("Upgrading node", "node", i%numNodes+1, "version", newVersion) + testnet.NoError("failed to upgrade node", testNet.Node(i%numNodes).Upgrade(newVersion)) + time.Sleep(10 * time.Second) + testnet.NoError("failed to wait for height", waitForHeight(ctx, client, heightBefore+2, 30*time.Second)) + } + }test/e2e/testnet/testnet.go (1)
Line range hint
330-346
: The retry logic in theStart
method uses a fixed sleep of 1 second which might not be optimal in all situations.Consider using exponential backoff or configurable delay intervals for better performance and adaptability.
// FIXME: This does not work currently with the reverse proxy | ||
// func (t *Testnet) GRPCEndpoints() []string { | ||
// grpcEndpoints := make([]string, len(t.nodes)) | ||
// for idx, node := range t.nodes { | ||
// grpcEndpoints[idx] = node.AddressGRPC() | ||
// } | ||
// return grpcEndpoints | ||
// } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commented-out function GRPCEndpoints
due to reverse proxy issues might need a revisit or proper documentation on why it's disabled.
Consider documenting or revisiting the necessity of this function in the context of reverse proxy adjustments.
@@ -399,7 +395,7 @@ | |||
if err != nil { | |||
log.Err(err). | |||
Str("name", txsim.Name). | |||
Msg("txsim failed to stop") | |||
Msg("failed to wait for txsim to stop") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure proper error handling in the cleanup process for txsim
.
It's crucial to handle errors robustly during cleanup to prevent resource leaks or incomplete cleanup operations.
@@ -1,3 +1,4 @@ | |||
//nolint:staticcheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tip
Codebase Verification
The extended context search did not reveal any comments explaining the reason for disabling the static check linting. This suggests that the reasons for using //nolint:staticcheck
are not documented in the code.
test/e2e/minor_version_compatibility.go
test/e2e/major_upgrade_v2.go
test/e2e/testnet/txsimNode.go
test/e2e/testnet/node.go
test/e2e/benchmark/benchmark.go
test/e2e/testnet/testnet.go
None of these files contain comments explaining the use of //nolint:staticcheck
.
Analysis chain
Ensure the static check linting is appropriately disabled for valid reasons.
To ensure that the static check linting is disabled for valid reasons, we need to verify if there are any comments or explanations in the surrounding code that justify the use of //nolint:staticcheck
. Since the initial context did not provide sufficient information, we should extend the search context.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the static check linting is disabled for valid reasons.
# Test: Search for the static check linting disable comments. Expect: Comments explaining the reason.
rg --type go $'//nolint:staticcheck' -C 5
Length of output: 1524
Script:
#!/bin/bash
# Description: Verify if there are explanations for disabling static check linting by extending the context.
# Test: Search for the static check linting disable comments with extended context to find any explanations.
rg --type go $'//nolint:staticcheck' -C 10
Length of output: 3017
@@ -1,19 +1,17 @@ | |||
//nolint:staticcheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tip
Codebase Verification
The search did not find any comments in the file explaining the reason for disabling the static check.
- Ensure to add a comment explaining why the
//nolint:staticcheck
directive is necessary.
Analysis chain
Ensure that the static check is disabled for a valid reason.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the reason for disabling static checks.
# Test: Search for explanations in code comments. Expect: Comment explaining the reason.
rg --type go '//nolint:staticcheck' -A 1 test/e2e/major_upgrade_v2.go
Length of output: 103
Script:
#!/bin/bash
# Description: Search for any comments in the file that might explain the reason for disabling the static check.
# Test: Search for all comments in the file. Expect: Any comment that provides an explanation.
rg --type go '^\\s*//' test/e2e/major_upgrade_v2.go
Length of output: 53
@@ -1,8 +1,8 @@ | |||
//nolint:staticcheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tip
Codebase Verification
The search results indicate multiple instances of the //nolint:staticcheck
directive across various files in the codebase, but none of them provide an explanation or justification for disabling the static check.
test/e2e/minor_version_compatibility.go
test/util/malicious/test_app.go
test/e2e/major_upgrade_v2.go
test/e2e/testnet/txsimNode.go
test/e2e/testnet/testnet.go
test/e2e/testnet/node.go
test/e2e/benchmark/benchmark.go
cmd/celestia-appd/cmd/root.go
None of these instances include comments explaining the reason for disabling the static check.
Analysis chain
Ensure that the static check is disabled for a valid reason.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the reason for disabling static checks.
# Test: Search for explanations in code comments. Expect: Comment explaining the reason.
rg --type go '//nolint:staticcheck' -A 1 test/e2e/minor_version_compatibility.go
Length of output: 114
Script:
#!/bin/bash
# Description: Search for project-wide documentation or comments explaining the use of `//nolint:staticcheck`.
# Search for any comments in the codebase that mention `staticcheck`.
rg --type go 'staticcheck' -A 3
Length of output: 1535
@@ -1,3 +1,4 @@ | |||
//nolint:staticcheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tip
Codebase Verification
There is no comment explaining the reason for disabling the static check in the surrounding lines of the //nolint:staticcheck
directive.
- test/e2e/benchmark/benchmark.go:1
Please ensure that the static check is disabled for a valid reason and add a comment explaining why it is necessary.
Analysis chain
Ensure that the static check is disabled for a valid reason.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the reason for disabling static checks.
# Test: Search for explanations in code comments. Expect: Comment explaining the reason.
rg --type go '//nolint:staticcheck' -A 1 test/e2e/benchmark/benchmark.go
Length of output: 106
Script:
#!/bin/bash
# Description: Search for any comments in the surrounding lines of the nolint directive to find an explanation.
# Test: Search for comments in the first 10 lines of the file to find any explanation for disabling static checks.
rg --type go '^//.*' -A 10 test/e2e/benchmark/benchmark.go
Length of output: 1315
// This returns the proxy host that can be used to communicate with the node | ||
func (n Node) AddressRPC() string { | ||
return fmt.Sprintf("http://127.0.0.1:%d", n.rpcProxyPort) | ||
return n.rpcProxyHost | ||
} | ||
|
||
// AddressGRPC returns a GRPC endpoint address for the node. This returns the | ||
// local proxy port that can be used to communicate with the node | ||
func (n Node) AddressGRPC() string { | ||
return fmt.Sprintf("127.0.0.1:%d", n.grpcProxyPort) | ||
} | ||
// FIXME: This does not work currently with the reverse proxy | ||
// // AddressGRPC returns a GRPC endpoint address for the node. | ||
// // This returns the proxy host that can be used to communicate with the node | ||
// func (n Node) AddressGRPC() string { | ||
// return n.grpcProxyHost | ||
// } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider removing commented-out code if it is no longer needed.
- // FIXME: This does not work currently with the reverse proxy
- // // AddressGRPC returns a GRPC endpoint address for the node.
- // // This returns the proxy host that can be used to communicate with the node
- // func (n Node) AddressGRPC() string {
- // return n.grpcProxyHost
- // }
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// This returns the proxy host that can be used to communicate with the node | |
func (n Node) AddressRPC() string { | |
return fmt.Sprintf("http://127.0.0.1:%d", n.rpcProxyPort) | |
return n.rpcProxyHost | |
} | |
// AddressGRPC returns a GRPC endpoint address for the node. This returns the | |
// local proxy port that can be used to communicate with the node | |
func (n Node) AddressGRPC() string { | |
return fmt.Sprintf("127.0.0.1:%d", n.grpcProxyPort) | |
} | |
// FIXME: This does not work currently with the reverse proxy | |
// // AddressGRPC returns a GRPC endpoint address for the node. | |
// // This returns the proxy host that can be used to communicate with the node | |
// func (n Node) AddressGRPC() string { | |
// return n.grpcProxyHost | |
// } | |
// This returns the proxy host that can be used to communicate with the node | |
func (n Node) AddressRPC() string { | |
return n.rpcProxyHost | |
} |
@@ -1,8 +1,7 @@ | |||
//nolint:staticcheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tip
Codebase Verification
The //nolint:staticcheck
comment in test/e2e/testnet/node.go
lacks an accompanying explanation. Additionally, the analysis of the code did not reveal any obvious reasons for disabling static checks, such as unused variables or deprecated functions.
- File:
test/e2e/testnet/node.go
- Lines: 1-1
Please provide an explanation for disabling static checks or ensure that the static check is disabled for a valid reason.
Analysis chain
Ensure that the static check is disabled for a valid reason.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the reason for disabling static checks.
# Test: Search for explanations in code comments. Expect: Comment explaining the reason.
rg --type go '//nolint:staticcheck' -A 1 test/e2e/testnet/node.go
Length of output: 102
Script:
#!/bin/bash
# Search for all comments in the file to find any explanations for disabling static checks
rg --type go '^//' test/e2e/testnet/node.go
# Analyze the code in the file to identify potential static check issues
ast-grep --lang go --pattern $'package $_\n$$$' test/e2e/testnet/node.go
Length of output: 1538
Script:
#!/bin/bash
# Analyze the code in the file to identify potential static check issues with a corrected AST-grep pattern
ast-grep --lang go --pattern $'package $_ { $$$ }' test/e2e/testnet/node.go
Length of output: 517
Script:
#!/bin/bash
# Analyze the code in the file to identify potential static check issues with a more specific AST-grep pattern
ast-grep --lang go --pattern $'$_ := $_' test/e2e/testnet/node.go
ast-grep --lang go --pattern $'$_ = $_' test/e2e/testnet/node.go
ast-grep --lang go --pattern $'func $_($_) $_ { $$$ }' test/e2e/testnet/node.go
Length of output: 4654
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @smuu! I ran all the e2e tests and confirm that they work:
- E2ESimple ✅
- MinorVersionCompatibility ✅
- MajorUpgradeToV2 ✅
The only concern I have is that I keep requiring creating a new namespace for each test (as suggested by @mojtaba-esk in this comment) due to this error
2024/06/17 12:20:09 failed to create testnet: cannot initialize knuu: cannot deploy Traefik: error creating Traefik deployment: deployments.apps "traefik-deployment" already exists
Could it be related to the changes introduced in this PR? Does it also happen to other reviewers?
logger.Println("Creating txsim") | ||
endpoints, err := testNet.RemoteGRPCEndpoints() | ||
testnet.NoError("failed to get remote gRPC endpoints", err) | ||
err = testNet.CreateTxClient("txsim", testnet.TxsimVersion, 1, "100-2000", 100, testnet.DefaultResources, endpoints[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Not to be addressed in this PR yet relevant]: testnet.TxsimVersion
is currently a hardcoded version of txsim. While it works fine for benchmark tests, I believe that for e2e tests, especially considering our plan to add this capability, we need to ensure that we always use the latest version of txsim. We already have an issue for this: #3288.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smuu can we link any issues to re-enable reverse proxy (or whatever mechanism enables us to access grpc via the node assessing the tests) for posterity?
thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 thanks @smuu this is super helpful
@evan-forbes will the local machine running the tests ever need access to a nodes grpc? (Can't we use rpc or grpc gateway instead) |
@staheri14 How are you running the tests? It's the default now for knuu to have a namespace per test run. |
To my understanding out reverse proxy can proxy gRPC connections. |
@evan-forbes @cmwaters, as I don't have permission to merge, feel free to merge this when you feel the PR is ready. |
Created this issue: #3580 |
@staheri14, I'm also seeing the namespace problem appear in our CI tests:
Going to create an issue. |
Overview
This pull request removes the need for port-forwarding. Instead, it facilitates the reverse proxy that is available in the scope.
I will comment in code to explain the changes.
@staheri14 Feel free to merge this as well to your branch