
[Flaky Test] CAPM3 deployment is not ready #1851

Closed
mboukhalfa opened this issue Jul 19, 2024 · 5 comments
Labels
kind/flake Categorizes issue or PR as related to a flaky test. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. triage/accepted Indicates an issue is ready to be actively worked on.

Comments

@mboukhalfa
Member

Which jobs are flaking?

Only:
Clusterctl upgrade main

Which tests are flaking?

metal3-periodic-e2e-clusterctl-upgrade-test-main: "When testing cluster upgrade from releases (v1.7=>current)", in STEP: [0] Upgrading providers to the latest version available

Since when has it been flaking?

First seen on Jul 10, 2024, 10:10:00 PM

Jenkins link

https://jenkins.nordix.org/view/Metal3%20Periodic/job/metal3-periodic-e2e-clusterctl-upgrade-test-main/77/consoleFull

Reason for failure (if possible)

Not sure; pasting the error logs below:

16:51:00    < Exit [AfterEach] When testing cluster upgrade from releases (v1.7=>current) [clusterctl-upgrade] @ 07/18/24 13:50:54.088 (5m1.138s)
16:51:00  • [FAILED] [3413.643 seconds]
16:51:00  When testing cluster upgrade from releases (v1.7=>current) [clusterctl-upgrade] [It] Should create a management cluster and then upgrade all the providers
16:51:00  /home/metal3ci/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/e2e/clusterctl_upgrade.go:253
16:51:00  
16:51:00    [FAILED] failed to run clusterctl upgrade
16:51:00    Unexpected error:
16:51:00        <*errors.withStack | 0xc001bb3848>: 
16:51:00        deployment "capm3-controller-manager" is not ready after 5m0s: failed to connect to the management cluster: action failed after 0 attempts: context deadline exceeded
16:51:00        {
16:51:00            error: <*errors.withMessage | 0xc001fecde0>{
16:51:00                cause: <*errors.withStack | 0xc001bb3818>{
16:51:00                    error: <*errors.withMessage | 0xc001fecdc0>{
16:51:00                        cause: <*errors.withStack | 0xc001bb37e8>{
16:51:00                            error: <*errors.withMessage | 0xc001fecda0>{
16:51:00                                cause: <context.deadlineExceededError>{},
16:51:00                                msg: "action failed after 0 attempts",
16:51:00                            },
16:51:00                            stack: [0x3113ac5, 0x3132326, 0x31198b0, 0x24afe72, 0x24afccd, 0x24b0185, 0x31196f3, 0x3119673, 0x3119507, 0x314108b, 0x313ec45, 0x315aa8d, 0x326dc48, 0x3272d08, 0x34d627a, 0x192c193, 0x194036d, 0x148fe21],
16:51:00                        },
16:51:00                        msg: "failed to connect to the management cluster",
16:51:00                    },
16:51:00                    stack: [0x313233c, 0x31198b0, 0x24afe72, 0x24afccd, 0x24b0185, 0x31196f3, 0x3119673, 0x3119507, 0x314108b, 0x313ec45, 0x315aa8d, 0x326dc48, 0x3272d08, 0x34d627a, 0x192c193, 0x194036d, 0x148fe21],
16:51:00                },
16:51:00                msg: "deployment \"capm3-controller-manager\" is not ready after 5m0s",
16:51:00            },
16:51:00            stack: [0x31197e8, 0x3119507, 0x314108b, 0x313ec45, 0x315aa8d, 0x326dc48, 0x3272d08, 0x34d627a, 0x192c193, 0x194036d, 0x148fe21],
16:51:00        }
16:51:00    occurred
16:51:00    In [It] at: /home/metal3ci/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/clusterctl/client.go:202 @ 07/18/24 13:35:57.588
16:51:00  
16:51:00    Full Stack Trace
16:51:00      sigs.k8s.io/cluster-api/test/framework/clusterctl.Upgrade({_, _}, {{0xc000de2300, 0x53}, {0xc00063a35d, 0x47}, 0x0, {0xc001a7a300, 0x24}, {0xc00193b640, ...}, ...})
16:51:00      	/home/metal3ci/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/clusterctl/client.go:202 +0x63d
16:51:00      sigs.k8s.io/cluster-api/test/framework/clusterctl.UpgradeManagementClusterAndWait({_, _}, {{0x41f4178, 0xc000fb1560}, {0xc00063a35d, 0x47}, 0x0, {0x3d27c5b, 0x7}, {0x0, ...}, ...}, ...)
16:51:00      	/home/metal3ci/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/clusterctl/clusterctl_helpers.go:212 +0x9e8
16:51:00      sigs.k8s.io/cluster-api/test/e2e.ClusterctlUpgradeSpec.func2()
16:51:00      	/home/metal3ci/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/e2e/clusterctl_upgrade.go:582 +0x38ba
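
For reference, the failing wait is the CAPI test framework polling the capm3-controller-manager Deployment for availability after upgrading the providers, and timing out after 5 minutes. Below is a minimal standalone sketch of an equivalent readiness check (this is not the framework's own code; the kubeconfig path and the capm3-system namespace are assumptions):

```go
// Minimal sketch of a readiness check equivalent to the wait that times out
// in the log above. Not the CAPI test framework's implementation; the
// kubeconfig path and namespace below are assumptions.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumed path to the management cluster kubeconfig.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/management-cluster.kubeconfig")
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Same 5-minute budget as the failing wait in the log.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()

	err = wait.PollUntilContextCancel(ctx, 10*time.Second, true, func(ctx context.Context) (bool, error) {
		d, err := cs.AppsV1().Deployments("capm3-system").Get(ctx, "capm3-controller-manager", metav1.GetOptions{})
		if err != nil {
			// Transient API errors (e.g. the apiserver not reachable yet)
			// are logged and retried rather than treated as fatal.
			fmt.Println("get deployment:", err)
			return false, nil
		}
		return d.Spec.Replicas != nil && d.Status.AvailableReplicas == *d.Spec.Replicas, nil
	})
	if err != nil {
		fmt.Println(`deployment "capm3-controller-manager" is not ready:`, err)
		return
	}
	fmt.Println("deployment is ready")
}
```

The "failed to connect to the management cluster: action failed after 0 attempts: context deadline exceeded" wrapping in the log suggests the timeout was consumed before the check could even reach the apiserver, which points more at cluster/infra reachability than at the Deployment itself.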

Anything else we need to know?

clusterctl was failing because of another issue that got fixed recently. This flake could also be caused by infrastructure problems; it is fairly frequent.

Label(s) to be applied

/kind flake

@metal3-io-bot metal3-io-bot added kind/flake Categorizes issue or PR as related to a flaky test. needs-triage Indicates an issue lacks a `triage/foo` label and requires one. labels Jul 19, 2024
@mboukhalfa
Member Author

triage/accepted

@mboukhalfa mboukhalfa added the triage/accepted Indicates an issue is ready to be actively worked on. label Jul 22, 2024
@metal3-io-bot metal3-io-bot removed the needs-triage Indicates an issue lacks a `triage/foo` label and requires one. label Jul 22, 2024
@metal3-io metal3-io deleted a comment from metal3-io-bot Jul 22, 2024
@metal3-io-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 20, 2024
@mquhuy
Member

mquhuy commented Oct 21, 2024

@mboukhalfa is this still an issue?

@mboukhalfa
Member Author

It seems not; the clusterctl upgrade has not failed for a long time.

@adilGhaffarDev
Member

We are not seeing this flake anymore. Closing this issue.
