Cluster doesn't restart when docker restarts #148

Closed
vincepri opened this issue Dec 4, 2018 · 97 comments
Labels
kind/design Categorizes issue or PR as related to design. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@vincepri
Member

vincepri commented Dec 4, 2018

When docker restarts or stop/start (for any reason), the kind node containers remain stopped and aren't restarted properly. When I tried to run docker restart <node container id> the cluster didn't start either.

The only solution at this point seems to be recreating the cluster.

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 4, 2018
@BenTheElder
Member

Isn't this standard behavior with docker containers? (remaining stopped?)

kind does not run any daemon to manage the cluster; the commands create / delete "nodes" (containers) and run some tasks in them (like kubeadm init), so they're effectively "unmanaged".

docker restart is not going to work, because creating a container is not just docker run, we need to take a few actions after creating the container.


What is the use case for this? These are meant to be transient test-clusters and it's probably not a good idea to restart the host daemon during testing.

"Restarting" a cluster is probably going to just look like delete + create of the cluster.

I'm not sure I'd even consider supporting this so much of a bug as a feature, "node" restarts are not really intended functionality currently.

@neolit123
Member

What is the use case for this?

+1 to this question.

docker restart in this case will act like a power grid restart on a bunch of bare metal machines.
so while those bare metal machines might come back up, not sure if we want to support this for kind.
for that to work i think some sort of state has to be stored somewhere...

@vincepri
Member Author

vincepri commented Dec 5, 2018

I've been using kind locally (using Docker for Mac) and when docker reboots or stops, the cluster has to be deleted and recreated. I'm perfectly fine with it, just thought this might be something we should look into.

The use case was to keep the cluster around even after I reboot or shut down my machine / docker.

@BenTheElder
Member

Thanks for clarifying - this is certainly a hole in the usability but I'd hoped that clusters would be cheap enough to [create, use, delete] regularly.

This might be a little non-trivial to resolve but is probably do-able.
/priority backlog
/help

@k8s-ci-robot
Contributor

@BenTheElder:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

Thanks for clarifying - this is certainly a hole in the usability but I'd hoped that clusters would be cheap enough to [create, use, delete] regularly.

This might be a little non-trivial to resolve but is probably do-able.
/priority backlog
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added priority/backlog Higher priority than priority/awaiting-more-evidence. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Dec 5, 2018
@BenTheElder
Member

I think I know how we can do this effectively, but I have no idea what to call the command that will fit with the rest of the CLI 🙃

cc @munnerz

Something like kind restart cluster maybe?

@vincepri
Member Author

vincepri commented Dec 6, 2018

restart seems to fit well with the other create/delete cluster commands. What's the idea you had? I'm wondering whether it actually fits the word "restart" or is something more.

@BenTheElder
Member

It should roughly be:

  • list the containers matching the cluster name
  • for each ...
    • docker {re}start
    • run the pre-boot fixes (mounts)
    • signal the entrypoint to boot
  • optionally --wait for the control-plane like create

It'll look similar to create but skip a lot of steps and swap creating the containers for list & {re}start
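
The steps above can be sketched in shell. This is only an illustration, not kind's implementation: the function names are hypothetical, the label key used to find the node containers is an assumption (it has differed between kind releases), and the remount and USR1 signal are taken from the manual workaround posted in this thread rather than from any supported interface. The optional --wait step is left out.

```shell
# Sketch only: restart every node container of a kind cluster.
# Assumption: nodes carry the label io.x-k8s.kind.cluster=<name>;
# override CLUSTER_LABEL_KEY if your kind release labels them differently.
CLUSTER_LABEL_KEY="${CLUSTER_LABEL_KEY:-io.x-k8s.kind.cluster}"

# Print the names of all containers (running or stopped) for a cluster.
list_nodes() {
  docker ps --all --filter "label=${CLUSTER_LABEL_KEY}=$1" --format '{{.Names}}'
}

# Start one node, re-apply the read-only /sys fix, then signal PID 1 to boot.
restart_node() {
  docker start "$1" &&
    docker exec "$1" sh -c 'mount -o remount,ro /sys; kill -USR1 1'
}

# Restart all nodes of the named cluster (default "kind"), one at a time.
restart_cluster() {
  for node in $(list_nodes "${1:-kind}"); do
    restart_node "$node" || return 1
  done
}
```

Calling `restart_cluster my-cluster` would walk the nodes in whatever order `docker ps` returns them; a real command would probably want to bring up control-plane nodes first.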

We can also eventually have a very similar command like kind restart node

@vincepri
Member Author

vincepri commented Dec 6, 2018

I like that approach, and the node restart also sounds nice and could cover other use cases.

@BenTheElder BenTheElder added this to the 2019 goals milestone Dec 8, 2018
@neolit123
Member

/remove-kind bug
/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Dec 12, 2018
@tao12345666333
Member

Something like kind restart cluster maybe?

@BenTheElder I want to try it.

/assign

@k8s-ci-robot
Contributor

@tao12345666333: GitHub didn't allow me to assign the following users: tao12345666333.

Note that only kubernetes-sigs members and repo collaborators can be assigned.
For more information please see the contributor guide

In response to this:

Something like kind restart cluster maybe?

@BenTheElder I want to try it.

/assign


@neolit123
Member

/lifecycle active
thanks @tao12345666333

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Dec 13, 2018
@clkao

clkao commented Feb 17, 2019

for the impatient, this seems to work for now after docker restarts:

docker start kind-1-control-plane && docker exec kind-1-control-plane sh -c 'mount -o remount,ro /sys; kill -USR1 1'

FixMounts has a few mount --make-shared, not sure if they are really required.

@BenTheElder
Member

The make shared may not be required anymore, those are related to mount propagation functionality in kubelet / storage. It looks like with a tweak to how docker runs on the nodes we might not need those.

We should check with hack/local-up-cluster.sh (i.e. @dims) on this too; they still have it as well.
https://github.com/kubernetes/kubernetes/blob/07a5488b2a8f67add543da72e8819407d8314204/hack/local-up-cluster.sh#L1039-L1040

  # configure shared mounts to prevent failure in DIND scenarios
  mount --make-rshared /
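
To see whether the root mount inside a node is currently shared, one option is to read the optional fields in /proc/self/mountinfo. The helper below is a hypothetical sketch (the name `root_is_shared` is not from kind or Kubernetes); it only reports the current state and does not say whether kubelet actually needs it.

```shell
# Read /proc/self/mountinfo-style text on stdin and report whether the
# root mount ("/") is shared. Field 5 is the mount point; the optional
# fields (e.g. "shared:1") sit between field 6 and the "-" separator.
root_is_shared() {
  awk '$5 == "/" {
         state = "not shared"
         for (i = 7; i <= NF && $i != "-"; i++)
           if ($i ~ /^shared:/) state = "shared"
       }
       END { print (state == "" ? "unknown" : state) }'
}
```

For example: `docker exec kind-1-control-plane cat /proc/self/mountinfo | root_is_shared`, using the container name from the workaround above.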

I've also been thinking about ways we can make things like docker start just work.
The /sys remount is especially unfortunate, but I don't think we can do much about it easily because specifying a /sys mount clashes with --privileged (and we still need the latter).

// systemd-in-a-container should have read only /sys
// https://www.freedesktop.org/wiki/Software/systemd/ContainerInterface/
// however, we need other things from `docker run --privileged` ...
// and this flag also happens to make /sys rw, amongst other things
if err := n.Command("mount", "-o", "remount,ro", "/sys").Run(); err != nil {
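
After a manual `docker start`, a quick way to confirm whether the /sys remount has been applied is to look at the mount options the node reports. The following is a hypothetical helper, not part of kind; it assumes the usual /proc/mounts layout, where the first option of each entry is `ro` or `rw`.

```shell
# Read /proc/mounts-style text on stdin and print "ro" or "rw" for /sys.
# If /sys appears more than once, the last entry wins.
# Prints nothing and returns 1 if /sys is not listed at all.
sys_mount_mode() {
  awk '$2 == "/sys" { split($4, opts, ","); mode = opts[1] }
       END { if (mode == "") exit 1; print mode }'
}
```

Usage would look like `docker exec kind-1-control-plane cat /proc/mounts | sys_mount_mode`; seeing `rw` suggests the remount step has not been run since the container started.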

@hjacobs

hjacobs commented Mar 23, 2019

👍 for the new restart cluster command!

@amwais

amwais commented Mar 23, 2019

The restart cluster command will put kind at the top of its class. Without it, building test environments on kind is painful, since restarting the whole process means re-downloading all the docker images from scratch, which takes a long time.

@tao12345666333
Member

I will send a PR next week.

@amwais

amwais commented Mar 23, 2019

Looking forward! Is there any ticket for that, for tracking purposes?

@tao12345666333
Member

Not yet. I will post progress updates here.

@BenTheElder BenTheElder removed this from the 2019 goals milestone Mar 25, 2019
@nrapopor

@BenTheElder -- Many thanks! this will make our lives easier!!!. I was troubleshooting a weird Azure issue for the last couple of weeks, so had no time for anything else. But this is awesome news

@victor-sudakov

Do I understand correctly that I'll have to kind create cluster and reinstall all the Helm charts and manifests I'm testing after each reboot of my workstation/laptop? No data is preserved across reboots including PVs? This is a major inconvenience.

@BenTheElder
Member

BenTheElder commented Feb 18, 2022

Do I understand correctly that I'll have to kind create cluster and reinstall all the Helm charts and manifests I'm testing after each reboot of my workstation/laptop? No data is preserved across reboots including PVs? This is a major inconvenience.

No, that's not the case: kind v0.8.0+ supports restarts for single-node clusters (#148 (comment)). I've had clusters last for months across many restarts. There is no data loss.

There's a different tracking issue with problems around multi-node #1689

@victor-sudakov

No, that's not the case, kind v0.8.0+ supports restarts for single node clusters

I have Kind 0.11.1 but a 3-node cluster (1 control-plane and 2 workers). It's not a HA cluster in the sense that it has only 1 control-plane node, but it's not single-node either. Should it survive a reboot?

yankay pushed a commit to yankay/kind that referenced this issue Mar 17, 2022
@caniko

caniko commented Mar 20, 2022

I can't figure out a way to restart my cluster:

╰─λ kind --help
kind creates and manages local Kubernetes clusters using Docker container 'nodes'

Usage:
  kind [command]

Available Commands:
  build       Build one of [node-image]
  completion  Output shell completion code for the specified shell (bash, zsh or fish)
  create      Creates one of [cluster]
  delete      Deletes one of [cluster]
  export      Exports one of [kubeconfig, logs]
  get         Gets one of [clusters, nodes, kubeconfig]
  help        Help about any command
  load        Loads images into nodes
  version     Prints the kind CLI version

Flags:
  -h, --help              help for kind
      --loglevel string   DEPRECATED: see -v instead
  -q, --quiet             silence all stderr output
  -v, --verbosity int32   info log verbosity, higher value produces more output
      --version           version for kind

Use "kind [command] --help" for more information about a command.

I am on Arch Linux.

@mohclips

As I understand it, the project has never supported restarts for multi-node clusters (only single-node ones), but the documentation should state this clearly so that we don't spend a lot of time on complex multi-node work only to find it doesn't survive a reboot or a restart of docker. #1689 (comment)

@hitosatish

Was the "restart" functionality ever shipped? I am using version 0.14.0 and don't see a "restart" option in the help message.

@BenTheElder
Member

Multi-node restart has major fixes and should hopefully work fine in v0.15.0 ~soon when we get a release cut. #1689 is still open for a different subset problem around multiple control plane nodes specifically, which needs further investigation.

You can test it now by using kind v0.14.0 with a recent image https://github.com/kubernetes-sigs/kind/pull/2874/files#diff-643b1e9d9e446aa30da4407354de0098f24c947ac985213a06f73188c3e8e3fcR21 or by installing kind from HEAD.

We're still working on one or two remaining unrelated fixes in flight before dropping another tagged release

Was the "restart" functionality ever shipped? I am using version 0.14.0 and don't see a "restart" option in the help message.

I think that's a misunderstanding, the functionality is if you restart docker / your host the cluster should restart (as in this issue), there's no need for a command, docker handles starting the containers again.
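
One way to see this on an existing cluster is to ask Docker which restart policy a node container was created with. The helper name below is hypothetical, and the thread does not state which policy kind sets, so treat the output as something to inspect rather than a documented guarantee.

```shell
# Print a container's Docker restart policy as "<name>:<max-retries>".
# A name of "no" means Docker will not start the container again by itself.
restart_policy() {
  docker inspect \
    --format '{{.HostConfig.RestartPolicy.Name}}:{{.HostConfig.RestartPolicy.MaximumRetryCount}}' \
    "$1"
}
```

For example, `restart_policy kind-control-plane` (assuming that is your node's container name; `docker ps --all` lists the actual names).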

The experimental Podman support does not include this and likely won't, because Podman lacks the related functionality due to design differences.

@BenTheElder
Member

BenTheElder commented Aug 12, 2022

Same for:

I can't figure out a way to restart my cluster:

This issue was to track:

When docker restarts or stop/start (for any reason), the kind node containers remain stopped and aren't restarted properly. When I tried to run docker restart <node container id> the cluster didn't start either.

It also predates the limited podman support, which again is held back by missing functionality in podman (proper hostname / domain name resolution for containers on the network, and any way to specify a container restart policy).

#2715 would be the best place to keep up with discussing manually triggering a restart, though I suspect most potential use cases for manually causing a restart (e.g. testing your application during disruptions) have better alternatives than restarting the cluster.

@BenTheElder
Member

FTR: The latest releases should have clusters that come back up on docker restart, always, including multi node.

@arkel-s

arkel-s commented Feb 3, 2023

On Kind with rootless podman, I don't have access to my cluster after I restart my computer either.
Is it supposed to be supported?
I'm using Kind 0.17.0 on Ubuntu 22.04, with podman 3.4.4

@aojea
Contributor

aojea commented Feb 3, 2023

#2272

@BenTheElder
Member

This issue predates podman support, which is still considered experimental due to this feature gap and other stability issues. #2272 has more context on podman reboot.

@thawkins

thawkins commented Mar 3, 2023

Along with the "restart" functionality, could we also have a "suspend" capability that serializes the cluster and its state to what is effectively a hibernation file? That seems more effective than trying to patch up a cluster that may have had the rug pulled out from under it.

The serialization could be placed in a directory like ~/.local/share/containers/kind

Velero backup can do this: it saves the running state of not just the cluster, but also the workloads running on it. It may work as a workaround...

https://velero.io/

@BenTheElder
Member

BenTheElder commented Mar 3, 2023

#2715 is the issue for stopping / starting. This issue is complete, and closed issues are not closely monitored.
