During Kmesh daemon restart process, new connections will not go through waypoint #662

Closed
YaoZengzeng opened this issue Aug 1, 2024 · 4 comments
Labels
kind/bug Something isn't working

Comments

@YaoZengzeng
Member

YaoZengzeng commented Aug 1, 2024

What happened:

I wrote an E2E test case (#661 ) to make sure everything works fine before, during, and after a Kmesh daemon restart.

The test checks whether the managed application is handled correctly by Kmesh by verifying that its requests go through the waypoint.

However, according to the test result, traffic does not go through the waypoint while the Kmesh daemon is restarting.
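
In manual form, that check boils down to something like the sketch below (the deployment name reviews-svc-waypoint is taken from the reproduction steps further down; the actual assertion in #661 may differ):

# send one request, then confirm the reviews waypoint logged the fan-out call to reviews
kubectl exec deploy/sleep -- curl -s http://productpage:9080/productpage >/dev/null
kubectl logs deploy/reviews-svc-waypoint --since=30s | grep reviews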

What you expected to happen:

Before, during, and after the Kmesh daemon restart, all traffic should go through the waypoint.

How to reproduce it (as minimally and precisely as possible):

Deploy bookinfo and a service-granularity waypoint for reviews, ref: https://kmesh.net/en/docs/userguide/try_waypoint/
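
For reference, a service-granularity waypoint for reviews can be set up roughly like this (a sketch of Istio's ambient waypoint workflow, not the exact commands from the guide; istioctl flags vary across Istio versions):

# create a waypoint named reviews-svc-waypoint and point the reviews service at it
istioctl waypoint apply -n default --name reviews-svc-waypoint
kubectl label service reviews istio.io/use-waypoint=reviews-svc-waypoint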

Keep an eye on the waypoint log:

kubectl logs -f reviews-svc-waypoint-6884756fc5-524dz

Create continuous traffic:

kubectl exec deploy/sleep -- sh -c "for i in \$(seq 1 10000); do curl -s http://productpage:9080/productpage | grep reviews-v.-; sleep 1; done"

At this time, you will see that the waypoint continuously outputs access logs.

Simulate a restart of the Kmesh daemon by changing the Kmesh image to an unavailable one.

kubectl edit daemonset kmesh  -n kmesh-system
...
   image: ghcr.io/kmesh-net/kmesh:latest-a
...
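
The same image flip can also be done in one command (assuming the container inside the kmesh DaemonSet is named kmesh):

# container name "kmesh" is an assumption; check the DaemonSet spec if it differs
kubectl -n kmesh-system set image daemonset/kmesh kmesh=ghcr.io/kmesh-net/kmesh:latest-a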

After that, the Kmesh pod stays in the "ImagePullBackOff" state and the waypoint no longer outputs access logs, indicating that subsequent traffic no longer goes through the waypoint.
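
To watch the pod state and later bring the daemon back, something like this works (the restore tag is assumed to be latest; use whatever tag was running before):

# watch the kmesh pod enter ImagePullBackOff
kubectl -n kmesh-system get pods -w
# restore the image to bring the daemon back
kubectl -n kmesh-system set image daemonset/kmesh kmesh=ghcr.io/kmesh-net/kmesh:latest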

Anything else we need to know?:

Environment:

  • Kmesh version:
  • Others:
YaoZengzeng added the kind/bug label on Aug 1, 2024
@lec-bit
Contributor

lec-bit commented Aug 1, 2024

This is most likely a problem with the kind environment used.
In a Kubernetes cluster set up by kind, the /sys/fs directory is read-only by default, so Kmesh cannot operate on it. As a result, the BPF programs are only kept inside the Kmesh pod and are deleted during the restart.
If you go inside the node after setting up the cluster and mount a readable and writable bpf filesystem, it should solve this problem:

docker exec -it ambient-worker /bin/bash
mount -t bpf none /sys/fs/bpf
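
If the kind cluster has more than one node, the same mount can be applied to each of them (a sketch; the cluster name ambient is inferred from the node name ambient-worker above):

# mount a writable bpffs inside every node of the "ambient" kind cluster
for node in $(kind get nodes --name ambient); do
  docker exec "$node" mount -t bpf none /sys/fs/bpf
done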

@hzxuzhonghu
Member

Please retest @YaoZengzeng

@YaoZengzeng
Member Author

After mounting the bpf filesystem inside the node as suggested above, it works fine.
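
One way to spot-check that the mount took effect (what exactly Kmesh pins under /sys/fs/bpf is not shown here; this only confirms a writable bpffs exists inside the node):

# bpffs should now show up as a mounted filesystem inside the node
docker exec ambient-worker mount -t bpf
# pinned objects, if any, live under /sys/fs/bpf and survive daemon restarts
docker exec ambient-worker ls /sys/fs/bpf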

@YaoZengzeng
Member Author

Tested and verified, closing this now.
