[tmpnet] Enable installation of chaos mesh to local kind cluster #3674

Merged
maru-ava merged 4 commits into master from install-chaos-mesh on Jul 22, 2025

Conversation

maru-ava
Contributor

@maru-ava maru-ava commented Jan 25, 2025

Why this should be merged

Deploying chaos mesh to a local kube cluster is intended to simplify local experimentation with fault injection testing.

How this works

  • Optionally deploys chaos mesh by starting a kind cluster with tmpnetctl start-kind-cluster --install-chaos-mesh
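
As a rough illustration of how an opt-in flag like this can gate an extra deployment step, here is a minimal sketch using only the Go standard library. The names `clusterOptions`, `parseStartKindCluster`, and `plannedSteps` are hypothetical, and the real tmpnetctl may well use a different CLI library and structure; only the flag name `--install-chaos-mesh` comes from this PR.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// clusterOptions holds the settings parsed from the command line.
// This struct is hypothetical and exists only for illustration.
type clusterOptions struct {
	installChaosMesh bool
}

// parseStartKindCluster parses the arguments of a start-kind-cluster style
// subcommand. The flag defaults to false so chaos mesh stays opt-in.
func parseStartKindCluster(args []string) (clusterOptions, error) {
	var opts clusterOptions
	fs := flag.NewFlagSet("start-kind-cluster", flag.ContinueOnError)
	fs.BoolVar(&opts.installChaosMesh, "install-chaos-mesh", false,
		"install chaos mesh after the kind cluster is running")
	if err := fs.Parse(args); err != nil {
		return clusterOptions{}, err
	}
	return opts, nil
}

// plannedSteps returns the ordered step names implied by the options.
func plannedSteps(opts clusterOptions) []string {
	steps := []string{"start kind cluster"}
	if opts.installChaosMesh {
		steps = append(steps, "install chaos mesh")
	}
	return steps
}

func main() {
	opts, err := parseStartKindCluster(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	for _, s := range plannedSteps(opts) {
		fmt.Println(s)
	}
}
```

Keeping the default at false means existing invocations of the subcommand behave exactly as before.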

How this was tested

  • CI against regression
  • CI job (robustness) that validates deployment of kind with chaos mesh
  • Manually invoked tmpnetctl start-kind-cluster --install-chaos-mesh and ran a chaos experiment

Need to be documented in RELEASES.md?

N/A

TODO

@maru-ava maru-ava added the testing This primarily focuses on testing label Jan 25, 2025
@maru-ava maru-ava self-assigned this Jan 25, 2025

github-actions bot commented Mar 2, 2025

This PR has become stale because it has been open for 30 days with no activity. Adding the lifecycle/frozen label will cause this PR to ignore lifecycle events.

@joshua-kim joshua-kim moved this from In Progress 🏗️ to Backlog 🧊 in avalanchego May 1, 2025

github-actions bot commented Jun 1, 2025

This PR has become stale because it has been open for 30 days with no activity. Adding the lifecycle/frozen label will cause this PR to ignore lifecycle events.

@maru-ava maru-ava moved this from Backlog 🧊 to Ready 🚦 in avalanchego Jul 9, 2025
@maru-ava maru-ava force-pushed the install-chaos-mesh branch from edff22a to 402ad07 Compare July 9, 2025 02:40
@maru-ava maru-ava changed the base branch from reuse-kube-install to tmpnet-nginx-ingress July 9, 2025 02:40
@maru-ava maru-ava changed the title from "[testing] Install chaos mesh in local kind cluster" to "[tmpnet] Enable tmpnetctl start-kind-cluster --install-chaos-mesh" Jul 9, 2025
@maru-ava maru-ava changed the title from "[tmpnet] Enable tmpnetctl start-kind-cluster --install-chaos-mesh" to "[tmpnet] Enable installation of chaos mesh to local kind cluster" Jul 9, 2025
@maru-ava maru-ava force-pushed the install-chaos-mesh branch from 402ad07 to 4610eb8 Compare July 9, 2025 02:41
@maru-ava maru-ava requested a review from Copilot July 9, 2025 02:41
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds an optional flag to install Chaos Mesh into a local kind cluster, updates existing ingress controller port usage to a shared constant, and integrates Chaos Mesh deployment logic with Helm and readiness checks.

  • Introduces --install-chaos-mesh flag and installChaosMesh parameter
  • Adds Chaos Mesh constants, deployChaosMesh, isChaosMeshRunning, and waitForChaosMesh functions
  • Refactors ingress port settings to use ingressNodePort constant and updates the kind script comment

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/fixture/tmpnet/tmpnetctl/main.go | Adds installChaosMesh flag and passes it through to StartKindCluster |
| tests/fixture/tmpnet/start_kind_cluster.go | Defines Chaos Mesh constants, handles installChaosMesh branch, and implements Helm deployment and readiness polling |
| scripts/kind-with-registry.sh | Updates comment to reference the ingressNodePort constant |
Comments suppressed due to low confidence (2)

tests/fixture/tmpnet/start_kind_cluster.go:131

  • No existing tests cover the installChaosMesh branch; consider adding a unit or integration test to verify that Chaos Mesh is deployed when the flag is set.
	if installChaosMesh {

tests/fixture/tmpnet/start_kind_cluster.go:58

  • Accessing the dashboard at chaos-mesh.localhost requires a hosts entry (e.g., /etc/hosts); please document this requirement in the README or RELEASES.md.
	chaosMeshDashboardHost  = "chaos-mesh.localhost"

@maru-ava maru-ava force-pushed the install-chaos-mesh branch 2 times, most recently from e78a019 to 0b90032 Compare July 9, 2025 02:51
@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch from e2231cd to 7991525 Compare July 18, 2025 05:18
@maru-ava maru-ava force-pushed the install-chaos-mesh branch from 0b90032 to f565e8f Compare July 18, 2025 05:20
@maru-ava maru-ava force-pushed the tmpnet-nginx-ingress branch from 7991525 to b9f3df7 Compare July 18, 2025 16:41
Base automatically changed from tmpnet-nginx-ingress to master July 21, 2025 16:24
@maru-ava maru-ava force-pushed the install-chaos-mesh branch from f565e8f to 7b13cb9 Compare July 21, 2025 17:07
@maru-ava maru-ava marked this pull request as ready for review July 21, 2025 17:11
@RodrigoVillar RodrigoVillar self-requested a review July 21, 2025 17:13
@maru-ava
Contributor Author

Updated to use a longer timeout - chaos mesh deployment on top of kind and nginx deployment can take more than 2 minutes.

@maru-ava
Contributor Author

Updated to add a CI check of deployment

@maru-ava maru-ava force-pushed the install-chaos-mesh branch from 5239c5c to a6e9a37 Compare July 21, 2025 20:36
Contributor

@RodrigoVillar RodrigoVillar left a comment

LGTM - also ran load test with chaos mesh which worked 👍

@Elvis339
Contributor

I'm curious about the architectural decision to have Kind cluster setup entirely in Go code rather than using a more declarative approach with YAML manifests, Helm charts, and scripts.

My initial thought, from a development workflow perspective, is that a declarative approach might offer some advantages, such as easier configuration: developers could modify YAML files without recompiling Go code or understanding the tmpnet

For example, something like:

# scripts/setup-dev-cluster.sh
./scripts/kind-with-registry.sh
kubectl apply -k {{path}}
helm install ingress-nginx -f values/ingress.yaml
helm install chaos-mesh -f values/chaos-mesh.yaml --create-namespace
kubectl wait --for=condition=available deployment/ingress-nginx-controller

However, I recognize there might be good reasons for the Go approach. I'm just curious to understand what drove the decision to make this programmatic rather than declarative.

@maru-ava
Contributor Author

> I'm curious about the architectural decision to have Kind cluster setup entirely in Go code rather than using a more declarative approach with YAML manifests, Helm charts, and scripts.
>
> My initial thought, from a development workflow perspective, is that a declarative approach might offer some advantages, such as easier configuration: developers could modify YAML files without recompiling Go code or understanding the tmpnet
>
> For example, something like:
>
> # scripts/setup-dev-cluster.sh
> ./scripts/kind-with-registry.sh
> kubectl apply -k {{path}}
> helm install ingress-nginx -f values/ingress.yaml
> helm install chaos-mesh -f values/chaos-mesh.yaml --create-namespace
> kubectl wait --for=condition=available deployment/ingress-nginx-controller
>
> However, I recognize there might be good reasons for the Go approach. I'm just curious to understand what drove the decision to make this programmatic rather than declarative.

What you're suggesting would involve replacing logic in golang with logic in a shell script. Sure, the golden path looks nice, but ensuring everything is logged and error checked ends up being more complicated and less reliable than what is possible with golang.

I'm also not clear why 'recompiling golang code' would be an issue. ./bin/tmpnetctl already does this automatically, and even that invocation can be made via tasks rather than directly.

What modifications are you envisioning exactly?

@maru-ava maru-ava added this pull request to the merge queue Jul 22, 2025
Merged via the queue into master with commit a43f346 Jul 22, 2025
30 checks passed
@maru-ava maru-ava deleted the install-chaos-mesh branch July 22, 2025 18:05
@github-project-automation github-project-automation bot moved this from Ready 🚦 to Done 🎉 in avalanchego Jul 22, 2025
@maru-ava
Contributor Author

> I'm curious about the architectural decision to have Kind cluster setup entirely in Go code rather than using a more declarative approach with YAML manifests, Helm charts, and scripts.

  • Note also that yaml manifests are used for some deployment tasks (e.g. monitoring and rbac).
  • Another motivation for golang was simplifying reuse across repos (subnet-evm, hypersdk) since golang code can be vendored with go modules but scripts cannot. This appears less and less important over time.

Labels
testing This primarily focuses on testing
Projects
Status: Done 🎉
5 participants