Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add network policy for RKE2 Ingress Nginx #4300

Open
wants to merge 6 commits into
base: dev-v2.10
Choose a base branch
from

Conversation

dharmit
Copy link
Contributor

@dharmit dharmit commented Jul 30, 2024

Issue:

Addresses issue raised in rancher/rancher#45603.

Problem

Prometheus target failing to come up

Solution

Using the modified chart in this PR enables the target to come up

Testing

  • Install rancher-monitoring-crd through this PR
  • Install rancher-monitoring using following Helm command:
    helm install rancher-monitoring charts/rancher-monitoring/105.0.0-rc1+up57.0.3/ \
    --namespace=cattle-monitoring-system \
    --set alertmanager.alertmanagerSpec.configSecret=alertmanager-rancher-monitoring-alertmanager \
    --set alertmanager.alertmanagerSpec.useExistingSecret=true \
    --set global.cattle.clusterId=local \
    --set global.cattle.clusterName=local \
    --set global.cattle.rkePathPrefix="" \
    --set global.cattle.rkeWindowsPathPrefix="" \
    --set global.cattle.systemDefaultRegistry="" \
    --set global.cattle.systemProjectId=p-cwh9d \
    --set global.cattle.url=https://13.202.137.42.sslip.io \
    --set global.systemDefaultRegistry="" \
    --set prometheus.prometheusSpec.evaluationInterval=1m \
    --set prometheus.prometheusSpec.retentionSize=50GiB \
    --set prometheus.prometheusSpec.scrapeInterval=1m \
    --set rke2ControllerManager.enabled=true \
    --set rke2Etcd.enabled=true \
    --set rke2IngressNginx.enabled=true \
    --set rke2IngressNginx.networkPolicy.enabled=true \
    --set rke2Proxy.enabled=true \
    --set rke2Scheduler.enabled=true
  • Go to Rancher UI; Monitoring; Prometheus Targets
  • In the Prometheus UI search for the ServiceMonitor serviceMonitor/kube-system/rancher-monitoring-ingress-nginx/0. It should be UP.

Engineering Testing

Manual Testing

Automated Testing

QA Testing Considerations

Regressions Considerations

Backporting considerations

@dharmit dharmit requested a review from a team as a code owner July 30, 2024 13:48
Copy link

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

@dharmit
Copy link
Contributor Author

dharmit commented Jul 30, 2024

I have honestly got no clue about why there are so many lines modified in this PR. Steps I followed are below:

$ export PACKAGE=rancher-monitoring/rancher-monitoring
$ make prepare
// made changes
$ make patch
$ make charts

Did I do something wrong here? 🤔

Copy link
Contributor

@alexandreLamarre alexandreLamarre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rancher monitoring version needs to be bumped to 104.1.1-rc1. Recommend separating make patch into a separate commit from make charts for exactly this scenario, where you would to do something like rebase with drop.

Need to remove packages/rancher-monitoring/rancher-monitoring/generated-changes/overlay/crds as a bug in charts-build-scripts includes these by accident.

Copy link

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

@dharmit
Copy link
Contributor Author

dharmit commented Jul 31, 2024

@joshmeranda @alexandreLamarre thanks for your feedback, guys! I have modified as per your recommendations. Can you PTAL again? Thanks.

Copy link
Contributor

@joshmeranda joshmeranda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking better, but git history could use some housekeeping. Can you update the commits to look closer to the steps outline here? Generally this will look like this:

  1. export PACKAGE=rancher-monitoring/rancher-monitoring
  2. make prepare
  3. Make the changes you need to the prepare chart, in this case adding the network policy
  4. make patch
  5. Commit the changes to packages/rancher-monitoring/rancher-monitoring/generated-changes (don't forget to remove generated-changes/overlay/crds introduced by the bug)
  6. Update the version in package.yaml
  7. Commit changes to package.yaml
  8. make charts
  9. Commit changes made by make charts
  10. Update and commit release.yaml with new version of chart

Following these steps will make it a good bit easier for the reviewer to see what was changed and why. Also make each commit smaller and prevents new chart versions from stalling our browsers when trying to load 100+ new files. Thanks!

@dharmit
Copy link
Contributor Author

dharmit commented Jul 31, 2024

@joshmeranda I referred to the Making Changes To Packages section which mentions making version related changes first. In context of your comment, I followed steps 6-9 first and 1-5 after that.

If I follow the steps you suggested, it'd make changes to 104.1.0. And, AFAIK, 104.1.0 has already released, right? Correct me if I'm wrong because there's a good chance I am. :)

@joshmeranda
Copy link
Contributor

joshmeranda commented Jul 31, 2024

@dharmit You're right that we don't want to make changes to 104.1.0 but since you won't run make charts until after you bump the the version in step 6/7 you shouldn't be updating any existing charts.

I keep forgetting that the docs recommend you update the version first. That's fine as long as its a separate commit. Generally I'll update package.yaml in its own commit since its easier to see the new version in a commit with 1 changed file rather than 100+ changes files, then run make charts at the end. This will still create the new charts at versions 104.1.1 since package.yaml will have the bumped version.

index.yaml Outdated
- name: Chart Source
url: https://github.com/prometheus-community/helm-charts
- name: Upstream Project
url: https://github.com/prometheus-operator/kube-prometheus
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an extra space before url:. Strictly speaking it's not a yaml error, as this block is a string, but it looks like it's intended to be input for some other yaml process, and therefore should be conforming yaml code

Copy link
Contributor

@ericpromislow ericpromislow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something in this process is generating broken yaml.

@dharmit
Copy link
Contributor Author

dharmit commented Aug 2, 2024

I keep forgetting that the docs recommend you update the version first. That's fine as long as its a separate commit. Generally I'll update package.yaml in its own commit since its easier to see the new version in a commit with 1 changed file rather than 100+ changes files, then run make charts at the end. This will still create the new charts at versions 104.1.1 since package.yaml will have the bumped version.

I see your point. However, what I have done is again just as per the docs recommendation. It recommends updating the version in package.yaml, then doing make charts, and committing the changes.

I can totally do what you're recommending if that's the general practice followed by O&B team. Let me know.

@joshmeranda
Copy link
Contributor

joshmeranda commented Aug 2, 2024

@dharmit Yeah that's totally fine to do it that way, its just less familiar. It makes it easier to ensure that the changes you made were brought into the generated chart. Feel free to continue!

But you'll still want to have separate commits for updating package.yaml, building the new charts at that version, making the necessary changes, and rebuilding the charts. The commit history for something like this pr should probably look like this:

  1. Commit 1: bump rancher-monitoring/rancher-monitoring version to 104.1.1+up57.0.3
  2. Commit 2: make charts
  3. Commit 3: update release.yaml (can go anywhere as long as its updated`
  4. Commit 4: Add network policy for RKE2 Nginx Ingress
  5. Commit 5: 'make charts'

I get that this feels somewhat nitpicky, but because of the sheer amount of files that are updated in charts PRs it makes it much easier for reviewers to give a review.

Copy link

github-actions bot commented Aug 2, 2024

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

@dharmit
Copy link
Contributor Author

dharmit commented Aug 2, 2024

I get that this feels somewhat nitpicky, but because of the sheer amount of files that are updated in charts PRs it makes it much easier for reviewers to give a review.

No, it doesn't sound nitpicky at all, TBH. Thanks a tonne for laying down the steps. I'll update the PR accordingly. 👍🏽

Copy link

github-actions bot commented Aug 5, 2024

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

Copy link

github-actions bot commented Aug 5, 2024

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

Copy link

github-actions bot commented Aug 5, 2024

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

Copy link

github-actions bot commented Aug 5, 2024

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

Copy link

github-actions bot commented Aug 5, 2024

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

release.yaml Outdated
fleet:
- 104.0.1+up0.10.1-rc.1
fleet-agent:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid adding references to charts that weren't updated in this PR. Remove the fleet-agent and rancher-vsphere-csi fields please. Otherwise LGTM!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, those changes are based on make check-release-yaml.

Copy link
Contributor

@joshmeranda joshmeranda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see release.yaml comments above

@mallardduck
Copy link
Member

mallardduck commented Aug 13, 2024

I don't think you need my review here -- reviews from Josh and Alexandre are sufficient.

Screenshot 2024-08-13 at 9 35 29 AM

@ericpromislow - I think that because you left a "Requested Changes" review you must provide an "Approved" review before this PR will be passable. Ignore mine since I added them today intentionally for changes I suggested. But notice your's and alex...I think big picture we need a new PR even after mine and yours are approved since Alex can review currently.

@joshmeranda
Copy link
Contributor

Seeing this upon install:

W0813 13:11:43.161908 44 warnings.go:70] spec.template.spec.containers[2].ports[0]: duplicate port definition with spec.template.spec.containers[1].ports[0]

@mallardduck This one is coming from upstream: https://github.com/grafana/helm-charts/blob/main/charts/grafana/templates/_pod.tpl#L1016-L1021

Copy link

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

Copy link

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

release.yaml Outdated Show resolved Hide resolved
Copy link

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

@dharmit
Copy link
Contributor Author

dharmit commented Aug 16, 2024

Not quite working out the box for me when I set to enabled. In the rancher installed apps section it thinks that it has network policies, but they 404 when I click their links.

@mallardduck IIUC (which is based on description in rancher/rancher#45603), you have to open Prometheus Targets from the Monitoring view and check if serviceMonitor/kube-system/rancher-monitoring-ingress-nginx is up. I'm not sure where in the Installed Apps would you see a Network Policy because I see none on my RKE2 setup. Also, please set up an RKE2 with CIS enabled. Below are the configuration files I used:

$ cat /etc/rancher/rke2/config.yaml
cni: calico
write-kubeconfig-mode: "0644"
profile: cis
pod-security-admission-config-file: /etc/rancher/rke2/rke2-custom-pss.yaml

$ cat /etc/rancher/rke2/rke2-custom-pss.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1beta1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "restricted"
      enforce-version: "latest"
      audit: "restricted"
      audit-version: "latest"
      warn: "restricted"
      warn-version: "latest"
    exemptions:
      usernames: []
      runtimeClasses: []
      namespaces: [kube-system, cis-operator-system, tigera-operator, cattle-system, cattle-fleet-system, cattle-fleet-local-system, cattle-monitoring-system]

Copy link

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

1 similar comment
Copy link

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

Copy link

github-actions bot commented Sep 2, 2024

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

Copy link

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

@mallardduck
Copy link
Member

mallardduck commented Sep 25, 2024

Edit: Disregard original message for now - Prachi can dismiss the blocking review.
Will review the PR first thing tomorrow and review again.

@dharmit
Copy link
Contributor Author

dharmit commented Sep 26, 2024

Edit: Disregard original message for now - Prachi can dismiss the blocking review.

TBH, I have been backing up old branch and force pushing each time. So, in a sense, this PR keeps getting fresh each time. 😉

Will review the PR first thing tomorrow and review again.

Before you spend time reviewing, I want to get the thing about RC clarified. Last time I pinged you on Slack, we agreed upon following steps mentioned in #4404 (comment) for this PR. However, a recent Slack thread started by @joshmeranda has got me concluding that I don't need to do those and only have a PR with the actual changes. Did I get this right?

@mallardduck
Copy link
Member

I don't need to do those and only have a PR with the actual changes. Did I get this right?

Yes that's correct - with the clarification we got from @joshmeranda, there's no need for this to bump RC.1 to RC.2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants