Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add network policies #56

Draft
wants to merge 21 commits into
base: main
Choose a base branch
from
Draft

Add network policies #56

wants to merge 21 commits into from

Conversation

nsklikas
Copy link
Contributor

@nsklikas nsklikas commented Apr 19, 2023

Apply network policies to deny ingress traffic to the Kratos APIs when it isn't coming from the traefik ingress.

I had to open all the none Kratos ports as I couldn't figure out exactly what is needed by juju/pebble. (need to change that).

To test these changes you need to:

# Pack the charm
$ charmcraft pack
# Deploy it 
$ juju deploy ./kratos_ubuntu-*-amd64.charm --resource oci-image=$(yq eval '.resources.oci-image.upstream-source' metadata.yaml) --trust
# Deploy other charms needed for the test 
$ juju deploy postgresql-k8s --channel edge --trust
$ juju deploy traefik-k8s traefik-admin

# Create relations
$ juju integrate kratos:admin-ingress traefik-k8s
$ juju integrate kratos postgresql-k8s

# Wait for the charms to go to idle
$ juju status

I am first creating the traefik relation and then the postgres one because I had some trouble with dns issues that where causing the migration to fail. First relating traefik and then postgres assures us that those issues are gone.

You must be able to see the network policy by running:

$ microk8s kubectl get -n {namespace} networkpolicies

To validate the networkpolicies you can juju ssh to a random pod (e.g. postgres and try to reach the kratos admin API):

$ juju ssh postgres/0 bash
$ curl {kratos_app_ip}:4434

This should timeout. To get the kratos app ip simply run juju status.
By doing the same test from the traefik pod, the http request must return.

@nsklikas nsklikas requested a review from a team as a code owner April 19, 2023 13:50
@nsklikas nsklikas marked this pull request as draft April 19, 2023 13:51
@nsklikas nsklikas force-pushed the IAM-217-network-policies branch 2 times, most recently from c237036 to 4f189f6 Compare April 20, 2023 09:06
This is fairly complex, will probably have to refactor it
@nsklikas nsklikas marked this pull request as ready for review April 20, 2023 10:12
@nsklikas nsklikas force-pushed the IAM-217-network-policies branch 3 times, most recently from 1afc5cf to 2a4f782 Compare April 21, 2023 07:42
"""The default policy name that will be created."""
return f"{self._charm.app.name}-network-policy"

def apply_ingress_policy(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We expect a list of policies so we should probably call the method apply_ingress_policies

logger.error(msg)
raise NetworkPoliciesHandlerError()

def delete_network_policy(self, name: Optional[str] = None) -> None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should standardise on naming, we use ingress policies and network policies interchangeably.

src/charm.py Outdated
]
)
except NetworkPoliciesHandlerError:
event.defer()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we simply change the status to blocked (or not catch the error and let the charm go to error state)? If something is broken we may not want to be deferring forever

lib/charms/kratos/v0/kubernetes_network_policies.py Outdated Show resolved Hide resolved
lib/charms/kratos/v0/kubernetes_network_policies.py Outdated Show resolved Hide resolved
lib/charms/kratos/v0/kubernetes_network_policies.py Outdated Show resolved Hide resolved
@nsklikas nsklikas marked this pull request as draft April 24, 2023 10:52
We can't use the named port because we have named the ports on the
Service and not on the Pod
@nsklikas nsklikas marked this pull request as ready for review April 25, 2023 14:56

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are the unit tests for this file 👀



class NetworkPoliciesHandlerError(Exception):
"""Applying the network policies failed."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Applying the network policies failed."""
"""Raised when applying the network policies failed."""

@@ -442,6 +449,9 @@ def _on_config_changed(self, event: ConfigChangedEvent) -> None:
"""Event Handler for config changed event."""
self._handle_status_update_config(event)

def _cleanup(self, event: RemoveEvent) -> None:
self.network_policy_handler.delete_ingress_policies()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a juju bug, but when I try removing kratos it gets into terminated state with unit lost. When I use --force --no-wait, the charm is removed but in both cases the network policy is not deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I tried testing it a little more and when I test it manually, the policy is removed. I wrote an integration test to validate this and in the test the policy is not removed. This is very weird.

Copy link
Contributor Author

@nsklikas nsklikas Apr 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell the stop event is never fired, this is most likely a juju bug and not something we can control. Let's leave this as it is for now, it is the best we can do and juju should (hopefully) fix it in a later version.

def apply_ingress_policies(
self, policies: List[IngressPolicyDefinition], name: Optional[str] = None
) -> None:
"""Apply an ingress network policy about a related application.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Apply an ingress network policy about a related application.
"""Apply an ingress network policy for a related application.

@@ -0,0 +1,14 @@
#!/usr/bin/env python3
# Copyright 2022 Canonical Ltd.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Copyright 2022 Canonical Ltd.
# Copyright 2023 Canonical Ltd.

(38813, []),
]
)
except NetworkPoliciesHandlerError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we retry applying the network policy?
Right now if you deploy kratos without trust, it gets into BlockedStatus, but if you trust it then it gets into active without creating the network policy.

@nsklikas
Copy link
Contributor Author

In case applying the network policies fails we have several options:

  1. Log the error and keep the normal operation, if the user wants they can deploy the charm (or re trigger the hook)
  2. Go to a blocked state, this would make state management much more complicated
  3. Raise an exception
  4. Log an error and defer the event until patching succeeds

Since it is not needed for the charm operation and it is not essential for security I think that (1) and (4) make most sense because they allow the charm to run despite any error that might occur and the user can make the decision to stop the charm if they deem it necessary.

If we decide to go with (1) we can also provide an action for applying the network policies to allow the user to retry applying them if something goes wrong. cc @natalian98 @gruyaume

@nsklikas nsklikas marked this pull request as draft May 23, 2023 11:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants