BUG: dapr-control-plane OOMKilled when DaprInstance provisioned #135

Closed
ryorke1 opened this issue Apr 18, 2024 · 11 comments
Comments

@ryorke1

ryorke1 commented Apr 18, 2024

Expected Behavior

The dapr-control-plane pod should remain stable and have configurable resource limits and requests.

Current Behavior

The dapr-control-plane pod is continuously being OOMKilled as long as there is a DaprInstance created. If we remove the DaprInstance, the pod stabilizes. The dapr-control-plane pod does seem to survive long enough to deploy the DaprInstance pods and CRDs, but it takes a few OOMKills to complete. The pod continues to crash after that, but this doesn't seem to affect the Dapr components.

Possible Solution

  1. Increase the resource limits to 512Mi (Memory) and 1000m (CPU)
  2. Make the resource limits and request configurable

Steps to Reproduce

  1. Uninstall any previous version of Dapr Operator (including cleaning up all CRDs and CRs)
  2. Install Dapr Operator 0.0.8 (at this point the dapr-control-plane will start and is stable)
  3. Create a new DaprInstance with the following configuration (see below)
  4. Monitor the pods and watch the dapr-control-plane pod get OOMKilled
# DaprInstance 
apiVersion: operator.dapr.io/v1alpha1
kind: DaprInstance
metadata:
  name: dapr-instance
  namespace: openshift-operators
spec:
  values:
    dapr_operator:
      livenessProbe:
        initialDelaySeconds: 10
      readinessProbe:
        initialDelaySeconds: 10
    dapr_placement:
      cluster:
        forceInMemoryLog: true
    global:
      imagePullSecrets: dapr-pull-secret
      registry: internal-repo/daprio
  chart:
    version: 1.13.2

Environment

OpenShift: Red Hat OpenShift Container Platform 4.12
Dapr Operator: 0.0.8 with 1.13.2 Dapr components

@lburgazzoli
Collaborator

To change the resource requests and limits, the only option is to tweak the subscription: #77 (comment)

Unfortunately the memory cannot be made configurable, but I will dig into the memory consumption.

Do you have a way to reproduce it? I have never experienced such behavior.

@ryorke1
Author

ryorke1 commented Apr 18, 2024

All we did was execute the steps above and that reproduced it. I don't think the dapr-control-plane would be affected by any existing pods that had dapr annotations for sidecar injection but maybe you can correct me if I am wrong. We did have a number of pods running that had the annotations during the initialization of the DaprInstance.

Do you have an example of how we could use the subscription to tweak the requests and limits in the context of the dapr-control-plane? Or am I mistaken about what you mean?

@lburgazzoli
Collaborator

> All we did was execute the steps above and that reproduced it. I don't think the dapr-control-plane would be affected by any existing pods that had dapr annotations for sidecar injection but maybe you can correct me if I am wrong. We did have a number of pods running that had the annotations during the initialization of the DaprInstance.

It should not, as what is affected is the dapr-operator and the other resources; the dapr-control-plane only generates the manifests. Maybe the watcher watches too many objects. I'll have a look.

> Do you have an example of how we could use the subscription to tweak the requests and limits in the context of the dapr-control-plane? Or am I mistaken about what you mean?

No, I don't, but there are a number of examples in the documentation mentioned in the linked comment.
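
For what it's worth, a minimal sketch of that kind of tweak, assuming the operator was installed through OLM with a Subscription named dapr-kubernetes-operator in the openshift-operators namespace (adjust both to your installation). OLM propagates spec.config.resources to the operator Deployment:

# Hypothetical example: patch the Subscription so OLM applies custom
# requests/limits to the operator Deployment managed by the CSV.
kubectl patch subscriptions.operators.coreos.com dapr-kubernetes-operator \
  -n openshift-operators \
  --type merge \
  -p '{"spec":{"config":{"resources":{"requests":{"cpu":"250m","memory":"256Mi"},"limits":{"cpu":"1","memory":"512Mi"}}}}}'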

@lburgazzoli
Collaborator

lburgazzoli commented Apr 19, 2024

I've tried to reproduce the issue, but I failed. What I did:

  • delete any trace of the dapr-kubernetes-operator
  • reinstall the operator
  • deploy a DaprInstance resource similar to the one you provided (except for the registry)

But the operator works as expected and does not get OOMKilled:

➜ k get pods -l control-plane=dapr-control-plane -w
NAME                                  READY   STATUS    RESTARTS   AGE
dapr-control-plane-7796c9ff85-htk4g   1/1     Running   0          2m49s
➜ k top pod dapr-control-plane-7796c9ff85-htk4g    
NAME                                  CPU(cores)   MEMORY(bytes)   
dapr-control-plane-7796c9ff85-htk4g   7m           68Mi           

I don't have any Dapr application running, so it is not 100% the same test, but as far as the dapr-kubernetes-operator is concerned, it should not matter.

@ryorke1
Author

ryorke1 commented Apr 19, 2024

OK, we are going to look into OLM and see if we can adjust the resources of the dapr-control-plane. While we are doing that, I am curious to know whether the dapr-control-plane being killed will cause any issues. In our case, so far we do see the components in place and the CRDs were deployed (permission issues still exist, #136), and we are using the Dapr components without issues so far. What are your thoughts on this?

@ryorke1
Author

ryorke1 commented Apr 19, 2024

Also, I was finally able to capture a screenshot of this crash (it goes to OOMKilled and then immediately into CrashLoopBackOff, so it is hard to capture).

(screenshot: dapr-control-plane pod status showing OOMKilled / CrashLoopBackOff)
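
A sketch of how the same information can be pulled from the pod status without catching the screen at the right moment (the pod name is a placeholder, take it from kubectl get pods):

# Last termination state of the dapr-control-plane container; the reason
# field should read "OOMKilled". <pod-name> is a placeholder.
kubectl -n openshift-operators get pod <pod-name> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'

# Restart count and the events around the kill
kubectl -n openshift-operators describe pod <pod-name>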

@ryorke1
Author

ryorke1 commented Apr 19, 2024

Some logs from OpenShift as well

(screenshot: OpenShift pod logs)
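
In case a text version is easier to work with than the screenshot, a sketch assuming the Deployment is named dapr-control-plane (as the pod names suggest):

# Logs of the previous (OOMKilled) container instance as plain text
kubectl -n openshift-operators logs deployment/dapr-control-plane --previous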

@lburgazzoli
Collaborator

> OK, we are going to look into OLM and see if we can adjust the resources of the dapr-control-plane. While we are doing that, I am curious to know whether the dapr-control-plane being killed will cause any issues. In our case, so far we do see the components in place and the CRDs were deployed (permission issues still exist, #136), and we are using the Dapr components without issues so far. What are your thoughts on this?

It should not cause any issue, as the role of the operator is just to set up Dapr and make sure the setup stays in sync with the DaprInstance spec.

@lburgazzoli
Collaborator

> Some logs from OpenShift as well

Are you able to provide a reproducer? Deploying a DaprInstance similar to yours does not trigger the OOM killer in my environment, so I need something closer to your setup to dig into it further.

@ryorke1
Author

ryorke1 commented Apr 22, 2024

Hi @lburgazzoli. Using Subscriptions in OLM, we were able to stabilize the dapr-control-plane pod. Here is the Subscription we used, for future reference in case others run into this issue.

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  labels:
    operators.coreos.com/dapr-kubernetes-operator.openshift-operators: ""
  name: dapr-kubernetes-operator
  namespace: openshift-operators
spec:
  channel: alpha
  config:
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
      requests:
        cpu: 250m
        memory: 256Mi
  installPlanApproval: Manual
  name: dapr-kubernetes-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
  startingCSV: dapr-kubernetes-operator.v0.0.8
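
A quick way to confirm the override actually reached the operator (a sketch; the Deployment name is inferred from the pod names shown earlier and may differ):

# Verify that OLM propagated the Subscription's resources to the Deployment
kubectl -n openshift-operators get deployment dapr-control-plane \
  -o jsonpath='{.spec.template.spec.containers[0].resources}{"\n"}'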

As a side note, this did not resolve the propagation to the roles. We still need an admin to manually create roles for us to be able to use these CRDs.

@ryorke1 ryorke1 closed this as completed Apr 22, 2024
@lburgazzoli
Collaborator

@ryorke1 I would really love to be able to reproduce this so I can fix the real problem (which may just be a matter of increasing the memory), so if at any point you have some sort of reproducer, please let me know.
