Profiling issue for Java pods with auto-instrumentation method #3730

Open
Vaibhav-1995 opened this issue Nov 29, 2024 · 7 comments

@Vaibhav-1995

Describe the bug

I am using Java profiling with Alloy (the auto-instrumentation method) to profile Java pods in my cluster. I deployed Pyroscope and Alloy separately using their Helm charts and added the config below to the Alloy ConfigMap, following the example at this link:

https://github.com/grafana/pyroscope/tree/main/examples/grafana-agent-auto-instrumentation/java/kubernetes

However, profiling starts on only a few seemingly random Java pods, not on all of them. I have not been able to identify why profiling is not enabled on every Java pod.

Expected behavior

All prerequisites listed in the documentation linked below are configured for Alloy in the Helm chart, so this does not seem to be an issue on the Alloy side: some pods do get profiled, and their data is visible in Grafana.

https://grafana.com/docs/alloy/latest/reference/components/pyroscope/pyroscope.java/

So the expected behaviour is that all Java pods start being profiled once the above config is added to the Alloy ConfigMap.

Environment

  • Infrastructure: Kubernetes EKS
  • Deployment tool: helm

Additional Context

content: |

  logging {
    level  = "debug"
    format = "logfmt"
  }

  // Discovers all kubernetes pods.
  // Relies on serviceAccountName=grafana-alloy in the pod spec for permissions.
  discovery.kubernetes "pods" {
    role = "pod"
  }

  // Discovers all processes running on the node.
  // Relies on a security context with elevated permissions for the alloy container (running as root).
  // Relies on hostPID=true on the pod spec, to be able to see processes from other pods.
  discovery.process "all" {
    // Merges kubernetes and process data (using container_id), to attach kubernetes labels to discovered processes.
    join = discovery.kubernetes.pods.targets
  }
  // Drops non-java processes and adjusts labels.    
  discovery.relabel "java" {
    targets = discovery.process.all.targets
    // Drops non-java processes.
    rule {
      source_labels = ["__meta_process_exe"]
      action = "keep"
      regex = ".*/java$"
    }
    // Sets up the service_name using the namespace and container names.
    rule {
      source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_pod_container_name"]
      target_label = "service_name"
      separator = "/"
    }
    // Sets up kubernetes labels (labels with the __ prefix are ultimately dropped).
    rule {
      action = "replace"
      source_labels = ["__meta_kubernetes_pod_node_name"]
      target_label = "node"
    }
    rule {
      action = "replace"
      source_labels = ["__meta_kubernetes_namespace"]
      target_label = "namespace"
    }
    rule {
      action = "replace"
      source_labels = ["__meta_kubernetes_pod_name"]
      target_label = "pod"
    }
    rule {
      action = "replace"
      source_labels = ["__meta_kubernetes_pod_container_name"]
      target_label = "container"
    }
    // Sets up the cluster label.
    // Relies on a pod-level annotation with the "cluster_name" name.
    // Alternatively it can be set up using external_labels in pyroscope.write. 
    rule {
      action = "replace"
      source_labels = ["__meta_kubernetes_pod_annotation_cluster_name"]
      target_label = "cluster"
    }
  }

  // Attaches the Pyroscope profiler to the processes returned by the discovery.relabel component.
  // Relies on a security context with elevated permissions for the alloy container (running as root).
  // Relies on hostPID=true on the pod spec, to be able to access processes from other pods.
  pyroscope.java "java" {
    profiling_config {
      interval = "15s"
      alloc = "512k"
      cpu = true
      lock = "10ms"
      sample_rate = 100
    }
    forward_to = [pyroscope.write.local.receiver]
    targets = discovery.relabel.java.output
  }
    
  pyroscope.write "local" {
    // Send profiles to the locally running Pyroscope instance.
    endpoint {
      url = "http://xxx-xxx-pyroscope-distributor.observability-pyroscope-dev.svc.cluster.local:4040"
    }
    external_labels = {
      "static_label" = "static_label_value",
    }
  }
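
The keep rule in discovery.relabel "java" above passes only processes whose __meta_process_exe value matches ".*/java$". As a minimal sketch for checking which executable paths would survive that rule, the Python snippet below applies the same pattern as a full match. This assumes the regex is fully anchored, as in Prometheus-style relabeling, and the example paths are hypothetical, not taken from the affected cluster.

```python
import re

# The keep rule from discovery.relabel "java"; assumed to be fully anchored,
# as Prometheus-style relabel regexes are.
JAVA_EXE = re.compile(r".*/java$")


def is_kept(exe_path: str) -> bool:
    """Return True if a process with this executable path passes the keep rule."""
    return JAVA_EXE.fullmatch(exe_path) is not None


# Hypothetical executable paths, for illustration only.
candidates = [
    "/usr/lib/jvm/java-17-openjdk/bin/java",
    "/opt/app/bin/launcher",
    "/usr/bin/javac",
]
kept = [path for path in candidates if is_kept(path)]
print(kept)
```

If a JVM in a custom image were started through a binary with a different name, it would be dropped at this step and never reach pyroscope.java. Comparing the output of such a check with the real executable paths on a node is one way to rule this out.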
@korniltsev
Collaborator

Please provide the following:

  • Alloy logs from a node.
  • Which pods are profiled, and which pods are not profiled but are expected to be.
  • The base Docker image for the failing pods and/or their JVM version and vendor.

@Vaibhav-1995
Author

Vaibhav-1995 commented Dec 2, 2024

Hi @korniltsev
Thanks for your reply.

Please find the details below:

  • Alloy logs from a node. These are the two main errors appearing in the Alloy pod logs:

      ts=2024-12-02T09:30:02.574990354Z level=error component_path=/ component_id=pyroscope.java.java pid=4118021 err="failed to reset: failed to read jfr file: open /proc/4118021/root/tmp/asprof-186018-4118021.jfr: no such file or directory"

      ts=2024-12-02T04:58:02.01712108Z level=error component_path=/ component_id=pyroscope.java.java pid=716979 err="failed to start: asprof failed to run: asprof failed to run /tmp/alloy-asprof-glibc-ed25bbf0083bff602254601eb6c4a927823d988f/bin/asprof: exit status 255 Target JVM failed to load /tmp/alloy-asprof-glibc-ed25bbf0083bff602254601eb6c4a927823d988f/bin/../lib/libasyncProfiler.so\n"

  • Which pods are profiled and which are not: pods of open-source Java-based components such as OpenMetadata, ClickHouse ZooKeeper, and Trino are profiled, but the pods of our custom Java applications are not.

  • Base Docker image and JVM version for the failing pods: the custom Java applications are built on Red Hat Universal Base Image 9 for JDK 11 and 17.
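
To map errors like the two above back to specific pods, it can help to group the failing PIDs by error type across a full Alloy log. The following is a minimal Python sketch, assuming logfmt lines in the same shape as the ones quoted here. The category names (jfr_missing, agent_load_failed) and the helper names are made up for illustration and are not part of Alloy or Pyroscope.

```python
import re
from collections import defaultdict

# key=value pairs; values are either double-quoted (with backslash escapes) or bare.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')

# Substrings taken from the two errors above; the category names are hypothetical.
CATEGORIES = {
    "failed to read jfr file": "jfr_missing",
    "Target JVM failed to load": "agent_load_failed",
}


def parse_logfmt(line: str) -> dict:
    """Parse one logfmt line into a dict, stripping surrounding quotes from values."""
    fields = {}
    for key, value in PAIR.findall(line):
        if value.startswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def group_pids_by_error(lines) -> dict:
    """Map each error category to the set of PIDs that reported it."""
    groups = defaultdict(set)
    for line in lines:
        fields = parse_logfmt(line)
        pid = fields.get("pid", "")
        # Only error lines from the pyroscope.java component with a numeric pid.
        if fields.get("level") != "error" or not pid.isdigit():
            continue
        if fields.get("component_id") != "pyroscope.java.java":
            continue
        err = fields.get("err", "")
        category = next(
            (name for needle, name in CATEGORIES.items() if needle in err), "other"
        )
        groups[category].add(int(pid))
    return dict(groups)


# The two log lines quoted above, split across string literals for readability.
logs = [
    'ts=2024-12-02T09:30:02.574990354Z level=error component_path=/ '
    'component_id=pyroscope.java.java pid=4118021 '
    'err="failed to reset: failed to read jfr file: open '
    '/proc/4118021/root/tmp/asprof-186018-4118021.jfr: no such file or directory"',
    'ts=2024-12-02T04:58:02.01712108Z level=error component_path=/ '
    'component_id=pyroscope.java.java pid=716979 '
    'err="failed to start: asprof failed to run: asprof failed to run '
    '/tmp/alloy-asprof-glibc-ed25bbf0083bff602254601eb6c4a927823d988f/bin/asprof: '
    'exit status 255 Target JVM failed to load '
    '/tmp/alloy-asprof-glibc-ed25bbf0083bff602254601eb6c4a927823d988f/bin/../lib/'
    r'libasyncProfiler.so\n"',
]
print(group_pids_by_error(logs))
```

Because the Alloy pod runs with hostPID=true, the resulting PIDs are host PIDs and can be matched to containers on the same node.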

@Vaibhav-1995
Author

Hi @korniltsev
Any update?

@korniltsev
Collaborator

I'm sorry I did not have time to look into this yet. I may have time to look into this next week.

CC @aleks-p just in case :) feel free to look into this as well if you want to.

@Vaibhav-1995
Author

Hi Team,
Any update? I am stuck on this.

@Vaibhav-1995
Author

> I'm sorry I did not have time to look into this yet. I may have time to look into this next week.
>
> CC @aleks-p just in case :) feel free to look into this as well if you want to.

Hi @korniltsev @aleks-p
Could you please share an update, if there is one?

@Vaibhav-1995
Author

Hi,

Any update?
