This repository has been archived by the owner on Jul 31, 2023. It is now read-only.

Potential Memory Leak #168

Open
TylerPachal opened this issue Oct 8, 2020 · 3 comments

Comments

@TylerPachal

I have an Elixir project where I am using opencensus-erlang for the first time. I have a helper function for wrapping pieces of my code in spans:

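def span(name, attributes, func) do
  # Remember whichever span context is current in this process (if any).
  parent_span_ctx = :ocp.current_span_ctx()

  # Start a child of that span, tagged with the given attributes.
  new_span_ctx = :oc_trace.start_span(name, parent_span_ctx, %{attributes: attributes})

  # Make the new span current so nested span/3 calls become its children.
  :ocp.with_span_ctx(new_span_ctx)

  try do
    func.()
  after
    # Always finish the span and restore the parent, even if func raises.
    :oc_trace.finish_span(new_span_ctx)
    :ocp.with_span_ctx(parent_span_ctx)
  end
end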
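The try/after in this helper is what guarantees that the parent context is restored even when func raises. Below is a minimal, dependency-free sketch of the same save/restore pattern. It uses the process dictionary under a hypothetical :current_ctx key as a stand-in for :ocp's current-span storage and does not call opencensus at all; the module name SpanCtxSketch is also hypothetical.

```elixir
defmodule SpanCtxSketch do
  # Hypothetical stand-in for the per-process "current span context";
  # this does not touch opencensus.
  @key :current_ctx

  def current, do: Process.get(@key)

  # Same shape as span/3 above: remember the parent, make a child current,
  # run func, and always restore the parent, even if func raises.
  def span(name, func) do
    parent = current()
    Process.put(@key, {name, parent})

    try do
      func.()
    after
      Process.put(@key, parent)
    end
  end
end
```

For example, SpanCtxSketch.span("span_a", fn -> SpanCtxSketch.span("span_b", fn -> SpanCtxSketch.current() end) end) returns {"span_b", {"span_a", nil}}, and SpanCtxSketch.current() is nil again afterwards.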

To talk to my collector I am using the opencensus_service library.

In my application code, the spans are laid out something like this: a parent span with some children (which may have children of their own):

span("span_a", attributes, fn ->
  span("span_b", attributes, fn ->
    span("span_c", attributes, fn -> :ok end)
    span("span_d", attributes, fn -> :ok end)
  end)
end)

When I run this code, my application runs out of memory very quickly, and the offending process is oc_reporter. I was able to reproduce this locally and watch the memory usage with Observer. The memory usage is mainly binary:

[Screenshots from Observer: Memory Usage and Processes]

I thought I might be sending too much data, but after poking around in Observer I don't see the ETS tables growing without bound or queue sizes increasing, so I don't think that is the cause. Locally, I had a Collector set up that just writes to a file; in production, our Collector publishes to Kafka.

I also tried forcing some garbage collections and hibernating to see whether there was any memory I could reclaim that way, but that didn't help either.
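To put numbers on the "mainly binary" observation and on a forced collection without Observer, a sketch like the one below can snapshot a single process and then GC it. It assumes the reporter is registered under the name :oc_reporter; the ReporterMemory module is hypothetical. It relies on the :binary item of Process.info/2, which Erlang documents as an implementation detail that may change. A refc binary is freed only after every process that references it has dropped that reference and been garbage collected, so collecting one process is not always enough.

```elixir
defmodule ReporterMemory do
  # Hypothetical diagnostic helpers; not part of opencensus-erlang.
  # `target` is a pid or a registered name (e.g. :oc_reporter, assumed).

  # Heap memory, mailbox length, and the refc binaries the process references.
  def snapshot(target) do
    with pid when is_pid(pid) <- resolve(target),
         [memory: mem, message_queue_len: qlen, binary: bins] <-
           Process.info(pid, [:memory, :message_queue_len, :binary]) do
      # :binary lists {id, byte_size, ref_count} per referenced refc binary.
      # The binary data lives off-heap, outside the :memory figure.
      {:ok,
       %{
         memory: mem,
         message_queue_len: qlen,
         binary_count: length(bins),
         binary_bytes: Enum.reduce(bins, 0, fn {_id, size, _refc}, acc -> acc + size end)
       }}
    else
      _ -> {:error, :not_alive}
    end
  end

  # Forces a full GC of the process and returns snapshots from before and after.
  def gc_and_compare(target) do
    with pid when is_pid(pid) <- resolve(target),
         {:ok, before} <- snapshot(pid),
         true <- :erlang.garbage_collect(pid),
         {:ok, after_gc} <- snapshot(pid) do
      {:ok, %{before: before, after: after_gc}}
    else
      _ -> {:error, :not_alive}
    end
  end

  defp resolve(pid) when is_pid(pid), do: pid
  defp resolve(name) when is_atom(name), do: Process.whereis(name)
end
```

Calling ReporterMemory.snapshot(:oc_reporter) repeatedly under load shows whether binary_bytes keeps climbing while message_queue_len stays flat. If ReporterMemory.gc_and_compare(:oc_reporter) then shows binary_bytes barely moving, the reporter is still holding live references to those binaries rather than merely waiting for a collection.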

I am now stuck and don't really know where to go from here:

  1. Either I am using this library incorrectly, or
  2. There is a memory leak somewhere that appears under heavy usage, triggered by some component slowing down.

@tsloughter
Member

Hm, I'm surprised that manually running garbage collection doesn't help, since it is a binary leak.

Another hack you could try is to kill the reporter and see if the garbage is cleaned up.
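If you try that, a sketch like the following kills a registered process and waits until it is actually down. It assumes the reporter is registered as :oc_reporter and that its supervisor restarts it; any spans it had buffered but not yet exported are presumably lost. ReporterKill is a hypothetical name.

```elixir
defmodule ReporterKill do
  # Hypothetical helper: kill the process registered as `name` and wait for
  # its :DOWN message. If it is supervised, a fresh copy should be started,
  # and binaries only the old process referenced can then be freed.
  def kill_and_wait(name, timeout \\ 5_000) do
    case Process.whereis(name) do
      nil ->
        {:error, :not_registered}

      pid ->
        ref = Process.monitor(pid)
        Process.exit(pid, :kill)

        receive do
          {:DOWN, ^ref, :process, ^pid, _reason} -> {:ok, pid}
        after
          timeout ->
            Process.demonitor(ref, [:flush])
            {:error, :timeout}
        end
    end
  end
end
```

Comparing :erlang.memory(:binary) before and after ReporterKill.kill_and_wait(:oc_reporter) shows whether the garbage was being held by the reporter.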

If you are only using tracing, my suggestion would be to move to OpenTelemetry: https://github.com/open-telemetry/opentelemetry-erlang. I reworked the reporter there so that garbage should be less likely to accumulate, because a new process is used for each report run and then discarded.
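The idea of a fresh process per report run can be illustrated generically. The sketch below is not the OpenTelemetry exporter code; OneShotExport and export_fun are hypothetical. Each batch is handed to a short-lived, monitored process, and whatever that process allocates, including its references to refc binaries, is released when it exits, so a long-lived reporter never gets the chance to accumulate it.

```elixir
defmodule OneShotExport do
  # Hypothetical sketch: run export_fun on batch in a throwaway process.
  def run(batch, export_fun, timeout \\ 5_000) do
    parent = self()
    tag = make_ref()

    {pid, mon} =
      spawn_monitor(fn ->
        # All allocation for this export happens on this process's heap.
        send(parent, {tag, export_fun.(batch)})
      end)

    receive do
      {^tag, result} ->
        # The worker is exiting normally; drop its pending :DOWN message.
        Process.demonitor(mon, [:flush])
        {:ok, result}

      {:DOWN, ^mon, :process, ^pid, reason} ->
        {:error, reason}
    after
      timeout ->
        Process.exit(pid, :kill)
        Process.demonitor(mon, [:flush])

        # Discard a result that may have raced with the kill.
        receive do
          {^tag, _late} -> :ok
        after
          0 -> :ok
        end

        {:error, :timeout}
    end
  end
end
```

For example, OneShotExport.run([1, 2, 3], &Enum.sum/1) returns {:ok, 6}. The cost is one process spawn per batch, which is cheap on the BEAM compared with the network call an exporter makes.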

Note, though, that there are 2 PRs that will change how it works (we are nearing GA and will stop breaking the API soon), but they don't change how the Elixir with_span macro works, which I think you would want to use since it is essentially the same as the function you wrote to wrap OpenCensus.

If the obstacle to switching is that you already have the OpenCensus collector in use in your environment and can't simply change to the OpenTelemetry collector, then I can get a version of the Erlang OC reporter ported over to OpenTelemetry.

@TylerPachal
Author

TylerPachal commented Oct 8, 2020

Thanks for the quick comment!

... but they don't change how the Elixir with_span macro works which I think you would want to use ...

Where is that coming from? Is there a different dependency I should be using in my Elixir project?

@tsloughter
Member

I was referring to OpenTelemetry, not OpenCensus. The OpenTelemetry API includes an Elixir API: https://github.com/open-telemetry/opentelemetry-erlang/blob/master/apps/opentelemetry_api/lib/open_telemetry/tracer.ex
