integrate perfetto - supported by android native #655

Open
cforce opened this issue Oct 23, 2024 · 10 comments
Labels
enhancement New feature or request

Comments

@cforce

cforce commented Oct 23, 2024

Perfetto allows you to collect system-wide performance traces from Android devices from a variety of data sources (kernel scheduler via ftrace, userspace instrumentation via atrace, and all other data sources listed on this site).

Perfetto is based on platform services that have been available since Android 9 (P) but are enabled by default only since Android 11 (R). On Android 9 (P) and 10 (Q), you need to enable the tracing services before getting started, as described in the quickstart:

https://perfetto.dev/docs/quickstart/android-tracing

open-telemetry/opentelemetry-cpp#1483

@breedx-splk
Contributor

Interesting idea. An Android "system trace" is not exactly something that fits cleanly into the otel data model today. Curious if you have thoughts about specifics, @cforce ?

@cforce
Author

cforce commented Oct 23, 2024

I would consider potential solutions, such as:

  • Defining custom spans or events in OpenTelemetry to represent significant moments in system performance traces.
  • Integrating the Perfetto trace output into a separate telemetry pipeline that complements OpenTelemetry, rather than trying to directly merge them.
  • Exploring ways to extend OpenTelemetry's semantic conventions to include more granular system-level performance metrics.
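
To make the first option concrete, here is a minimal Kotlin sketch of turning one system-trace slice into a span-like record. `TraceSlice`, `SpanRecord` and `toSpanRecord` are hypothetical names invented for illustration; they are not part of Perfetto or the OpenTelemetry SDK. The attribute keys and the assumption that trace timestamps are offsets from a boot-time clock are also illustrative, not an agreed convention.

```kotlin
// Hypothetical shape of one slice read from a system trace (not a Perfetto API).
data class TraceSlice(
    val name: String,
    val startNanos: Long,      // assumed: offset from a boot-time clock
    val durationNanos: Long,
    val threadName: String,
)

// Hypothetical span-like record; a real integration would build an OTel span instead.
data class SpanRecord(
    val name: String,
    val startEpochNanos: Long,
    val endEpochNanos: Long,
    val attributes: Map<String, String>,
)

// The caller supplies the wall-clock time of boot so offsets become epoch timestamps.
fun TraceSlice.toSpanRecord(bootEpochNanos: Long): SpanRecord = SpanRecord(
    name = name,
    startEpochNanos = bootEpochNanos + startNanos,
    endEpochNanos = bootEpochNanos + startNanos + durationNanos,
    attributes = mapOf("thread.name" to threadName, "source" to "system_trace"),
)

fun main() {
    val slice = TraceSlice("Choreographer#doFrame", 1_000L, 250L, "main")
    println(slice.toSpanRecord(bootEpochNanos = 5_000_000L))
}
```

A mapping like this only covers the shape of the data; which slices are worth converting is a separate question.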

Also consider that tools like strace or ftrace (used by Perfetto) have been in use for years, and a lot of tools exist to provide data. A semantic model to map those is not specific to Android. Not to mention that eBPF covers these areas even better, and there are solutions which might already have some "mapping" - see open-telemetry/community#2406

Android currently does not have any OTel export - some kind of adapter is needed to collect data from the native platform tooling, which is baked into the kernel.

@breedx-splk added the enhancement label Oct 24, 2024
@marandaneto
Member

You can probably leverage https://developer.android.com/jetpack/androidx/releases/tracing
and get inspiration from https://github.com/square/papa/tree/main/papa-safetrace
The data has to be transformed to the OTel protocol, of course.

@bherbst

bherbst commented Nov 1, 2024

It isn't particularly fancy, but I've found a basic span processor that starts & ends system traces to be quite effective in getting OTel spans represented in tools like Perfetto:

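import android.os.Trace
import androidx.annotation.RequiresApi
import io.opentelemetry.context.Context
import io.opentelemetry.sdk.trace.ReadWriteSpan
import io.opentelemetry.sdk.trace.ReadableSpan
import io.opentelemetry.sdk.trace.SpanProcessor

// Mirrors every OTel span as an async section in the Android system trace.
// Trace.beginAsyncSection/endAsyncSection require API 29.
@RequiresApi(29)
class AndroidTraceBridge : SpanProcessor {

  // Ask the SDK to invoke both callbacks.
  override fun isStartRequired() = true
  override fun isEndRequired() = true

  override fun onStart(parentContext: Context, span: ReadWriteSpan) {
    // The cookie pairs this begin with its matching end.
    Trace.beginAsyncSection(span.name, span.spanContext.spanId.hashCode())
  }

  override fun onEnd(span: ReadableSpan) {
    Trace.endAsyncSection(span.name, span.spanContext.spanId.hashCode())
  }
}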

All you really get with that is the span name with a start & end on the timeline, but that's still quite useful in looking at what is happening within the application. I wouldn't generally expect folks to be looking at Android traces from any sort of production app use, and typically they represent only a short period of time so we don't need to capture a broad set of attributes to make them useful. Plus the currently available APIs don't really make it feasible to do a whole lot more than a named trace section.

Here's an example of the output from that processor for a period of time that includes an OpenTelemetry span for a screen load alongside a few other Perfetto tracks captured from Android's tracing tools:

image
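
One detail worth guarding against in a bridge like the span processor above: `Trace.endAsyncSection` has to be called with the same name and cookie as the matching `Trace.beginAsyncSection`, and a span's name can change between start and end (for example via `updateName`). The sketch below remembers the name used at start. `SectionSink` and `SectionTracker` are hypothetical names; `SectionSink` is a stand-in for `android.os.Trace` so the logic can run off-device.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for android.os.Trace so the sketch can run off-device.
interface SectionSink {
    fun begin(name: String, cookie: Int)
    fun end(name: String, cookie: Int)
}

// Remembers the name each span started with, so end() always matches begin().
class SectionTracker(private val sink: SectionSink) {
    private val startedNames = ConcurrentHashMap<String, String>()

    fun onStart(spanId: String, name: String) {
        startedNames[spanId] = name
        sink.begin(name, spanId.hashCode())
    }

    fun onEnd(spanId: String) {
        // Ignore spans that were never started through this tracker.
        val name = startedNames.remove(spanId) ?: return
        sink.end(name, spanId.hashCode())
    }
}

fun main() {
    val calls = mutableListOf<String>()
    val tracker = SectionTracker(object : SectionSink {
        override fun begin(name: String, cookie: Int) { calls += "begin:$name" }
        override fun end(name: String, cookie: Int) { calls += "end:$name" }
    })
    tracker.onStart("00f067aa0ba902b7", "screen_load")
    tracker.onEnd("00f067aa0ba902b7")
    println(calls) // prints [begin:screen_load, end:screen_load]
}
```

In a real processor, `onStart` and `onEnd` would be driven by the `SpanProcessor` callbacks and the sink would delegate to `android.os.Trace`.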

@bidetofevil
Contributor

The data captured in Perfetto traces (or even just the ones spit out by ATrace) is generally too verbose to be consumed and recorded in production (e.g. ~1MB for about a second of data). They are meant for local debugging and tracing to get a good handle on the "shape" of execution on a real device. Even using the data with the forthcoming OTel profiling signal might be problematic given the impracticality of parsing and consuming this data in production.

@bherbst nailed it on the head - getting some OTel spans to show up inside a Perfetto trace might be interesting for local debugging, but I don't think it's practical to invert that relationship and get Perfetto data into OTel.

That being said, I think some of the data exposed in there might be interesting to selectively capture as OTel signals, but a more promising API for that is ProfilingManager. Unfortunately, it is only available for Android 15+ - even the pending AndroidX library is only going to work for API 35+ since the underlying hooks are not available in older OS versions.

If there are specific data from the API that look interesting, it might be nice to add them, even if their use in practice is quite limited until Android 15 is more widely adopted.
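
To put the ~1MB-per-second figure above in perspective, a back-of-the-envelope estimate helps. The traced duration and device count below are made-up inputs for illustration, not measurements.

```kotlin
// Rough upload volume if raw traces were shipped from production.
// bytesPerSecond reflects the ~1MB/s estimate from the thread; the other inputs are hypothetical.
fun estimatedBytes(bytesPerSecond: Long, secondsTraced: Long, devices: Long): Long =
    bytesPerSecond * secondsTraced * devices

fun main() {
    val total = estimatedBytes(bytesPerSecond = 1_000_000L, secondsTraced = 10L, devices = 100_000L)
    println("${total / 1_000_000_000L} GB") // prints "1000 GB"
}
```

Even ten seconds of raw trace per device adds up to roughly a terabyte across 100,000 devices under these assumptions.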

@cforce
Author

cforce commented Nov 8, 2024

> That being said, I think some of the data exposed in there might be interesting to selectively capture as OTel signals, but a more promising API for that is ProfilingManager.

> Unfortunately, it is only available for Android 15+

What is only available for Android 15+?

> even the pending AndroidX library is only going to work for 35+ since the underlying hooks are not available in older OS versions.

Can you please share more details on how the AndroidX lib is a dependency and why it is pending?

@bidetofevil
Contributor

ProfilingManager. It only works with Android 15 because only that OS version has all the hooks to get the data in a performant way at runtime. ATrace and the rest of the data sources for Perfetto traces are consolidated by the system tracing service process and written directly to disk, and there's no programmatic way to access the data other than reading the gigantic trace file.

And I misspoke about it being pending - it's already available as part of androidx.core. As you can see, it is annotated @RequiresApi(api = 35), but they want it anyway because it's more Kotlin-y and they can change the APIs without tying them to the platform API.

@cforce
Author

cforce commented Nov 9, 2024

> ProfilingManager. It only works with Android 15 because only that OS version has all the hooks to get the data in a performant way at runtime. ATrace and the rest of the data sources for Perfetto traces are consolidated by the system tracing service process and directly written to disk and there's no programmatic way to access but reading the gigantic trace file.

That’s why I initiated this discussion. Why has Android so far taken an all-or-nothing approach that is mostly bound to dedicated local tools, with data access being so restrictive? eBPF allows us to define trace points and “subscribe” to events with minimal overhead, offering a much more flexible method.

If there’s an API for better security and compliance, that’s great, but it should also provide a way to hook into kernel space events (for minimal performance impact) and export data in a format compatible with open standards like OpenTelemetry. From what I understand, the OpenTelemetry profiling signal is currently the best option available, as seen in ProfilingManager and ATrace.

By supporting an open, flexible framework like OpenTelemetry, Android profiling tools could align with widely adopted observability standards, making it easier to integrate performance data with other systems and maintain a low overhead, similar to eBPF’s approach.

Using OpenTelemetry profiling signals would enable developers to “subscribe” to data sources with minimal performance impact, streamlining profiling efforts. Additionally, it would allow kernel-level data to be exported in a standardized format, facilitating interoperability with tools and platforms that already support OpenTelemetry. This could lead to a more modular and accessible profiling approach, with less reliance on proprietary or closed systems, benefiting both developers and users.

@bidetofevil
Contributor

A few things:

  1. This project can only consume data that the Android platform exposes, explicitly through APIs or implicitly through sleuthing (and we used to have to do that quite a lot). What I described were limitations imposed by the platform, which we are not in control of. If you want access to low-level, kernel space events, it's folks at Google you have to talk to. And having talked directly to PMs and engineers in the past who worked on Android (as a representative of Twitter), let's just say it's not easy to get things on their roadmap that they weren't already thinking of doing.

  2. Android apps have restricted access to the data and goings-on in other processes running on that device. The platform, for security reasons, correctly doesn't expose a lot of details to an app. So beyond what they already expose via Perfetto traces, it's unlikely they'll expose any more.

  3. If the idea is to directly pipe a subset of data from Perfetto into OTel's profiling signal and have that run in production, I think that's intriguing. That said, I think there are a number of issues that have to be resolved before that can be explored: app permission to trigger the trace and access the file, and runtime performance to read and process the file, just to name a couple. But more importantly, how useful is having low-level system events when we are only barely scratching the surface of higher-level system events - in fact, we probably aren't even capturing enough app-level events.

While I think there are some interesting data that can be collected by Perfetto (e.g. binder calls, GCs, etc.), and there are also concrete applications of the profiling signal in OTel that would benefit the Android community (e.g. to better model events like ANR), before I would recommend we take this enhancement request forward, I'd personally need to see a more precise proposal with what specifically you think we should collect and how we can feasibly do so.

I'll leave it to the maintainers @breedx-splk and @LikeTheSalad to chime in.

@LikeTheSalad
Contributor

I think profiling tools are great for solving a lot of issues, especially during development. However, they create a lot of data that I don't think everybody would want stored in the cloud, especially for mobile apps, where the volume can quickly get very large depending on the number of devices an app runs on.

If the intention of this issue is to provide the same data that can be seen when clicking on the Open Android example option that can be found here, but via OTel, then I don't think that's a good idea.

That's not to say it's impossible, though. I think it can be done right now by whoever needs that kind of info exported via OTel, by doing something similar to what @bherbst mentioned. However, I don't see it happening in OTel Android, at least not anytime soon, as we still have a lot of work to do to cover the basics of Android/RUM telemetry.


6 participants