Service analytics #3256

kolesnikovae · 2024-04-26T07:11:23Z

Currently, we expect users to know what they are looking for, and we provide quite limited means for exploration. Pyroscope should give users insight into their data: it should be possible to identify "interesting" services or service instances, as well as code hotspots, at a glance, without querying anything specific.

Global View

At the highest level, we should present an overview of the user's environment (the part we are aware of): global statistics on the services sending data to Pyroscope.

We should identify a subset of "interesting" (or important) services that the user may want to take a closer look at. This does not mean we should collect statistics for those services exclusively, but we should inform the user about them first, much like suggestions.

This ought to be user-specific, although we could probably address it with multi-tenancy; the target audience is software engineers. Some of the criteria (in no particular order):

Owned by the user, user favorites.
Data statistics (ingested/stored/queried), projections/predictions.
Pattern changes (a surge or drop in traffic or samples).
New services (+version?).
Dynamics, like new code hotspots (see below), or dimension statistics changes (see below).
AI-featured.
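For the "pattern changes" criterion, one simple way to flag a surge or drop is a z-score test over recent windows. The Go sketch below assumes we already have a per-service time series (e.g. samples per minute); `detectChange` and the threshold are hypothetical, not part of Pyroscope, and a real detector would probably also need to handle seasonality (daily or weekly traffic cycles).

```go
package main

import (
	"fmt"
	"math"
)

// Change classifies the latest observation relative to recent history.
type Change int

const (
	NoChange Change = iota
	Surge
	Drop
)

// String makes Change print as a readable word.
func (c Change) String() string {
	switch c {
	case Surge:
		return "surge"
	case Drop:
		return "drop"
	default:
		return "no change"
	}
}

// detectChange (hypothetical) compares the latest per-window value, e.g.
// samples per minute for one service, with the preceding windows and reports
// a surge or a drop when the z-score exceeds threshold.
func detectChange(history []float64, latest, threshold float64) Change {
	if len(history) < 2 {
		return NoChange // not enough history to estimate the spread
	}
	var sum float64
	for _, v := range history {
		sum += v
	}
	mean := sum / float64(len(history))

	var sq float64
	for _, v := range history {
		sq += (v - mean) * (v - mean)
	}
	std := math.Sqrt(sq / float64(len(history)-1)) // sample standard deviation

	if std == 0 {
		// Perfectly flat history: treat any deviation as a change.
		switch {
		case latest > mean:
			return Surge
		case latest < mean:
			return Drop
		default:
			return NoChange
		}
	}
	z := (latest - mean) / std
	switch {
	case z > threshold:
		return Surge
	case z < -threshold:
		return Drop
	default:
		return NoChange
	}
}

func main() {
	history := []float64{100, 104, 98, 101, 97, 100} // previous windows
	fmt.Println(detectChange(history, 180, 3))        // surge
	fmt.Println(detectChange(history, 20, 3))         // drop
	fmt.Println(detectChange(history, 102, 3))        // no change
}
```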

Service catalog (registry)

List of all the services, including details such as:

Number of active instances.
Profile types.
Language/runtime (+version).
SDK/agent (+version).
Data statistics (ingested/stored/queried).

A user should be able to get answers to questions like:

How much data is ingested, stored, and queried, both total and per dimension (service, language, profile type, SDK/agent).
How many instances are sending data, both total and per dimension.
Average rate per service instance, average profile size.

These are frequently asked questions, and we can greatly help users if we provide them with all the necessary information.

Dimension statistics

We should provide a detailed breakdown of dimensions (labels) for each service. For each service label, we identify the top-K values, and for each selected label value, we collect statistics such as the share of samples, ingestion traffic, and data on disk. Label query matches could be tracked as well.
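The per-label breakdown could be computed roughly as follows. This Go sketch assumes exact per-value counters are already available for one label of one service (`ValueStats` and `topKLabelValues` are hypothetical names); values outside the top-K are folded into an "other" bucket so the tail is still accounted for.

```go
package main

import (
	"fmt"
	"sort"
)

// ValueStats holds statistics for one label value of one service.
type ValueStats struct {
	Samples       int64
	IngestedBytes int64
	DiskBytes     int64
}

// RankedValue is a label value with its statistics and share of samples.
type RankedValue struct {
	Value string
	ValueStats
	SampleShare float64 // fraction of all the service's samples with this value
}

// topKLabelValues ranks the values of one label by sample count and returns
// the top k. Statistics of the remaining values are summed into other.
func topKLabelValues(values map[string]ValueStats, k int) (top []RankedValue, other ValueStats) {
	var total int64
	ranked := make([]RankedValue, 0, len(values))
	for v, s := range values {
		total += s.Samples
		ranked = append(ranked, RankedValue{Value: v, ValueStats: s})
	}
	sort.Slice(ranked, func(i, j int) bool {
		if ranked[i].Samples != ranked[j].Samples {
			return ranked[i].Samples > ranked[j].Samples
		}
		return ranked[i].Value < ranked[j].Value // deterministic tie-break
	})
	if k < 0 {
		k = 0
	}
	if k < len(ranked) {
		for _, r := range ranked[k:] {
			other.Samples += r.Samples
			other.IngestedBytes += r.IngestedBytes
			other.DiskBytes += r.DiskBytes
		}
		ranked = ranked[:k]
	}
	for i := range ranked {
		if total > 0 {
			ranked[i].SampleShare = float64(ranked[i].Samples) / float64(total)
		}
	}
	return ranked, other
}

func main() {
	podStats := map[string]ValueStats{
		"pod-a": {Samples: 500, IngestedBytes: 50000, DiskBytes: 10000},
		"pod-b": {Samples: 300, IngestedBytes: 30000, DiskBytes: 6000},
		"pod-c": {Samples: 150, IngestedBytes: 15000, DiskBytes: 3000},
		"pod-d": {Samples: 50, IngestedBytes: 5000, DiskBytes: 1000},
	}
	top, other := topKLabelValues(podStats, 2)
	for _, r := range top {
		fmt.Printf("%s: %.0f%% of samples, %d B ingested, %d B on disk\n",
			r.Value, r.SampleShare*100, r.IngestedBytes, r.DiskBytes)
	}
	fmt.Printf("other: %d samples, %d B on disk\n", other.Samples, other.DiskBytes)
}
```

For high-cardinality labels, keeping an exact counter for every value may be too expensive; a heavy-hitters sketch such as Space-Saving would be a natural replacement, at the cost of approximate counts.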

Later on, this information could also help us add new features, such as:

Query cost estimation.
Adaptive profiling (generating optimisation suggestions based on access patterns).

See: #2648, #3037, #3226
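As an illustration of query cost estimation, the dimension statistics could give a rough figure for the stored bytes a query has to read. The Go sketch below is a back-of-the-envelope model, not a proposed design: it assumes data on disk is proportional to samples, that labels are independent (so the selectivities of `label=value` matchers multiply), and that values without statistics (e.g. outside the top-K) do not narrow the estimate. `estimateScannedBytes` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// estimateScannedBytes (hypothetical) estimates how many stored bytes of one
// service a query with equality matchers would need to read.
// shares[label][value] is the fraction of the service's samples carrying that
// value, taken from dimension statistics.
func estimateScannedBytes(diskBytes int64, shares map[string]map[string]float64, matchers map[string]string) int64 {
	selectivity := 1.0
	for label, value := range matchers {
		// Assumption: labels are independent, so selectivities multiply.
		// Unknown labels/values leave the estimate unchanged (conservative).
		if share, ok := shares[label][value]; ok {
			selectivity *= share
		}
	}
	return int64(math.Round(selectivity * float64(diskBytes)))
}

func main() {
	shares := map[string]map[string]float64{
		"region": {"eu-west": 0.6, "us-east": 0.4},
		"pod":    {"checkout-1": 0.05},
	}
	const diskBytes = 10 << 30 // 10 GiB stored for the service
	fmt.Println(estimateScannedBytes(diskBytes, shares, map[string]string{"region": "eu-west"}))
	fmt.Println(estimateScannedBytes(diskBytes, shares, map[string]string{"region": "eu-west", "pod": "checkout-1"}))
}
```

The independence assumption breaks down for correlated labels: if a pod only ever runs in one region, multiplying both shares underestimates the real cost.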

Code hot spots

We should provide a list of functions (or call sites) that the user might be interested in. In its simplest form, this is a top-K list of functions ranked by both flat and cumulative values. More sophisticated analysis is possible, since the statistics are meant to be collected service-wide in the background.
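In its simplest form, the top-K list can be computed from aggregated stack traces. The Go sketch below assumes stacks are available as lists of function names (root first) with one value per stack; `StackSample`, `functionStats`, and `topK` are hypothetical names. The flat value is attributed to the leaf frame only, the cumulative value to every function on the stack, with recursive frames counted once per stack.

```go
package main

import (
	"fmt"
	"sort"
)

// StackSample is one aggregated stack trace: function names from root to
// leaf and the value observed for it (e.g. CPU time or sample count).
type StackSample struct {
	Stack []string
	Value int64
}

// FuncStat holds the flat (self) and cumulative (total) value of a function.
type FuncStat struct {
	Name string
	Flat int64 // value of stacks where the function is the leaf frame
	Cum  int64 // value of all stacks the function appears in
}

// functionStats computes flat and cumulative values per function. A function
// appearing several times in one stack (recursion) is counted once toward
// its cumulative value for that stack.
func functionStats(samples []StackSample) map[string]*FuncStat {
	stats := map[string]*FuncStat{}
	get := func(name string) *FuncStat {
		s, ok := stats[name]
		if !ok {
			s = &FuncStat{Name: name}
			stats[name] = s
		}
		return s
	}
	for _, smp := range samples {
		if len(smp.Stack) == 0 {
			continue
		}
		get(smp.Stack[len(smp.Stack)-1]).Flat += smp.Value
		seen := map[string]bool{}
		for _, fn := range smp.Stack {
			if !seen[fn] {
				seen[fn] = true
				get(fn).Cum += smp.Value
			}
		}
	}
	return stats
}

// topK returns the k functions with the largest value according to by
// (all of them if k is negative or exceeds the number of functions).
func topK(stats map[string]*FuncStat, k int, by func(FuncStat) int64) []FuncStat {
	list := make([]FuncStat, 0, len(stats))
	for _, s := range stats {
		list = append(list, *s)
	}
	sort.Slice(list, func(i, j int) bool {
		if by(list[i]) != by(list[j]) {
			return by(list[i]) > by(list[j])
		}
		return list[i].Name < list[j].Name // deterministic tie-break
	})
	if k >= 0 && k < len(list) {
		list = list[:k]
	}
	return list
}

func main() {
	stats := functionStats([]StackSample{
		{Stack: []string{"main", "handler", "json.Marshal"}, Value: 10},
		{Stack: []string{"main", "handler", "db.Query"}, Value: 30},
		{Stack: []string{"main", "gc"}, Value: 5},
		{Stack: []string{"main", "handler", "handler", "db.Query"}, Value: 5},
	})
	byFlat := func(s FuncStat) int64 { return s.Flat }
	byCum := func(s FuncStat) int64 { return s.Cum }
	for _, f := range topK(stats, 2, byFlat) {
		fmt.Printf("flat: %-12s flat=%d cum=%d\n", f.Name, f.Flat, f.Cum)
	}
	for _, f := range topK(stats, 2, byCum) {
		fmt.Printf("cum:  %-12s flat=%d cum=%d\n", f.Name, f.Flat, f.Cum)
	}
}
```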

Recent activity / query history

This is somewhat unrelated, but we may also want to keep track of user actions globally (a kind of audit log):

It would be very useful to have access to one's own recent queries and recently visited pages. I personally miss this a lot.
This could help owners adopt profiling in their environments, companies, and teams.

This part should be handled on the client side (in the Grafana app plugin).

Service relationships

It might be handy to indicate relationships between services:

Using external services (hello Tempo!).
Explicit configuration: e.g., we could group services by user-specified labels.
Employing ML/DL techniques: ingestion/query pattern matching, dimension clusters, a "Users also query ..." feature, etc.

This part could be handled on the client side (in the Grafana app plugin).
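For the explicit-configuration option, grouping by user-specified labels is straightforward. A minimal Go sketch, assuming we know a set of labels per service (the `serviceLabels` input and `groupServices` are hypothetical); if this lives in the Grafana app plugin instead, the logic would carry over unchanged. Services missing a label land under `<none>` for that key.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupServices (hypothetical) groups service names by the values of the
// given label keys, e.g. "team" and "namespace". The group key joins the
// values with "/"; a missing label is represented as "<none>".
// Note: label values that themselves contain "/" could collide.
func groupServices(serviceLabels map[string]map[string]string, keys []string) map[string][]string {
	groups := map[string][]string{}
	for svc, labels := range serviceLabels {
		parts := make([]string, len(keys))
		for i, k := range keys {
			v, ok := labels[k]
			if !ok {
				v = "<none>"
			}
			parts[i] = v
		}
		g := strings.Join(parts, "/")
		groups[g] = append(groups[g], svc)
	}
	for g := range groups {
		sort.Strings(groups[g]) // stable ordering within each group
	}
	return groups
}

func main() {
	groups := groupServices(map[string]map[string]string{
		"checkout": {"team": "payments", "namespace": "prod"},
		"billing":  {"team": "payments", "namespace": "prod"},
		"cart":     {"team": "shop"},
	}, []string{"team", "namespace"})

	names := make([]string, 0, len(groups))
	for g := range groups {
		names = append(names, g)
	}
	sort.Strings(names)
	for _, g := range names {
		fmt.Printf("%s: %v\n", g, groups[g])
	}
	// → payments/prod: [billing checkout]
	// → shop/<none>: [cart]
}
```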

kolesnikovae self-assigned this Apr 26, 2024

kolesnikovae added the enhancement (New feature or request), backend (Mostly go code), and ux labels Apr 26, 2024
