[DOC] Tempo 2.7 release notes (#4537)
(cherry picked from commit 8d2eb8e)
knylander-grafana committed Jan 13, 2025
1 parent b0da6b4 commit d75e5d3
234 changes: 234 additions & 0 deletions docs/sources/tempo/release-notes/v2-7.md
---
title: Version 2.7 release notes
menuTitle: V2.7
description: Release notes for Grafana Tempo 2.7
weight: 25
---

# Version 2.7 release notes

The Tempo team is pleased to announce the release of Tempo 2.7.

This release gives you:

* The ability to precisely track ingested traffic and attribute costs based on custom labels
* A series of enhancements that significantly boost Tempo's performance and reduce its overall resource footprint
* New TraceQL capabilities
* Improvements to TraceQL metrics

<!-- Add link to blog post
Read the [Tempo 2.7 blog post](https://grafana.com/blog/2024/09/05/grafana-tempo-2.6-release-performance-improvements-and-new-traceql-features/) for more examples and details about these improvements. -->

These release notes highlight the most important features and bugfixes.
For a complete list, refer to the [Tempo changelog](https://github.com/grafana/tempo/releases).

<!-- add link to video for blog post
{{< youtube id="aIDkPJ_e3W4" >}}
-->

## Features and enhancements

The most important features and enhancements in Tempo 2.7 are highlighted below.

### Track ingested traffic and attribute costs

This new feature lets tenants precisely measure and attribute costs for their ingested trace data by leveraging custom labels.
This functionality provides a more accurate alternative to existing size-based metrics and meets the growing need for detailed cost attribution and billing transparency.

Modern organizations are increasingly reliant on distributed traces for observability, yet reconciling the costs associated with different teams, services, or departments can be challenging.
The existing size metric isn't accurate enough: it misses non-span data, which can lead to under- or over-counting.
Tempo's new usage tracking feature overcomes these issues by splitting resource-level data fairly and providing up to 99% accuracy, which makes it well suited to cost reconciliation.

Unlike the previous method, this new feature precisely accounts for every byte of trace data in the distributor, which is the only Tempo component that has the original payload.
A new API endpoint, `/usage_metrics`, exposes the per-tenant metrics on ingested data and cost attribution, and can be controlled with per-tenant configuration.

This feature is designed as a foundation for broader usage tracking capabilities, where additional trackers will allow organizations to measure and report on a range of usage metrics. ([#4162](https://github.com/grafana/tempo/pull/4162))

For additional information, refer to the [Usage metrics documentation](https://grafana.com/docs/tempo/<TEMPO_VERSION>/api_docs/#usage-metrics).
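To make "splitting resource-level data fairly" concrete, here is a minimal Go sketch. It is not Tempo's distributor code: the `batch` type, the `attributeCost` function, and the policy of dividing each batch's shared resource bytes evenly across its spans are assumptions chosen for illustration.

```go
package main

import "fmt"

// batch is a hypothetical stand-in for one resource-level group of spans:
// shared resource (and scope) bytes plus the encoded size and label of each span.
type batch struct {
	resourceBytes int      // bytes shared by every span in the batch
	spanBytes     []int    // encoded size of each span
	spanLabels    []string // custom cost-attribution label value for each span
}

// attributeCost returns ingested bytes per label value. Shared resource bytes are
// split evenly across the batch's spans (an assumed policy); leftover bytes from
// integer division go to the first spans so the totals match the batch size.
// Batches without spans are skipped in this sketch.
func attributeCost(batches []batch) map[string]int {
	costs := make(map[string]int)
	for _, b := range batches {
		n := len(b.spanBytes)
		if n == 0 {
			continue
		}
		share, remainder := b.resourceBytes/n, b.resourceBytes%n
		for i, size := range b.spanBytes {
			extra := share
			if i < remainder {
				extra++
			}
			costs[b.spanLabels[i]] += size + extra
		}
	}
	return costs
}

func main() {
	batches := []batch{{
		resourceBytes: 101,
		spanBytes:     []int{200, 300},
		spanLabels:    []string{"checkout", "payments"},
	}}
	fmt.Println(attributeCost(batches)) // map[checkout:251 payments:350]
}
```

With 101 shared bytes and two spans, one label value receives 51 of the shared bytes and the other 50, so the per-label totals still add up to the full 601-byte payload.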

### Major performance and memory usage improvements

We're excited to announce a series of enhancements that significantly boost Tempo's performance and reduce its overall resource footprint.

**Better rejection of large traces:** The ingester now reuses generator code to better detect and reject oversized traces.
This change makes trace ingestion more reliable and prevents capacity overloads.
In addition, two new metrics, `tempo_metrics_generator_live_trace_bytes` and `tempo_ingester_live_trace_bytes`, provide deeper visibility into per-tenant byte usage. ([#4365](https://github.com/grafana/tempo/pull/4365))

**Reduced allocations:** We’ve refined how the query-frontend handles incoming traffic to eliminate unnecessary allocations when query demand is low.
As part of these improvements, the `querier_forget_delay` configuration option has been removed because it no longer served a practical purpose. ([#3996](https://github.com/grafana/tempo/pull/3996))
This release also reduces the ingester working set by improving preallocation (prealloc) behavior.
It also adds the tunable prealloc environment variables `PREALLOC_BKT_SIZE`, `PREALLOC_NUM_BUCKETS`, and `PREALLOC_MIN_BUCKET`, plus the metric `tempo_ingester_prealloc_miss_bytes_total` to observe and tune prealloc behavior. ([#4344](https://github.com/grafana/tempo/pull/4344), [#4369](https://github.com/grafana/tempo/pull/4369))

**Faster tag lookups and collector operations:** Multiple optimizations ([#4100](https://github.com/grafana/tempo/pull/4100), [#4104](https://github.com/grafana/tempo/pull/4104), [#4109](https://github.com/grafana/tempo/pull/4109)) make tag lookups and collector tasks more responsive, particularly for distinct value searches. Additionally, enabling disk caching for completed blocks in ingesters significantly cuts down on query latency and lowers overall I/O overhead. ([#4069](https://github.com/grafana/tempo/pull/4069))

**Fewer goroutines in non-querier components:** A simplified design across non-querier components lowers the total number of goroutines, making Tempo more efficient and easier to scale, even with high trace volumes. ([#4484](https://github.com/grafana/tempo/pull/4484))

### New TraceQL capabilities

New in Tempo 2.7, TraceQL now allows you to query the [instrumentation scope](https://opentelemetry.io/docs/concepts/instrumentation-scope/) fields ([#3967](https://github.com/grafana/tempo/pull/3967)), letting you filter and explore your traces based on where and how they were instrumented.

We’ve extended TraceQL to automatically collect matches from array values ([#3867](https://github.com/grafana/tempo/pull/3867)), making it easier to parse spans containing arrays of attributes.
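As a rough illustration of array matching, the Go sketch below assumes that an equality condition matches an array-valued attribute when any element matches. The `attrValue` type and `matchesEq` function are hypothetical; TraceQL's actual evaluation lives in Tempo's query engine and may handle more operators and value types.

```go
package main

import "fmt"

// attrValue is a hypothetical attribute value that is either a single string
// or an array of strings.
type attrValue struct {
	scalar  string
	array   []string
	isArray bool
}

// matchesEq evaluates a condition like `span.foo = "want"`. For array values it
// assumes any-element semantics: the condition matches if one element equals want.
func matchesEq(v attrValue, want string) bool {
	if !v.isArray {
		return v.scalar == want
	}
	for _, elem := range v.array {
		if elem == want {
			return true
		}
	}
	return false
}

func main() {
	tags := attrValue{array: []string{"blue", "green"}, isArray: true}
	fmt.Println(matchesEq(tags, "green"))                       // true
	fmt.Println(matchesEq(tags, "red"))                         // false
	fmt.Println(matchesEq(attrValue{scalar: "green"}, "green")) // true
}
```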

Query times are notably faster, thanks to multiple optimizations ([#4114](https://github.com/grafana/tempo/pull/4114), [#4163](https://github.com/grafana/tempo/pull/4163), [#4438](https://github.com/grafana/tempo/pull/4438)).
Whether you’re running standard queries or advanced filters, you should see a significant speed boost.

Tempo now uses the Prometheus “fast regex” engine to accelerate regular expression-based filtering ([#4329](https://github.com/grafana/tempo/pull/4329)).
As part of this update, all regex matches are now fully anchored.
This breaking change means `span.foo =~ "bar"` is evaluated as `span.foo =~ "^bar$"`.
Update any affected queries accordingly.

### Query improvements

In API v2 queries, Tempo can now return partial traces even when they exceed the max bytes limit ([#3941](https://github.com/grafana/tempo/pull/3941)).
This ensures you can still retrieve and inspect useful segments of large traces to aid in debugging.
Refer to the [query v2 API endpoint](https://grafana.com/docs/tempo/<TEMPO_VERSION>/api_docs/#query-v2) documentation for more information.

### TraceQL metrics improvements (experimental)

In TraceQL metrics, we’ve added a new `avg_over_time` function ([#4073](https://github.com/grafana/tempo/pull/4073)) to help you compute average values for trace-based metrics over specified time ranges, making it simpler to spot trends and anomalies.

Tempo now supports `min_over_time` ([#3975](https://github.com/grafana/tempo/pull/3975)) and `max_over_time` ([#4065](https://github.com/grafana/tempo/pull/4065)) queries, giving you more flexibility in analyzing the smallest or largest values across your trace data.

For more information about TraceQL metrics, refer to [TraceQL metrics functions](https://grafana.com/docs/tempo/<TEMPO_VERSION>/traceql/metrics-queries/functions/#traceql-metrics-functions).

### Other enhancements and improvements

This release also has these notable updates:

* The [metrics-generator](https://grafana.com/docs/tempo/latest/metrics-generator/) introduces a generous limit of 100 for failed flush attempts ([#4254](https://github.com/grafana/tempo/pull/4254)) to prevent constant retries on corrupted blocks. A new metric also tracks these flush failures, offering better visibility into potential issues during ingestion.
* For [span metrics](https://grafana.com/docs/tempo/latest/metrics-generator/span_metrics/), the span multiplier now also sources its value from the resource attributes ([#4210](https://github.com/grafana/tempo/pull/4210)). This makes it possible to adjust and correct metrics using service or environment configuration, ensuring more accurate data reporting.
* The tempo-cli now supports dropping multiple trace IDs in a single command, speeding up administrative tasks and simplifying cleanup operations. ([#4266](https://github.com/grafana/tempo/pull/4266))
* An optional log, `log_discarded_spans`, has been added to track spans discarded by Tempo. This improves visibility into data ingestion workflows and helps you quickly diagnose any dropped spans. ([#3957](https://github.com/grafana/tempo/issues/3957))
* You can now impose limits on tag and tag-value lookups, preventing runaway queries in large-scale deployments and ensuring a smoother user experience. ([#4320](https://github.com/grafana/tempo/pull/4320))
* The tags and tag-values endpoints have gained new throughput and SLO-related metrics. Get deeper insights into query performance and reliability straight from Tempo’s built-in monitoring. ([#4148](https://github.com/grafana/tempo/pull/4148))
* Tempo now exposes a SemVer version in its `/api/status/buildinfo` endpoint, providing clear visibility into versioning, particularly useful for cloud deployments and automated pipelines. ([#4110](https://github.com/grafana/tempo/pull/4110))

## Upgrade considerations

When [upgrading](https://grafana.com/docs/tempo/latest/setup/upgrade/) to Tempo 2.7, be aware of these considerations and breaking changes.

### OpenTelemetry Collector receiver listens on `localhost` by default

After this change, the OpenTelemetry Collector receiver defaults to binding on `localhost` rather than `0.0.0.0`. Tempo installations running in Docker or other container environments must update their listener address to continue receiving data. ([#4465](https://github.com/grafana/tempo/pull/4465))

Most Tempo installations use the receivers with the default configuration:

```yaml
distributor:
  receivers:
    otlp:
      protocols:
        grpc:
        http:
```

This configuration previously worked because the receivers defaulted to `0.0.0.0:4317` and `0.0.0.0:4318`, respectively. With the change to replace unspecified addresses, the receivers now default to `localhost:4317` and `localhost:4318`.

As a result, connections from outside the container to Tempo running in a Docker container no longer work.

To work around this, explicitly specify the address to bind to. For example, if Tempo is running in a container with hostname `tempo`, this configuration should work:

```yaml
distributor:
  receivers:
    otlp:
      protocols:
        http:
          endpoint: "tempo:4318"
```

You can still bind explicitly to `0.0.0.0`, but this has potential security risks because the receiver then listens on every network interface:

```yaml
distributor:
  receivers:
    otlp:
      protocols:
        http:
          endpoint: "0.0.0.0:4318"
```
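Before upgrading, you might want to check which receiver endpoints would be unreachable from other containers. The Go helper below is a hypothetical pre-upgrade check, not part of Tempo or the OpenTelemetry Collector. It only inspects the host part of an `endpoint` string and does not resolve hostnames.

```go
package main

import (
	"fmt"
	"net"
)

// reachableFromOtherContainers reports whether a listener bound to endpoint
// ("host:port") could accept connections from another container. Loopback
// addresses cannot. Hostnames other than "localhost" are not resolved; they are
// assumed to map to a non-loopback interface, as a container hostname usually does.
func reachableFromOtherContainers(endpoint string) (bool, error) {
	host, _, err := net.SplitHostPort(endpoint)
	if err != nil {
		return false, err
	}
	switch host {
	case "":
		// In Go's net package, an empty host (":4318") listens on all interfaces.
		return true, nil
	case "localhost":
		return false, nil
	}
	if ip := net.ParseIP(host); ip != nil {
		// 0.0.0.0 and :: are unspecified (all interfaces), so they are reachable.
		return !ip.IsLoopback(), nil
	}
	return true, nil
}

func main() {
	for _, ep := range []string{"localhost:4318", "tempo:4318", "0.0.0.0:4318"} {
		ok, err := reachableFromOtherContainers(ep)
		fmt.Println(ep, ok, err)
	}
}
```

Running it flags `localhost:4318`, the new default, as unreachable from other containers, while `tempo:4318` and `0.0.0.0:4318` are reachable.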

### Tempo serverless deprecation

Tempo serverless is now officially deprecated and will be removed in an upcoming release.
Prepare to migrate any serverless workflows to alternative deployments. ([#4017](https://github.com/grafana/tempo/pull/4017), [documentation](https://grafana.com/docs/tempo/latest/operations/backend_search/#serverless-environment))

This release doesn't change serverless behavior. However, you need to remove serverless configurations before upgrading to the next release.

### Anchored regular expression matchers in TraceQL

TraceQL now uses the Prometheus “fast regex” engine to accelerate regular expression-based filtering ([#4329](https://github.com/grafana/tempo/pull/4329)).
As part of this update, all regex matches are now fully anchored.
This breaking change means `span.foo =~ "bar"` is evaluated as `span.foo =~ "^bar$"`.
Update any affected queries accordingly.

For more information, refer to the [TraceQL comparison operators](https://grafana.com/docs/tempo/<TEMPO_VERSION>/traceql/#comparison-operators) documentation.

### Migration from OpenTracing to OpenTelemetry

The `use_otel_tracer` option is removed.
Configure Tempo's tracing through the standard OpenTelemetry environment variables.
For Jaeger exporting, set `OTEL_TRACES_EXPORTER=jaeger`. For more information, refer to the [OpenTelemetry documentation](https://opentelemetry.io/docs/languages/sdk-configuration/). ([#3646](https://github.com/grafana/tempo/pull/3646))

### Added, updated, removed, or renamed configuration parameters

<table>
<tr>
<td><strong>Parameter</strong>
</td>
<td><strong>Comments</strong>
</td>
</tr>
<tr>
<td><code>querier_forget_delay</code>
</td>
<td>Removed. The <code>querier_forget_delay</code> setting provided no effective functionality and has been dropped. (<a href="https://github.com/grafana/tempo/pull/3996">#3996</a>)
</td>
</tr>
<tr>
<td><code>use_otel_tracer</code>
</td>
<td>Removed. Configure your spans via standard OpenTelemetry environment variables. For Jaeger exporting, set <code>OTEL_TRACES_EXPORTER=jaeger</code>. (<a href="https://github.com/grafana/tempo/pull/3646">#3646</a>)
</td>
</tr>
<tr>
<td><code>max_spans_per_span_set</code>
</td>
<td>Added to query-frontend configuration. The limit is enabled by default and set to 100. Set it to <code>0</code> to restore the old behavior (unlimited). Otherwise, spans beyond the configured max are dropped. (<a href="https://github.com/grafana/tempo/pull/4383">#4275</a>)
</td>
</tr>
</table>
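The semantics described for `max_spans_per_span_set` can be summarized in a few lines of Go. The `limitSpans` function is hypothetical, and keeping the *first* spans in the set is an assumption made for the sketch; the release notes only state that spans beyond the limit are dropped.

```go
package main

import "fmt"

// limitSpans caps a span set at maxSpans. As with max_spans_per_span_set, 0 means
// unlimited (this sketch also treats negative values as unlimited). Keeping the
// first maxSpans spans is an assumption of the sketch.
func limitSpans(spanIDs []string, maxSpans int) []string {
	if maxSpans <= 0 || len(spanIDs) <= maxSpans {
		return spanIDs
	}
	return spanIDs[:maxSpans]
}

func main() {
	spanSet := make([]string, 150)
	for i := range spanSet {
		spanSet[i] = fmt.Sprintf("span-%03d", i)
	}
	fmt.Println(len(limitSpans(spanSet, 100))) // 100: new default limit
	fmt.Println(len(limitSpans(spanSet, 0)))   // 150: 0 restores the old, unlimited behavior
}
```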

### Other upgrade considerations

* The Tempo CLI now targets the `/api/v2/traces` endpoint by default. Use the `--v1` flag if you still rely on the older `/api/traces` endpoint. ([#4127](https://github.com/grafana/tempo/pull/4127))
* If you already set the `X-Scope-OrgID` header in per-tenant overrides or global Tempo config, it is now honored and not overwritten by Tempo. This may change behavior if you previously depended on automatic injection. ([#4021](https://github.com/grafana/tempo/pull/4021))
* The AWS Lambda build output changes from `main` to `bootstrap`. Follow [AWS’s migration steps](https://aws.amazon.com/blogs/compute/migrating-aws-lambda-functions-from-the-go1-x-runtime-to-the-custom-runtime-on-amazon-linux-2/) to ensure your Lambda functions continue to work. ([#3852](https://github.com/grafana/tempo/pull/3852))
* gRPC compression is now disabled by default in the querier and distributor for performance reasons. ([#4429](https://github.com/grafana/tempo/pull/4429)) If you see network issues, check your gRPC compression settings. To re-enable compression, we recommend `snappy`. Use the following settings:

  ```yaml
  ingester_client:
    grpc_client_config:
      grpc_compression: "snappy"
  metrics_generator_client:
    grpc_client_config:
      grpc_compression: "snappy"
  querier:
    frontend_worker:
      grpc_client_config:
        grpc_compression: "snappy"
  ```

## Bugfixes

For a complete list, refer to the [Tempo changelog](https://github.com/grafana/tempo/releases).

* Add `invalid_utf8` to the reasons span metrics discards spans. ([#4293](https://github.com/grafana/tempo/pull/4293)) Tempo now catches values in your tracing data that can’t be used as valid metric labels. For example, if you generate span metrics by an attribute `foo` and a value of `foo` contains characters that aren't valid UTF-8, those spans aren't written.
* Metrics-generators: Correctly drop from the ring before stopping ingestion to reduce drops during a rollout. ([#4101](https://github.com/grafana/tempo/pull/4101))
* Correctly handle 400 Bad Request and 404 Not Found in gRPC streaming. ([#4144](https://github.com/grafana/tempo/pull/4144))
* Correctly handle Authorization header in gRPC streaming. ([#4419](https://github.com/grafana/tempo/pull/4419))
* Fix TraceQL metrics time range handling at the cutoff between recent and backend data. ([#4257](https://github.com/grafana/tempo/issues/4257))
* Fix several issues with exemplar values for TraceQL metrics. ([#4366](https://github.com/grafana/tempo/pull/4366), [#4404](https://github.com/grafana/tempo/pull/4404))
* Fix tempo-cli so that the `S3Pass` and `S3User` options are used; previously, the code ignored them. ([#4259](https://github.com/grafana/tempo/pull/4259))
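The new `invalid_utf8` discard reason can be illustrated with Go's `unicode/utf8` package. The hypothetical `filterSpans` function below keeps label values that are valid UTF-8 and counts the rest under an `invalid_utf8` reason; it is a sketch of the idea, not the span metrics processor's code.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// filterSpans keeps label values that are valid UTF-8 and counts the others
// under the "invalid_utf8" discard reason. Hypothetical illustration only.
func filterSpans(labelValues []string) (kept []string, discarded map[string]int) {
	discarded = make(map[string]int)
	for _, v := range labelValues {
		if !utf8.ValidString(v) {
			discarded["invalid_utf8"]++
			continue
		}
		kept = append(kept, v)
	}
	return kept, discarded
}

func main() {
	kept, discarded := filterSpans([]string{"checkout", "bad\xffvalue", "payments"})
	fmt.Println(kept, discarded) // [checkout payments] map[invalid_utf8:1]
}
```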