-
Notifications
You must be signed in to change notification settings - Fork 530
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
]DOC] Tempo 2.7 release notes (#4537)
(cherry picked from commit 8d2eb8e)
- Loading branch information
1 parent
b0da6b4
commit d75e5d3
Showing
4 changed files
with
366 additions
and
5 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,234 @@ | ||
--- | ||
title: Version 2.7 release notes | ||
menuTitle: V2.7 | ||
description: Release notes for Grafana Tempo 2.7 | ||
weight: 25 | ||
--- | ||
|
||
# Version 2.7 release notes | ||
|
||
The Tempo team is pleased to announce the release of Tempo 2.7. | ||
|
||
This release gives you: | ||
|
||
* Ability to precisely track ingested traffic and attribute costs based on custom labels | ||
* A series of enhancements that significantly boost Tempo's performance and reduce its overall resource footprint. | ||
* New TraceQL capabilities | ||
* Improvements to TraceQL metrics | ||
|
||
<!-- Add link to blog post | ||
Read the [Tempo 2.7 blog post](https://grafana.com/blog/2024/09/05/grafana-tempo-2.6-release-performance-improvements-and-new-traceql-features/) for more examples and details about these improvements. --> | ||
|
||
These release notes highlight the most important features and bugfixes. | ||
For a complete list, refer to the [Tempo changelog](https://github.com/grafana/tempo/releases). | ||
|
||
<!-- add link to video for blog post | ||
{{< youtube id="aIDkPJ_e3W4" >}} | ||
--> | ||
|
||
## Features and enhancements | ||
|
||
The most important features and enhancements in Tempo 2.7 are highlighted below. | ||
|
||
### Track ingested traffic and attribute costs | ||
|
||
This new feature lets tenants precisely measure and attribute costs for their ingested trace data by leveraging custom labels. | ||
This functionality provides a more accurate alternative to existing size-based metrics and meets the growing need for detailed cost attribution and billing transparency. | ||
|
||
Modern organizations are increasingly reliant on distributed traces for observability, yet reconciling the costs associated with different teams, services, or departments can be challenging. | ||
The existing size metric isn't accurate enough by missing non-span data and can lead to under- or over-counting. | ||
Tempo’s new usage tracking feature overcomes these issues by splitting resource-level data fairly and providing up to 99% accuracy—perfect for cost reconciliation. | ||
|
||
Unlike the previous method, this new feature precisely accounts for every byte of trace data in the distributor—the only Tempo component with the original payload. | ||
A new API endpoint, `/usage_metrics`, exposes the per-tenant metrics on ingested data and cost attribution, and can be controlled with per-tenant configuration. | ||
|
||
This feature is designed as a foundation for broader usage tracking capabilities, where additional trackers will allow organizations to measure and report on a range of usage metrics. ([#4162](https://github.com/grafana/tempo/pull/4162)) | ||
|
||
For additional information, refer to the [Usage metrics documentation](https://grafana.com/docs/tempo/<TEMPO_VERSION>/api_docs/#usage-metrics). | ||
|
||
### Major performance and memory usage improvements | ||
|
||
We're excited to announce a series of enhancements that significantly boost Tempo's performance and reduce its overall resource footprint. | ||
|
||
**Better refuse large traces:** The ingester now reuses generator code to better detect and reject oversized traces. | ||
This change makes trace ingestion more reliable and prevents capacity overloads. | ||
Plus, two new metrics `tempo_metrics_generator_live_trace_bytes` and `tempo_ingester_live_trace_bytes` provide deeper visibility into per-tenant byte usage. ([#4365](https://github.com/grafana/tempo/pull/4365)) | ||
|
||
**Reduced allocations:** We’ve refined how the query-frontend handles incoming traffic to eliminate unnecessary allocations when query demand is low. | ||
As part of these improvements, the `querier_forget_delay` configuration option has been removed because it no longer served a practical purpose. ([#3996](https://github.com/grafana/tempo/pull/3996)) | ||
This release also reduces the ingester working set by improving prelloc behavior. | ||
It also adds tunable prealloc env variables `PREALLOC_BKT_SIZE`, `PREALLOC_NUM_BUCKETS`, `PREALLOC_MIN_BUCKET`, and metric `tempo_ingester_prealloc_miss_bytes_total` to observe and tune prealloc behavior. ([#4344](https://github.com/grafana/tempo/pull/4344), [#4369](https://github.com/grafana/tempo/pull/4369)) | ||
|
||
**Faster tag lookups and collector operations:** Multiple optimizations ([#4100](https://github.com/grafana/tempo/pull/4100), [#4104](https://github.com/grafana/tempo/pull/4104), [#4109](https://github.com/grafana/tempo/pull/4109)) make tag lookups and collector tasks more responsive, particularly for distinct value searches. Additionally, enabling disk caching for completed blocks in ingesters significantly cuts down on query latency and lowers overall I/O overhead. ([#4069](https://github.com/grafana/tempo/pull/4069)) | ||
|
||
**Reduce goroutines in non-querier components:** A simplified design across non-querier components lowers the total number of goroutines, making Tempo more efficient and easier to scale—even with high trace volumes. ([#4484](https://github.com/grafana/tempo/pull/4484)) | ||
|
||
### New TraceQL capabilities | ||
|
||
New in Tempo 2.7, TraceQL now allows you to query the [instrumentation scope](https://opentelemetry.io/docs/concepts/instrumentation-scope/) fields ([#3967](https://github.com/grafana/tempo/pull/3967)), letting you filter and explore your traces based on where and how they were instrumented. | ||
|
||
We’ve extended TraceQL to automatically collect matches from array values ([#3867](https://github.com/grafana/tempo/pull/3867)), making it easier to parse spans containing arrays of attributes. | ||
|
||
Query times are notably faster, thanks to multiple optimizations ([#4114](https://github.com/grafana/tempo/pull/4114), [#4163](https://github.com/grafana/tempo/pull/4163), [#4438](https://github.com/grafana/tempo/pull/4438)). | ||
Whether you’re running standard queries or advanced filters, you should see a significant speed boost. | ||
|
||
Tempo now uses the Prometheus “fast regex” engine to accelerate regular expression-based filtering ([#4329](https://github.com/grafana/tempo/pull/4329)). | ||
As part of this update, all regex matches are now fully anchored. | ||
This breaking change means `span.foo =~ "bar"` is evaluated as `span.foo =~ "^bar$"`. | ||
Update any affected queries accordingly. | ||
|
||
### Query improvements | ||
|
||
In API v2 queries, Tempo can now return partial traces even when they exceed the max bytes limit ([#3941](https://github.com/grafana/tempo/pull/3941)). | ||
This ensures you can still retrieve and inspect useful segments of large traces to aid in debugging. | ||
Refer to the [query v2 API endpoint](https://grafana.com/docs/tempo/<TEMPO_VERSION>/api_docs/#query-v2) documentation for more information. | ||
|
||
### TraceQL metrics improvements (experimental) | ||
|
||
In TraceQL metrics, we’ve added a new `avg_over_time` function ([#4073](https://github.com/grafana/tempo/pull/4073)) to help you compute average values for trace-based metrics over specified time ranges, making it simpler to spot trends and anomalies. | ||
|
||
Tempo now supports `min_over_time` ([#3975](https://github.com/grafana/tempo/pull/3975)) and `max_over_time` ([#4065](https://github.com/grafana/tempo/pull/4065)) queries, giving you more flexibility in analyzing the smallest or largest values across your trace data. | ||
|
||
For more information about TraceQL metrics, refer to [TraceQL metrics functions](https://grafana.com/docs/tempo/<TEMPO_VERSION>/traceql/metrics-queries/functions/#traceql-metrics-functions). | ||
|
||
### Other enhancements and improvements | ||
|
||
This release also has these notable updates: | ||
|
||
* The [metrics-generator](https://grafana.com/docs/tempo/latest/metrics-generator/) introduces a generous limit of 100 for failed flush attempts ([#4254](https://github.com/grafana/tempo/pull/4254)) to prevent constant retries on corrupted blocks. A new metric also tracks these flush failures, offering better visibility into potential issues during ingestion. | ||
* For [span metrics](https://grafana.com/docs/tempo/latest/metrics-generator/span_metrics/), the span multiplier now also sources its value from the resource attributes ([#4210](https://github.com/grafana/tempo/pull/4210)). This makes it possible to adjust and correct metrics using service or environment configuration, ensuring more accurate data reporting. | ||
* The tempo-cli now supports dropping multiple trace IDs in a single command, speeding up administrative tasks and simplifying cleanup operations. ([#4266](https://github.com/grafana/tempo/pull/4266)) | ||
* An optional log, `log_discarded_spans`, has been added to track spans discarded by Tempo. This improves visibility into data ingestion workflows and helps you quickly diagnose any dropped spans. ([#3957](https://github.com/grafana/tempo/issues/3957)) | ||
* You can now impose limits on tag and tag-value lookups, preventing runaway queries in large-scale deployments and ensuring a smoother user experience. ([#4320](https://github.com/grafana/tempo/pull/4320)) | ||
* The tags and tag-values endpoints have gained new throughput and SLO-related metrics. Get deeper insights into query performance and reliability straight from Tempo’s built-in monitoring. ([#4148](https://github.com/grafana/tempo/pull/4148)) | ||
* Tempo now exposes a SemVer version in its `/api/status/buildinfo` endpoint, providing clear visibility into versioning, particularly useful for cloud deployments and automated pipelines. ([#4110](https://github.com/grafana/tempo/pull/4110)) | ||
|
||
## Upgrade considerations | ||
|
||
When [upgrading](https://grafana.com/docs/tempo/latest/setup/upgrade/) to Tempo 2.7, be aware of these considerations and breaking changes. | ||
|
||
### OpenTelemetry Collector receiver listens on `localhost` by default | ||
|
||
After this change, the OpenTelemetry Collector receiver defaults to binding on `localhost` rather than `0.0.0.0`. Tempo installations running in Docker or other container environments must update their listener address to continue receiving data. ([#4465](https://github.com/grafana/tempo/pull/4465)) | ||
|
||
Most Tempo installations use the receivers with the default configuration: | ||
|
||
```yaml | ||
distributor: | ||
receivers: | ||
otlp: | ||
protocols: | ||
grpc: | ||
http: | ||
``` | ||
This used to work fine since the receivers defaulted to `0.0.0.0:4317` and `0.0.0.0:4318` respectively. With the changes to replace unspecified addresses, the receivers now default to `localhost:4317` and `localhost:4318`. | ||
|
||
As a result, connections to Tempo running in a Docker container won't work anymore. | ||
|
||
To workaround this, you need to specify the address you want to bind to explicitly. For instance, if Tempo is running in a container with hostname `tempo`, this should work: | ||
|
||
```yaml | ||
# ... | ||
http: | ||
endpoint: "tempo:4318" | ||
``` | ||
|
||
You can also explicitly bind to `0.0.0.0` still, but this has potential security risks: | ||
|
||
```yaml | ||
# ... | ||
http: | ||
endpoint: "0.0.0.0:4318" | ||
``` | ||
|
||
### Tempo serverless deprecation | ||
|
||
Tempo serverless is now officially deprecated and will be removed in an upcoming release. | ||
Prepare to migrate any serverless workflows to alternative deployments. ([#4017](https://github.com/grafana/tempo/pull/4017), [documentation](https://grafana.com/docs/tempo/latest/operations/backend_search/#serverless-environment)) | ||
|
||
There are no changes to this release for serverless. However, you’ll need to remove these configurations before the next release. | ||
|
||
### Anchored regular expressioon matchers in TraceQL | ||
|
||
TraceQL now uses the Prometheus “fast regex” engine to accelerate regular expression-based filtering ([#4329](https://github.com/grafana/tempo/pull/4329)). | ||
As part of this update, all regex matches are now fully anchored. | ||
This breaking change means `span.foo =~ "bar"` is evaluated as `span.foo =~ "^bar$"`. | ||
Update any affected queries accordingly. | ||
|
||
For more information, refer to the [Comparison operators TraceQL](http://localhost:3002/docs/tempo/<TEMPO_VERSION>/traceql/#comparison-operators) documentation. | ||
|
||
### Migration from OpenTracing to OpenTelemetry | ||
|
||
The `use_otel_tracer` option is removed. | ||
Configure your spans via standard OpenTelemetry environment variables. | ||
For Jaeger exporting, set `OTEL_TRACES_EXPORTER=jaeger`.For more information, refer to the [OpenTelemetry documentation](https://www.google.com/url?q=https://opentelemetry.io/docs/languages/sdk-configuration/&sa=D&source=docs&ust=1736460391410238&usg=AOvVaw3bykVWwn34XfhrnFK73uM_). ([#3646](https://github.com/grafana/tempo/pull/3646)) | ||
|
||
### Added, updated, removed, or renamed configuration parameters | ||
|
||
<table> | ||
<tr> | ||
<td><strong>Parameter</strong> | ||
</td> | ||
<td><strong>Comments</strong> | ||
</td> | ||
</tr> | ||
<tr> | ||
<td><code>querier_forget_delay</code> | ||
</td> | ||
<td>Removed. The <code>querier_forget_delay</code> setting provided no effective functionality and has been dropped. (<a href="https://github.com/grafana/tempo/pull/3996">#3996</a>) | ||
</td> | ||
</tr> | ||
<tr> | ||
<td><code>use_otel_tracer</code> | ||
</td> | ||
<td>Removed. Configure your spans via standard OpenTelemetry environment variables. For Jaeger exporting, set <code>OTEL_TRACES_EXPORTER=jaeger</code>. (<a href="https://github.com/grafana/tempo/pull/3646">#3646</a>) | ||
</td> | ||
</tr> | ||
<tr> | ||
<td> | ||
<code> | ||
max_spans_per_span_set | ||
</code> | ||
</td> | ||
<td>Added to query-frontend configuration. The limit is enabled by default and set to 100. Set it to `0` to restore the old behavior (unlimited). Otherwise, spans beyond the configured max are dropped. (<a href="https://github.com/grafana/tempo/pull/4383">#4275</a>) | ||
</td> | ||
</tr> | ||
<tr> | ||
<td><code>use_otel_tracer</code> | ||
</td> | ||
<td>The <code>use_otel_tracer</code> option is removed. Configure your spans via standard OpenTelemetry environment variables. For Jaeger exporting, set <code>OTEL_TRACES_EXPORTER=jaeger</code>. (<a href="https://github.com/grafana/tempo/pull/3646">#3646</a>) | ||
</td> | ||
</tr> | ||
</table> | ||
|
||
### Other upgrade considerations | ||
|
||
* The Tempo CLI now targets the `/api/v2/traces` endpoint by default. Use the `--v1` flag if you still rely on the older `/api/traces` endpoint. ([#4127](https://github.com/grafana/tempo/pull/4127)) | ||
* If you already set the `X-Scope-OrgID` header in per-tenant overrides or global Tempo config, it is now honored and not overwritten by Tempo. This may change behavior if you previously depended on automatic injection. ([#4021](https://github.com/grafana/tempo/pull/4021)) | ||
* The AWS Lambda build output changes from main to bootstrap. Follow [AWS’s migration steps](https://aws.amazon.com/blogs/compute/migrating-aws-lambda-functions-from-the-go1-x-runtime-to-the-custom-runtime-on-amazon-linux-2/) to ensure your Lambda functions continue to work. ([#3852](https://github.com/grafana/tempo/pull/3852)) | ||
* Disable gRPC compression in the querier and distributor for performance reasons. ([#4429](https://github.com/grafana/tempo/pull/4429)) Check the gRPC compression settings if you see network issues. If you would like to re-enable it, we recommend 'snappy'. Use the following settings: | ||
``` | ||
ingester_client: | ||
grpc_client_config: | ||
grpc_compression: "snappy" | ||
metrics_generator_client: | ||
grpc_client_config: | ||
grpc_compression: "snappy" | ||
querier: | ||
frontend_worker: | ||
grpc_client_config: | ||
grpc_compression: "snappy" | ||
``` | ||
## Bugfixes | ||
For a complete list, refer to the [Tempo changelog](https://github.com/grafana/tempo/releases). | ||
* Add `invalid_utf8` to reasons spanmetrics will discard spans. ([#4293](https://github.com/grafana/tempo/pull/4293)) We now catch values in your tracing data that can’t be used as valid metrics labels. If you want span metrics by foo, but foo has illegal prom changes in it, then they won’t be written. | ||
* Metrics-generators: Correctly drop from the ring before stopping ingestion to reduce drops during a rollout. ([#4101](https://github.com/grafana/tempo/pull/4101)) | ||
* Correctly handle 400 Bad Request and 404 Not Found in gRPC streaming. ([#4144](https://github.com/grafana/tempo/pull/4144)) | ||
* Correctly handle Authorization header in gRPC streaming. ([#4419](https://github.com/grafana/tempo/pull/4419)) | ||
* Fix TraceQL metrics time range handling at the cutoff between recent and backend data. ([#4257](https://github.com/grafana/tempo/issues/4257)) | ||
* Fix several issues with exemplar values for TraceQL metrics. ([#4366](https://github.com/grafana/tempo/pull/4366), [#4404](https://github.com/grafana/tempo/pull/4404)) | ||
* Utilize S3Pass and S3User parameters in tempo-cli options, which were previously unused in the code. ([#4259](https://github.com/grafana/tempo/pull/4259)) |
Oops, something went wrong.