@@ -0,0 +1,248 @@
---
title: Bring your own cache
tags:
- Distributed tracing
- Infinite Tracing
- On-premise
- Redis
- Cache configuration
metaDescription: 'Configure Redis-compatible caches for Infinite Tracing on-premise tail sampling processor to enable high-availability and distributed processing'
redirects: []
freshnessValidatedDate: never
---


New Relic's Infinite Tracing Processor is an implementation of the OpenTelemetry Collector [tailsamplingprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor). In addition to upstream features, it supports highly scalable distributed processing by using a distributed cache for shared state storage. This documentation describes the supported cache implementations and their configuration.

# Supported caches

The processor supports any Redis-compatible cache implementation. It has been tested and validated with Redis and Valkey in both single-instance and cluster configurations.

For production deployments, we recommend using cluster mode (sharded) to ensure high availability and scalability. To enable distributed caching, add the `distributed_cache` configuration to your `tail_sampling` processor section:

```yaml
tail_sampling:
  decision_wait: 30s
  distributed_cache:
    connection:
      address: redis://localhost:6379/0
      password: 'local'
    trace_window_expiration: 30s
    suffix: "itc"
    max_traces_per_batch: 50
```

<Callout variant="important">
**Configuration behavior**: When `distributed_cache` is configured, the processor automatically uses the distributed cache for state management. If `distributed_cache` is omitted entirely, the collector will use in-memory processing instead. There is no separate `enabled` flag.
</Callout>

The `address` parameter must specify a valid Redis-compatible server address using the standard format:

```shell
redis[s]://[[username][:password]@][host][:port][/db-number]
```
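For example (hostnames are placeholders):

```shell
# Single instance, database 0
redis://localhost:6379/0

# TLS connection to a remote host
rediss://redis.example.com:6380/0
```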

Alternatively, you can embed credentials directly in the `address` parameter:

```yaml
tail_sampling:
  distributed_cache:
    connection:
      address: redis://:yourpassword@localhost:6379/0
```

The processor is implemented in Go and uses the [go-redis](https://github.com/redis/go-redis/tree/v9) client library.
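Because the processor uses go-redis, you can sanity-check an address string and verify connectivity with the same library before pointing the collector at it. This is an illustrative sketch, not part of the processor; the address and password are placeholders:

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	// Parse the same address format the processor accepts.
	opts, err := redis.ParseURL("redis://:yourpassword@localhost:6379/0")
	if err != nil {
		panic(err)
	}

	client := redis.NewClient(opts)
	defer client.Close()

	// PING confirms connectivity and authentication.
	pong, err := client.Ping(context.Background()).Result()
	if err != nil {
		panic(err)
	}
	fmt.Println(pong) // PONG
}
```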

# Redis-compatible cache requirements

The processor uses the cache as distributed storage for the following trace data:

- Trace and span attributes
- Active trace data
- Sampling decision cache

The processor executes **Lua scripts** to interact with the Redis cache atomically. Lua script support is typically enabled by default in Redis-compatible caches. No additional configuration is required unless you have explicitly disabled this feature.
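To see why server-side scripting matters, the sketch below performs the kind of atomic read-modify-write the processor depends on: appending span data to a trace's list and refreshing the key's TTL in a single round trip. The key name, payload, and script body are illustrative assumptions, not the processor's actual internals:

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// appendSpan atomically appends a span payload to a trace's list and
// refreshes the key's TTL, returning the new list length.
var appendSpan = redis.NewScript(`
redis.call('RPUSH', KEYS[1], ARGV[1])
redis.call('PEXPIRE', KEYS[1], ARGV[2])
return redis.call('LLEN', KEYS[1])
`)

func main() {
	client := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	defer client.Close()

	// Hypothetical key and payload; 240000 ms matches the default traces_ttl.
	n, err := appendSpan.Run(context.Background(), client,
		[]string{"itc:trace:abc123"}, "marshaled-span-bytes", 240000).Int64()
	if err != nil {
		panic(err)
	}
	fmt.Println("spans stored for trace:", n)
}
```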

## Sizing and performance

Proper Redis instance sizing is critical for optimal performance. The following example demonstrates how to calculate memory requirements based on a sample `tail_sampling` configuration:

```yaml
tail_sampling:
  decision_wait: 30s
  distributed_cache:
    connection:
      address: redis://localhost:6379/0
      password: 'local'
    trace_window_expiration: 30s
    suffix: "itc"
    max_traces_per_batch: 50
```

To complete the calculation, you must also estimate your workload characteristics:
- **Spans per second**: Assumed throughput of 10,000 spans/sec
- **Average span size**: Assumed size of 900 bytes (marshaled protobuf format)

### Memory estimation formula

```
Total Memory = (Trace Data) + (Decision Caches) + (Overhead)
```

#### 1. Trace data storage

Trace data is stored temporarily in Redis during the trace window period:

- **Per-span storage**: ~900 bytes (marshaled protobuf)
- **Storage duration**: Controlled by `traces_ttl` (default: 240s)
- **Active window**: Controlled by `trace_window_expiration` (default: 30s)
- **Formula**: `Memory ≈ spans_per_second × trace_window_expiration × 900 bytes`

**Example calculation**: At 10,000 spans/second with a 30-second `trace_window_expiration`:
```
10,000 spans/sec × 30 sec × 900 bytes = 270 MB
```

Note: This calculation estimates memory for actively accumulating traces. The actual Redis memory may be higher due to traces waiting in the evaluation queue or being processed.

#### 2. Decision cache storage

When using `distributed_cache`, the decision caches are stored in Redis without explicit size limits. Instead, Redis uses its native LRU eviction policy (configured via `maxmemory-policy`) to manage memory. Each trace ID requires approximately 50 bytes of storage:

- **Sampled cache**: Managed by Redis LRU eviction
- **Non-sampled cache**: Managed by Redis LRU eviction
- **Typical overhead per trace ID**: ~50 bytes

<Callout variant="tip">
**Memory management**: Configure Redis with `maxmemory-policy allkeys-lru` to allow automatic eviction of old decision cache entries when memory limits are reached. The decision cache keys use TTL-based expiration (controlled by `cache_ttl`) rather than fixed size limits.
</Callout>
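A minimal `redis.conf` sketch implementing this recommendation (the memory limit is an illustrative value; size it from your own capacity estimate):

```
# redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru
```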



#### 3. Batch processing overhead

- **Current batch queue**: Minimal (trace IDs + scores in sorted set)
- **In-flight batches**: `max_traces_per_batch × average_spans_per_trace × 900 bytes`

**Example calculation**: 50 traces per batch with 20 spans per trace on average:
```
50 × 20 × 900 bytes = 900 KB per batch
```

Batch size therefore trades memory for efficiency: larger batches hold more span data in flight at once but reduce the number of round trips to Redis.

### Complete sizing example

Based on the configuration above with the following workload parameters:
- **Throughput**: 10,000 spans/second
- **Average span size**: 900 bytes

| Component | Memory Required |
|-----------|----------------|
| Trace data (active) | 270 MB |
| Decision caches | Variable (LRU-managed) |
| Batch processing | ~1 MB |
| Redis overhead (20%) | ~54 MB |
| **Total (minimum)** | **~325 MB + decision cache** |

<Callout variant="important">
**Sizing guidance**: The calculations above serve as an estimation example. We recommend performing your own capacity planning based on your specific workload characteristics. For production deployments, consider:
- Provisioning **2-3x the calculated memory** to accommodate traffic spikes and growth
- Using Redis cluster mode for horizontal scaling
- Monitoring actual memory usage and adjusting capacity accordingly
</Callout>
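To make the example reproducible, the short Go program below recomputes the table; every constant is one of the illustrative workload assumptions above, not a measured value:

```go
package main

import "fmt"

func main() {
	const (
		spansPerSecond     = 10000.0 // assumed throughput
		traceWindowSeconds = 30.0    // trace_window_expiration
		spanSizeBytes      = 900.0   // marshaled protobuf span
		tracesPerBatch     = 50.0    // max_traces_per_batch
		spansPerTrace      = 20.0    // assumed average
		redisOverhead      = 0.20    // ~20% bookkeeping overhead
	)

	traceData := spansPerSecond * traceWindowSeconds * spanSizeBytes // active traces
	batch := tracesPerBatch * spansPerTrace * spanSizeBytes          // one in-flight batch
	overhead := (traceData + batch) * redisOverhead

	totalMB := (traceData + batch + overhead) / 1e6
	// Decision caches are LRU-managed and excluded, as in the table above.
	fmt.Printf("Estimated minimum: ~%.0f MB + decision cache\n", totalMB)
}
```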

### Performance considerations

- **Network latency**: Round-trip time between the collector and Redis directly impacts sampling throughput. Deploy Redis instances with low-latency network connectivity to the collector.
- **Lua script execution**: All cache operations use atomic Lua scripts executed server-side, ensuring data consistency and optimal performance.
- **Cluster mode**: Distributing load across multiple Redis nodes increases throughput and provides fault tolerance for high-availability deployments.

# Limitations and evictions

<Callout variant="caution">
**Performance bottleneck**: Redis and network communication are typically the limiting factors for processor performance. The speed and reliability of your Redis cache are essential for proper collector operation. Ensure your Redis instance has sufficient resources and maintains low-latency network connectivity to the collector.
</Callout>

The processor stores trace data temporarily in Redis while making sampling decisions. Understanding data management and eviction policies is critical for optimal performance.

## Data stored in Redis

The processor stores the following data structures in Redis:

1. **Trace spans**: Stored as lists using protobuf-marshaled trace data
2. **Decision cache**: Separate LRU caches for sampled and non-sampled trace IDs
3. **Current batch queue**: Sorted set tracking traces waiting for sampling decisions
4. **In-flight batches**: Temporary storage for traces being evaluated
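You can inspect this state with standard `redis-cli` commands. The key pattern below is an assumption based on the configured `suffix` ("itc" in the examples above); the actual key layout is internal to the processor:

```shell
# List keys written by the processor (pattern is a guess)
redis-cli --scan --pattern '*itc*'

# Check the structure and remaining TTL of a specific key
redis-cli TYPE <key>
redis-cli PTTL <key>
```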

## TTL and expiration

When using `distributed_cache`, the TTL configuration differs from the in-memory processor. The following parameters control data expiration:

<Callout variant="important">
**Key difference from in-memory mode**: When `distributed_cache` is configured, `trace_window_expiration` replaces `decision_wait` for determining when traces are evaluated. The `trace_window_expiration` parameter defines a sliding window: each time new spans arrive for a trace, the trace remains active for another `trace_window_expiration` period. This incremental approach keeps traces with ongoing activity alive longer than those that have stopped receiving spans.
</Callout>
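The sorted-set sketch below illustrates the sliding-window idea: each span arrival pushes the trace's evaluation deadline forward, and only traces whose deadline has passed are picked up. It mirrors the documented behavior under assumed key names and is not the processor's actual implementation:

```go
package main

import (
	"context"
	"fmt"
	"strconv"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	client := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	defer client.Close()
	ctx := context.Background()

	window := 30 * time.Second   // trace_window_expiration
	queue := "itc:current-batch" // hypothetical queue key

	// On span arrival: reset the trace's deadline to now + window.
	deadline := float64(time.Now().Add(window).UnixMilli())
	client.ZAdd(ctx, queue, redis.Z{Score: deadline, Member: "trace-abc123"})

	// On evaluation: only traces whose deadline has already passed are due.
	now := strconv.FormatInt(time.Now().UnixMilli(), 10)
	due, err := client.ZRangeByScore(ctx, queue, &redis.ZRangeBy{
		Min: "-inf", Max: now,
	}).Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("traces ready for evaluation:", due)
}
```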

### TTL hierarchy and defaults

The processor uses a cascading TTL structure: by default each TTL is a multiple of the previous one, so every layer outlives the stage it protects:

1. **`trace_window_expiration`** (default: 30s)
- Configures how long to wait after the last span arrives before evaluating a trace
- Acts as a sliding window: resets each time new spans arrive for a trace
- Defined via `distributed_cache.trace_window_expiration`

2. **`in_flight_timeout`** (default: `trace_window_expiration * 4` = 120s)
- Maximum time a batch can be processed before being considered orphaned
- Orphaned batches are automatically recovered and re-queued
- Defined via `distributed_cache.in_flight_timeout`

3. **`traces_ttl`** (default: `in_flight_timeout * 2` = 240s)
- Redis key expiration for trace span data
- Ensures trace data persists long enough for evaluation and recovery
- Defined via `distributed_cache.traces_ttl`

4. **`cache_ttl`** (default: `traces_ttl * 2` = 480s)
- Redis key expiration for decision cache entries (sampled/non-sampled)
- Prevents duplicate evaluation for late-arriving spans
- Defined via `distributed_cache.cache_ttl`

### Example configuration

```yaml
tail_sampling:
  distributed_cache:
    trace_window_expiration: 30s # Primary control
    in_flight_timeout: 120s      # Optional: defaults to trace_window_expiration * 4
    traces_ttl: 240s             # Optional: defaults to in_flight_timeout * 2
    cache_ttl: 480s              # Optional: defaults to traces_ttl * 2
```

## LRU eviction for decision caches

When using `distributed_cache`, the decision caches rely on Redis's native LRU eviction rather than application-managed size limits:

<Callout variant="important">
**Redis LRU configuration required**: Configure your Redis instance with `maxmemory-policy allkeys-lru` to enable automatic eviction of old entries when memory limits are reached. The decision cache keys are stored in Redis with TTL-based expiration (controlled by `cache_ttl`), and Redis will automatically evict the least recently used keys when memory pressure occurs.
</Callout>

- **Sampled cache**: TTL-managed (default: 480s via `cache_ttl`)
- **Non-sampled cache**: TTL-managed (default: 480s via `cache_ttl`)

This approach provides several benefits:
- Recent sampling decisions remain available for late-arriving spans
- No hard limit on cache size—Redis manages memory automatically
- Consistent cache performance under load
- Simpler configuration without manual cache sizing

The decision caches use Lua scripts to atomically check for key existence and refresh TTLs, ensuring data consistency across distributed processor instances.
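A hedged Go sketch of that check-and-refresh pattern (the key layout is an assumption for illustration):

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// checkAndRefresh atomically tests whether a decision exists for a
// trace ID and, if so, refreshes its TTL so recent decisions stay warm.
var checkAndRefresh = redis.NewScript(`
if redis.call('EXISTS', KEYS[1]) == 1 then
  redis.call('PEXPIRE', KEYS[1], ARGV[1])
  return 1
end
return 0
`)

func main() {
	client := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	defer client.Close()

	// Hypothetical key; 480000 ms matches the default cache_ttl.
	hit, err := checkAndRefresh.Run(context.Background(), client,
		[]string{"itc:sampled:abc123"}, 480000).Int64()
	if err != nil {
		panic(err)
	}
	fmt.Println("decision cached:", hit == 1)
}
```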

## Batch processing

The processor handles traces in batches to optimize performance:

- **Maximum traces per batch**: Default of 50, configurable via `max_traces_per_batch`
- **Atomic batch operations**: Batches are retrieved atomically from the current queue
- **Failure recovery**: Failed batches are automatically recovered and re-queued after the in-flight timeout expires
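The sketch below shows how such an atomic claim can work: a server-side script moves up to `max_traces_per_batch` due trace IDs from the current queue to an in-flight set, recording the claim time so orphaned batches can later be recovered. Key names and data layout are assumptions for illustration:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// claimBatch atomically moves due trace IDs from the current queue
// (a sorted set scored by deadline) to an in-flight set scored by
// claim time, then returns the claimed IDs.
var claimBatch = redis.NewScript(`
local ids = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1], 'LIMIT', 0, tonumber(ARGV[2]))
for _, id in ipairs(ids) do
  redis.call('ZREM', KEYS[1], id)
  redis.call('ZADD', KEYS[2], ARGV[1], id)
end
return ids
`)

func main() {
	client := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	defer client.Close()

	ids, err := claimBatch.Run(context.Background(), client,
		[]string{"itc:current-batch", "itc:in-flight"},
		time.Now().UnixMilli(), 50, // 50 = max_traces_per_batch
	).StringSlice()
	if err != nil {
		panic(err)
	}
	fmt.Println("claimed traces:", ids)
}
```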


@@ -0,0 +1,12 @@
---
title: Introduction to Infinite Tracing Collector
tags:
- Understand dependencies
- Distributed tracing
- Infinite Tracing
metaDescription: 'Introduction to the Infinite Tracing Collector, an on-premise tail sampling processor for New Relic distributed tracing'
redirects:
- /docs/on-premise-infinite-tracing
freshnessValidatedDate: never
---

6 changes: 6 additions & 0 deletions src/nav/distributed-tracing.yml
@@ -35,6 +35,12 @@ pages:
      path: /docs/distributed-tracing/infinite-tracing/infinite-tracing-configure-proxy-support
    - title: 'Configure SSL for Java 7, 8'
      path: /docs/distributed-tracing/other-requirements/infinite-tracing-configuring-ssl-java-7-8
    - title: Infinite Tracing Collector
      pages:
        - title: Introduction to Infinite Tracing Collector
          path: /docs/distributed-tracing/infinite-tracing-on-premise/infinite-tracing-introduction
        - title: Bring your own distributed cache
          path: /docs/distributed-tracing/infinite-tracing-on-premise/bring-your-own-cache
    - title: Trace API
      pages:
        - title: Introduction to the Trace API