@@ -0,0 +1,248 @@
---
title: Bring your own cache
tags:
- Distributed tracing
- Infinite Tracing
- On-premise
- Redis
- Cache configuration
metaDescription: 'Configure Redis-compatible caches for Infinite Tracing on-premise tail sampling processor to enable high-availability and distributed processing'
redirects: []
freshnessValidatedDate: never
---


New Relic's Infinite Tracing Processor is an implementation of the OpenTelemetry Collector [tailsamplingprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor). In addition to upstream features, it supports highly scalable distributed processing by using a distributed cache for shared state storage. This documentation describes the supported cache implementations and their configuration.

# Supported caches

The processor supports any Redis-compatible cache implementation. It has been tested and validated with Redis and Valkey in both single-instance and cluster configurations.

For production deployments, we recommend using cluster mode (sharded) to ensure high availability and scalability. To enable distributed caching, add the `distributed_cache` configuration to your `tail_sampling` processor section:

```yaml
tail_sampling:
  decision_wait: 30s
  distributed_cache:
    connection:
      address: redis://localhost:6379/0
      password: 'local'
    trace_window_expiration: 30s
    suffix: "itc"
    max_traces_per_batch: 50
```

<Callout variant="important">
**Configuration behavior**: When `distributed_cache` is configured, the processor automatically uses the distributed cache for state management. If `distributed_cache` is omitted entirely, the collector will use in-memory processing instead. There is no separate `enabled` flag.
</Callout>

The `address` parameter must specify a valid Redis-compatible server address using the standard format:

```shell
redis[s]://[[username][:password]@][host][:port][/db-number]
```
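For example (hostnames are placeholders):

```shell
# Single instance, database 0
redis://localhost:6379/0

# TLS connection to a remote host
rediss://redis.example.com:6380/0
```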

Alternatively, you can embed credentials directly in the `address` parameter:

```yaml
tail_sampling:
  distributed_cache:
    connection:
      address: redis://:yourpassword@localhost:6379/0
```

The processor is implemented in Go and uses the [go-redis](https://github.com/redis/go-redis/tree/v9) client library.
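Because the processor uses go-redis, you can sanity-check an address string and verify connectivity with the same library before pointing the collector at it. This is an illustrative sketch, not part of the processor; the address and password are placeholders:

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	// Parse the same address format the processor accepts.
	opts, err := redis.ParseURL("redis://:yourpassword@localhost:6379/0")
	if err != nil {
		panic(err)
	}

	client := redis.NewClient(opts)
	defer client.Close()

	// PING confirms connectivity and authentication.
	pong, err := client.Ping(context.Background()).Result()
	if err != nil {
		panic(err)
	}
	fmt.Println(pong) // PONG
}
```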

# Redis-compatible cache requirements

The processor uses the cache as distributed storage for the following trace data:

- Trace and span attributes
- Active trace data
- Sampling decision cache

The processor executes **Lua scripts** to interact with the Redis cache atomically. Lua script support is typically enabled by default in Redis-compatible caches. No additional configuration is required unless you have explicitly disabled this feature.
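To see why server-side scripting matters, the sketch below performs the kind of atomic read-modify-write the processor depends on: appending span data to a trace's list and refreshing the key's TTL in a single round trip. The key name, payload, and script body are illustrative assumptions, not the processor's actual internals:

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// appendSpan atomically appends a span payload to a trace's list and
// refreshes the key's TTL, returning the new list length.
var appendSpan = redis.NewScript(`
redis.call('RPUSH', KEYS[1], ARGV[1])
redis.call('PEXPIRE', KEYS[1], ARGV[2])
return redis.call('LLEN', KEYS[1])
`)

func main() {
	client := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	defer client.Close()

	// Hypothetical key and payload; 240000 ms matches the default traces_ttl.
	n, err := appendSpan.Run(context.Background(), client,
		[]string{"itc:trace:abc123"}, "marshaled-span-bytes", 240000).Int64()
	if err != nil {
		panic(err)
	}
	fmt.Println("spans stored for trace:", n)
}
```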

## Sizing and performance

Proper Redis instance sizing is critical for optimal performance. The following example demonstrates how to calculate memory requirements based on a sample `tail_sampling` configuration:

```yaml
tail_sampling:
  decision_wait: 30s
  distributed_cache:
    connection:
      address: redis://localhost:6379/0
      password: 'local'
    trace_window_expiration: 30s
    suffix: "itc"
    max_traces_per_batch: 50
```

To complete the calculation, you must also estimate your workload characteristics:
- **Spans per second**: Assumed throughput of 10,000 spans/sec
- **Average span size**: Assumed size of 900 bytes (marshaled protobuf format)

### Memory estimation formula

```
Total Memory = (Trace Data) + (Decision Caches) + (Overhead)
```

#### 1. Trace data storage

Trace data is stored temporarily in Redis during the trace window period:

- **Per-span storage**: ~900 bytes (marshaled protobuf)
- **Storage duration**: Controlled by `traces_ttl` (default: 240s)
- **Active window**: Controlled by `trace_window_expiration` (default: 30s)
- **Formula**: `Memory ≈ spans_per_second × trace_window_expiration × 900 bytes`

**Example calculation**: At 10,000 spans/second with a 30-second `trace_window_expiration`:
```
10,000 spans/sec × 30 sec × 900 bytes = 270 MB
```

Note: This calculation estimates memory for actively accumulating traces. The actual Redis memory may be higher due to traces waiting in the evaluation queue or being processed.

#### 2. Decision cache storage

When using `distributed_cache`, the decision caches are stored in Redis without explicit size limits. Instead, Redis uses its native LRU eviction policy (configured via `maxmemory-policy`) to manage memory. Each trace ID requires approximately 50 bytes of storage:

- **Sampled cache**: Managed by Redis LRU eviction
- **Non-sampled cache**: Managed by Redis LRU eviction
- **Typical overhead per trace ID**: ~50 bytes

<Callout variant="tip">
**Memory management**: Configure Redis with `maxmemory-policy allkeys-lru` to allow automatic eviction of old decision cache entries when memory limits are reached. The decision cache keys use TTL-based expiration (controlled by `cache_ttl`) rather than fixed size limits.
</Callout>
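A minimal `redis.conf` sketch implementing this recommendation (the memory limit is an illustrative value; size it from your own capacity estimate):

```
# redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru
```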



#### 3. Batch processing overhead

- **Current batch queue**: Minimal (trace IDs + scores in sorted set)
- **In-flight batches**: `max_traces_per_batch × average_spans_per_trace × 900 bytes`

**Example calculation**: 50 traces per batch with 20 spans per trace on average:
```
50 × 20 × 900 bytes = 900 KB per batch
```

Batch size therefore trades memory for efficiency: larger batches hold more span data in flight at once but reduce the number of round trips to Redis.

### Complete sizing example

Based on the configuration above with the following workload parameters:
- **Throughput**: 10,000 spans/second
- **Average span size**: 900 bytes

| Component | Memory Required |
|-----------|----------------|
| Trace data (active) | 270 MB |
| Decision caches | Variable (LRU-managed) |
| Batch processing | ~1 MB |
| Redis overhead (20%) | ~54 MB |
| **Total (minimum)** | **~325 MB + decision cache** |

<Callout variant="important">
**Sizing guidance**: The calculations above serve as an estimation example. We recommend performing your own capacity planning based on your specific workload characteristics. For production deployments, consider:
- Provisioning **2-3x the calculated memory** to accommodate traffic spikes and growth
- Using Redis cluster mode for horizontal scaling
- Monitoring actual memory usage and adjusting capacity accordingly
</Callout>
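To make the example reproducible, the short Go program below recomputes the table; every constant is one of the illustrative workload assumptions above, not a measured value:

```go
package main

import "fmt"

func main() {
	const (
		spansPerSecond     = 10000.0 // assumed throughput
		traceWindowSeconds = 30.0    // trace_window_expiration
		spanSizeBytes      = 900.0   // marshaled protobuf span
		tracesPerBatch     = 50.0    // max_traces_per_batch
		spansPerTrace      = 20.0    // assumed average
		redisOverhead      = 0.20    // ~20% bookkeeping overhead
	)

	traceData := spansPerSecond * traceWindowSeconds * spanSizeBytes // active traces
	batch := tracesPerBatch * spansPerTrace * spanSizeBytes          // one in-flight batch
	overhead := (traceData + batch) * redisOverhead

	totalMB := (traceData + batch + overhead) / 1e6
	// Decision caches are LRU-managed and excluded, as in the table above.
	fmt.Printf("Estimated minimum: ~%.0f MB + decision cache\n", totalMB)
}
```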

### Performance considerations

- **Network latency**: Round-trip time between the collector and Redis directly impacts sampling throughput. Deploy Redis instances with low-latency network connectivity to the collector.
- **Lua script execution**: All cache operations use atomic Lua scripts executed server-side, ensuring data consistency and optimal performance.
- **Cluster mode**: Distributing load across multiple Redis nodes increases throughput and provides fault tolerance for high-availability deployments.

# Limitations and evictions

<Callout variant="caution">
**Performance bottleneck**: Redis and network communication are typically the limiting factors for processor performance. The speed and reliability of your Redis cache are essential for proper collector operation. Ensure your Redis instance has sufficient resources and maintains low-latency network connectivity to the collector.
</Callout>

The processor stores trace data temporarily in Redis while making sampling decisions. Understanding data management and eviction policies is critical for optimal performance.

## Data stored in Redis

The processor stores the following data structures in Redis:

1. **Trace spans**: Stored as lists using protobuf-marshaled trace data
2. **Decision cache**: Separate LRU caches for sampled and non-sampled trace IDs
3. **Current batch queue**: Sorted set tracking traces waiting for sampling decisions
4. **In-flight batches**: Temporary storage for traces being evaluated
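You can inspect this state with standard `redis-cli` commands. The key pattern below is an assumption based on the configured `suffix` ("itc" in the examples above); the actual key layout is internal to the processor:

```shell
# List keys written by the processor (pattern is a guess)
redis-cli --scan --pattern '*itc*'

# Check the structure and remaining TTL of a specific key
redis-cli TYPE <key>
redis-cli PTTL <key>
```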

## TTL and expiration

When using `distributed_cache`, the TTL configuration differs from the in-memory processor. The following parameters control data expiration:

<Callout variant="important">
**Key difference from in-memory mode**: When `distributed_cache` is configured, `trace_window_expiration` replaces `decision_wait` for determining when traces are evaluated. The `trace_window_expiration` parameter defines a sliding window: each time new spans arrive for a trace, the trace remains active for another `trace_window_expiration` period. This incremental approach keeps traces with ongoing activity alive longer than those that have stopped receiving spans.
</Callout>
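The sorted-set sketch below illustrates the sliding-window idea: each span arrival pushes the trace's evaluation deadline forward, and only traces whose deadline has passed are picked up. It mirrors the documented behavior under assumed key names and is not the processor's actual implementation:

```go
package main

import (
	"context"
	"fmt"
	"strconv"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	client := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	defer client.Close()
	ctx := context.Background()

	window := 30 * time.Second   // trace_window_expiration
	queue := "itc:current-batch" // hypothetical queue key

	// On span arrival: reset the trace's deadline to now + window.
	deadline := float64(time.Now().Add(window).UnixMilli())
	client.ZAdd(ctx, queue, redis.Z{Score: deadline, Member: "trace-abc123"})

	// On evaluation: only traces whose deadline has already passed are due.
	now := strconv.FormatInt(time.Now().UnixMilli(), 10)
	due, err := client.ZRangeByScore(ctx, queue, &redis.ZRangeBy{
		Min: "-inf", Max: now,
	}).Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("traces ready for evaluation:", due)
}
```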

### TTL hierarchy and defaults

The processor uses a cascading TTL structure: by default each TTL is a multiple of the previous one, so every layer outlives the stage it protects:

1. **`trace_window_expiration`** (default: 30s)
- Configures how long to wait after the last span arrives before evaluating a trace
- Acts as a sliding window: resets each time new spans arrive for a trace
- Defined via `distributed_cache.trace_window_expiration`

2. **`in_flight_timeout`** (default: `trace_window_expiration * 4` = 120s)
- Maximum time a batch can be processed before being considered orphaned
- Orphaned batches are automatically recovered and re-queued
- Defined via `distributed_cache.in_flight_timeout`

3. **`traces_ttl`** (default: `in_flight_timeout * 2` = 240s)
- Redis key expiration for trace span data
- Ensures trace data persists long enough for evaluation and recovery
- Defined via `distributed_cache.traces_ttl`

4. **`cache_ttl`** (default: `traces_ttl * 2` = 480s)
- Redis key expiration for decision cache entries (sampled/non-sampled)
- Prevents duplicate evaluation for late-arriving spans
- Defined via `distributed_cache.cache_ttl`

### Example configuration

```yaml
tail_sampling:
  distributed_cache:
    trace_window_expiration: 30s # Primary control
    in_flight_timeout: 120s      # Optional: defaults to trace_window_expiration * 4
    traces_ttl: 240s             # Optional: defaults to in_flight_timeout * 2
    cache_ttl: 480s              # Optional: defaults to traces_ttl * 2
```

## LRU eviction for decision caches

When using `distributed_cache`, the decision caches rely on Redis's native LRU eviction rather than application-managed size limits:

<Callout variant="important">
**Redis LRU configuration required**: Configure your Redis instance with `maxmemory-policy allkeys-lru` to enable automatic eviction of old entries when memory limits are reached. The decision cache keys are stored in Redis with TTL-based expiration (controlled by `cache_ttl`), and Redis will automatically evict the least recently used keys when memory pressure occurs.
</Callout>

- **Sampled cache**: TTL-managed (default: 480s via `cache_ttl`)
- **Non-sampled cache**: TTL-managed (default: 480s via `cache_ttl`)

This approach provides several benefits:
- Recent sampling decisions remain available for late-arriving spans
- No hard limit on cache size—Redis manages memory automatically
- Consistent cache performance under load
- Simpler configuration without manual cache sizing

The decision caches use Lua scripts to atomically check for key existence and refresh TTLs, ensuring data consistency across distributed processor instances.
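A hedged Go sketch of that check-and-refresh pattern (the key layout is an assumption for illustration):

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// checkAndRefresh atomically tests whether a decision exists for a
// trace ID and, if so, refreshes its TTL so recent decisions stay warm.
var checkAndRefresh = redis.NewScript(`
if redis.call('EXISTS', KEYS[1]) == 1 then
  redis.call('PEXPIRE', KEYS[1], ARGV[1])
  return 1
end
return 0
`)

func main() {
	client := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	defer client.Close()

	// Hypothetical key; 480000 ms matches the default cache_ttl.
	hit, err := checkAndRefresh.Run(context.Background(), client,
		[]string{"itc:sampled:abc123"}, 480000).Int64()
	if err != nil {
		panic(err)
	}
	fmt.Println("decision cached:", hit == 1)
}
```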

## Batch processing

The processor handles traces in batches to optimize performance:

- **Maximum traces per batch**: Default of 50, configurable via `max_traces_per_batch`
- **Atomic batch operations**: Batches are retrieved atomically from the current queue
- **Failure recovery**: Failed batches are automatically recovered and re-queued after the in-flight timeout expires
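The sketch below shows how such an atomic claim can work: a server-side script moves up to `max_traces_per_batch` due trace IDs from the current queue to an in-flight set, recording the claim time so orphaned batches can later be recovered. Key names and data layout are assumptions for illustration:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// claimBatch atomically moves due trace IDs from the current queue
// (a sorted set scored by deadline) to an in-flight set scored by
// claim time, then returns the claimed IDs.
var claimBatch = redis.NewScript(`
local ids = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1], 'LIMIT', 0, tonumber(ARGV[2]))
for _, id in ipairs(ids) do
  redis.call('ZREM', KEYS[1], id)
  redis.call('ZADD', KEYS[2], ARGV[1], id)
end
return ids
`)

func main() {
	client := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	defer client.Close()

	ids, err := claimBatch.Run(context.Background(), client,
		[]string{"itc:current-batch", "itc:in-flight"},
		time.Now().UnixMilli(), 50, // 50 = max_traces_per_batch
	).StringSlice()
	if err != nil {
		panic(err)
	}
	fmt.Println("claimed traces:", ids)
}
```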


@@ -0,0 +1,12 @@
---
title: Introduction to Infinite Tracing Collector
tags:
- Understand dependencies
- Distributed tracing
- Infinite Tracing
metaDescription: 'Introduction to the Infinite Tracing Collector, an on-premise tail sampling processor for New Relic distributed tracing'
redirects:
- /docs/on-premise-infinite-tracing
freshnessValidatedDate: never
---

6 changes: 6 additions & 0 deletions src/nav/distributed-tracing.yml
@@ -35,6 +35,12 @@ pages:
      path: /docs/distributed-tracing/infinite-tracing/infinite-tracing-configure-proxy-support
    - title: 'Configure SSL for Java 7, 8'
      path: /docs/distributed-tracing/other-requirements/infinite-tracing-configuring-ssl-java-7-8
    - title: Infinite Tracing Collector
      pages:
        - title: Introduction to Infinite Tracing Collector
          path: /docs/distributed-tracing/infinite-tracing-on-premise/infinite-tracing-introduction
        - title: Bring your own distributed cache
          path: /docs/distributed-tracing/infinite-tracing-on-premise/bring-your-own-cache
    - title: Trace API
      pages:
        - title: Introduction to the Trace API