Compaction issues - tempo 2.7 - outstanding blocks #4781
-
Hello,

We are running Tempo in distributed mode with six compactors. Typically, on weekends, the compactors manage to reduce outstanding blocks to zero. However, over the past two weeks we've noticed that the compactors are consuming significantly more CPU, and some blocks take much longer to compact than the usual 3-4 minutes.
We are operating in a single-tenant setup with the following configuration:

```
```

I’d appreciate your guidance on how to approach this issue. Scaling up by adding more compactors seems like an easy solution, but I’d like to understand why CPU usage has become consistently high instead of being limited to specific timeframes during compaction, as it was before.
-
That's difficult to say. My initial guess is that something about your write pattern changed. The two things we've seen impact compactor resource usage the most are trace size and large, high-cardinality attributes. For trace size, consider setting max trace size limits on your tenants. For high-cardinality attributes, consider setting up dedicated columns, although I'm guessing you've already done that. Perhaps a write pattern changed and you need to re-up your dedicated columns?

Also, Tempo 2.7 added the ability to truncate span attributes to prevent the consumption of enormous attributes, and 2.8 will apply this setting to all scopes.

Finally, getting outstanding blocks to 0 is not necessary for a happy/functioning Tempo, but I do understand your curiosity given the change in behavior.
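For orientation, here is a rough sketch of where those knobs typically live in a Tempo config. The values are placeholders, and the exact key names and nesting (particularly the overrides layout and `max_span_attr_byte`) differ between Tempo versions, so verify them against the docs for your release:

```yaml
# Illustrative only: placeholder values, verify key names against your Tempo version.
overrides:
  # Legacy flat per-tenant overrides layout; newer releases also accept a nested
  # "defaults:" form with the same settings.
  max_bytes_per_trace: 20000000        # cap on total trace size in bytes (0 disables the limit)
  parquet_dedicated_columns:           # give large, high-cardinality attributes their own columns
    - name: http.url
      type: string
      scope: span

distributor:
  # Span-attribute truncation mentioned above (added in Tempo 2.7).
  max_span_attr_byte: 2048
```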
-
Hi Joe,

We have configured dedicated columns and set the maximum trace size to 50MB. I have also reduced `max_span_attr_byte` to 1024, hopefully keeping our spans more lightweight. Could you review our compactor configuration and suggest any adjustments to better optimize our single-tenant setup?
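(The compactor configuration itself isn't reproduced in this thread. As a reference point while reviewing it, these are the compaction settings that usually matter for the current Parquet backends; the values below are placeholders, not a tuned recommendation:)

```yaml
# Illustrative compactor block with placeholder values -- not a recommendation.
compactor:
  ring:
    kvstore:
      store: memberlist              # compactors coordinate and shard work via this ring
  compaction:
    block_retention: 336h            # how long blocks are kept before deletion
    compaction_window: 1h            # blocks within this time window are compacted together
    max_block_bytes: 107374182400    # upper bound on the size of a compacted block (~100 GiB)
    max_compaction_objects: 6000000  # upper bound on the number of traces in a compacted block
    # v2_* buffer settings apply only to the old v2 backend (see the reply further down).
```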
-
50MB is quite big. It's definitely possible that someone started sending more large traces than before, which is causing your issue. I'd review metrics like `tempo_distributor_spans_received_total`, `tempo_distributor_bytes_received_total`, `tempo_ingester_traces_created_total` and `tempo_ingester_bytes_received_total` to see if a write pattern changed.

The `v2_` settings don't matter; they apply to the old v2 backend only.
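If those metrics are scraped into Prometheus, a small set of recording rules can make the write-pattern comparison easier to eyeball over the two weeks in question. This is only a sketch: the rule names and the 5m window are arbitrary, and the last rule is just a rough proxy for average trace size at ingest:

```yaml
# Sketch of Prometheus recording rules over the metrics named above.
groups:
  - name: tempo-write-pattern
    rules:
      - record: tempo:distributor_spans_received:rate5m
        expr: sum(rate(tempo_distributor_spans_received_total[5m]))
      - record: tempo:distributor_bytes_received:rate5m
        expr: sum(rate(tempo_distributor_bytes_received_total[5m]))
      - record: tempo:ingester_traces_created:rate5m
        expr: sum(rate(tempo_ingester_traces_created_total[5m]))
      - record: tempo:ingester_bytes_received:rate5m
        expr: sum(rate(tempo_ingester_bytes_received_total[5m]))
      # Rough average bytes per newly created trace; a sustained rise here points
      # at larger traces rather than simply more traffic.
      - record: tempo:avg_bytes_per_trace:rate5m
        expr: >
          sum(rate(tempo_ingester_bytes_received_total[5m]))
          /
          sum(rate(tempo_ingester_traces_created_total[5m]))
```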
-
Thank you once again, Joe, for the useful input. I'll add another compactor and see how it goes.