How does Tempo use Parquet on object storage now? #2128

scalalang2 · 2023-02-23T07:08:24Z

Hello,
I have a question about how Tempo works. One thing that impressed me is that it uses object storage instead of a traditional backend database such as Elasticsearch to provide distributed tracing at a lower cost.

I'm really curious about how this works, especially how it uses Parquet with object storage.

Q1. Does Tempo query trace data directly from the object storage?
I assume it must download the Parquet file from S3 to local storage.
As far as I know, reading data from a Parquet file seems to require downloading the whole file.

Q2. If Tempo uses local storage, do I need to allocate a large amount of storage in production?

Q3. Does the compactor periodically remove old, locally downloaded files?

Answered by joe-elliott

scalalang2 · 2023-02-23T07:46:25Z · Author

Is there a document that describes the whole Tempo architecture?
I want to understand this project more deeply.

joe-elliott · 2023-02-23T14:00:59Z · Maintainer

Q1. Does Tempo query trace data directly from the object storage?

When executing a query, Tempo pulls only the columns it needs from object storage. It does not download entire blocks locally. It also has to pull the Parquet footer of each block so it knows where each column is located.
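
To make the "only pulls the columns it needs" point concrete, here is a minimal Python sketch (Tempo itself is written in Go, and this is not its code). It relies on one fact from the Parquet format: a file ends with the footer metadata, a 4-byte little-endian footer length, and the magic bytes `PAR1`. A reader can therefore fetch the last 8 bytes with a ranged read, then the footer, then only the column chunks a query needs. The function names, the column names, and the pre-decoded column-chunk map are hypothetical; decoding the real Thrift-encoded footer is left out.

```python
import struct
from typing import Callable, Dict, List, Tuple

MAGIC = b"PAR1"

# A ranged read against object storage: fetch(offset, length) -> bytes.
# Against S3/GCS this would be a GET with a Range header; here it is just a
# callable so the sketch stays self-contained.
RangeFetch = Callable[[int, int], bytes]


def locate_footer(fetch: RangeFetch, file_size: int) -> Tuple[int, int]:
    """Return (offset, length) of the Parquet footer metadata.

    The file ends with <footer> <4-byte little-endian footer length> <PAR1>,
    so only the last 8 bytes are needed to find the footer.
    """
    tail = fetch(file_size - 8, 8)
    if tail[4:] != MAGIC:
        raise ValueError("not a Parquet file: missing trailing magic")
    (footer_len,) = struct.unpack("<I", tail[:4])
    return file_size - 8 - footer_len, footer_len


def ranges_for_columns(
    column_chunks: Dict[str, Tuple[int, int]], wanted: List[str]
) -> List[Tuple[int, int]]:
    """Select only the byte ranges of the columns a query touches.

    Hypothetical: `column_chunks` stands in for the decoded footer
    (column name -> (offset, length)).
    """
    return sorted(column_chunks[name] for name in wanted)


# Usage: an in-memory "object" with two column chunks and a 6-byte footer.
blob = MAGIC + b"A" * 100 + b"B" * 50 + b"FOOTER" + struct.pack("<I", 6) + MAGIC
reads: List[Tuple[int, int]] = []


def fetch(offset: int, length: int) -> bytes:
    reads.append((offset, length))  # record every ranged read
    return blob[offset:offset + length]


footer_off, footer_len = locate_footer(fetch, len(blob))
footer = fetch(footer_off, footer_len)
chunks = {"span.name": (4, 100), "span.duration": (104, 50)}
for off, n in ranges_for_columns(chunks, ["span.duration"]):
    fetch(off, n)
print(reads)  # [(160, 8), (154, 6), (104, 50)]
```

In this toy run only 64 of the object's 168 bytes are read; what a real query saves depends on how many columns it touches.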

Q2. If Tempo uses local storage, do I need to allocate a large amount of storage in production?

Only the ingesters require local storage. Upon receiving data, they write it to a set of local Parquet files, which they then flush to the backend.
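
A toy model of that write-then-flush cycle, assuming a hypothetical `ToyIngester` that appends traces to local files, "cuts" a block once it reaches a size threshold, and deletes the local copy after uploading it. A dict stands in for the object-storage bucket and plain bytes stand in for Parquet. The takeaway is an inference from this model, not a sizing rule: local disk here only holds data that has not been flushed yet.

```python
import os
import tempfile
from typing import Dict, List


class ToyIngester:
    """Hypothetical model: buffer traces in local files, then flush them."""

    def __init__(self, local_dir: str, max_block_bytes: int) -> None:
        self.local_dir = local_dir
        self.max_block_bytes = max_block_bytes
        self.block_id = 0
        self.pending: List[str] = []  # completed local blocks awaiting flush

    def _current_path(self) -> str:
        return os.path.join(self.local_dir, f"block-{self.block_id}.data")

    def push(self, trace_bytes: bytes) -> None:
        path = self._current_path()
        with open(path, "ab") as f:
            f.write(trace_bytes)
        if os.path.getsize(path) >= self.max_block_bytes:
            self.pending.append(path)  # "cut" the block; start a new one
            self.block_id += 1

    def flush(self, backend: Dict[str, bytes]) -> None:
        for path in self.pending:
            with open(path, "rb") as f:
                backend[os.path.basename(path)] = f.read()
            os.remove(path)  # the local copy is no longer needed
        self.pending.clear()

    def local_bytes(self) -> int:
        return sum(
            os.path.getsize(os.path.join(self.local_dir, name))
            for name in os.listdir(self.local_dir)
        )


backend: Dict[str, bytes] = {}  # stands in for the object-storage bucket
with tempfile.TemporaryDirectory() as d:
    ing = ToyIngester(d, max_block_bytes=10)
    for _ in range(5):
        ing.push(b"trace")  # 5 bytes each
    before = ing.local_bytes()
    ing.flush(backend)
    after = ing.local_bytes()  # only the still-open block remains
print(before, after, sorted(backend))
```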

Q3. Does the compactor periodically remove old, locally downloaded files?

Compactors take multiple input blocks from object storage and combine them into one output block. This serves two purposes:

- deduplicate traces in an RF=3 (replication factor 3) environment
- shrink the blocklist to reduce latency on trace-by-ID queries
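
A minimal sketch of both purposes, under simplifying assumptions: a block is modelled as a dict from trace ID to trace bytes, `compact` keeps one copy per trace ID (it does not merge partial traces), and `find_trace` scans the blocklist in order. All names are hypothetical. In this model, a lookup has to inspect every block that does not hold the ID, so fewer blocks means fewer inspections.

```python
from typing import Dict, List, Optional, Tuple

Block = Dict[str, bytes]  # hypothetical: trace ID -> encoded trace


def compact(inputs: List[Block]) -> Block:
    """Merge input blocks into one, keeping a single copy of each trace."""
    out: Block = {}
    for block in inputs:
        for trace_id, trace in block.items():
            # With RF=3 the same trace lands in several blocks. Keeping the
            # first copy is a simplification; partial traces are not merged.
            out.setdefault(trace_id, trace)
    return out


def find_trace(blocklist: List[Block], trace_id: str) -> Tuple[Optional[bytes], int]:
    """Trace-by-ID lookup: returns (trace or None, number of blocks inspected)."""
    checked = 0
    for block in blocklist:
        checked += 1
        if trace_id in block:
            return block[trace_id], checked
    return None, checked


# Three ingesters (RF=3) each flushed a block containing the same traces.
blocks = [{"t1": b"a", "t2": b"b"} for _ in range(3)]
compacted = compact(blocks)
print(len(compacted))                      # 2 unique traces instead of 6 copies
print(find_trace(blocks, "missing"))       # (None, 3): every block inspected
print(find_trace([compacted], "missing"))  # (None, 1)
```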

Compactors are also responsible for removing blocks from object storage when they fall out of retention. The compactors' behavior is currently still heavily tuned to the old format. The team has not had a chance to revisit the strategy yet, and there are likely improvements to make.
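
Retention can be sketched the same way. This assumes hypothetical block metadata carrying an `end_time` and treats a block as expired once its newest data is older than `now - retention`; the exact criterion Tempo uses may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class BlockMeta:
    """Hypothetical block metadata: only the field retention needs."""
    end_time: datetime


def apply_retention(
    bucket: Dict[str, BlockMeta], retention: timedelta, now: datetime
) -> List[str]:
    """Delete blocks whose newest data is older than now - retention."""
    cutoff = now - retention
    expired = sorted(bid for bid, meta in bucket.items() if meta.end_time < cutoff)
    for bid in expired:
        del bucket[bid]  # in object storage: delete the block's objects
    return expired


now = datetime(2023, 2, 23, 12, 0)
bucket = {
    "block-a": BlockMeta(end_time=now - timedelta(days=20)),
    "block-b": BlockMeta(end_time=now - timedelta(days=3)),
}
print(apply_retention(bucket, timedelta(days=14), now))  # ['block-a']
print(sorted(bucket))                                     # ['block-b']
```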

Is there a document that describes the whole Tempo architecture?

These docs are mostly up to date. They won't answer all of your questions, but they cover the high-level architecture: https://grafana.com/docs/tempo/latest/operations/

I want to understand this project more deeply.

Awesome :)! Please keep asking questions. I'm happy to answer.

scalalang2 · Feb 25, 2023 · Author

Thanks :) It helped me a lot. I'll keep digging to understand it.