Skip to content

Add OpenTelemetry instrumentation#1609

Open
parmesant wants to merge 2 commits intomainfrom
otel/instrument-1775057001447
Open

Add OpenTelemetry instrumentation#1609
parmesant wants to merge 2 commits intomainfrom
otel/instrument-1775057001447

Conversation

@parmesant
Copy link
Copy Markdown
Contributor

@parmesant parmesant commented Apr 1, 2026

Clean compilation with no warnings. Here's a summary of all changes made:

Files Modified

1. Cargo.toml

  • Added dependencies: tracing-opentelemetry = "0.32", tracing-actix-web = "0.7", opentelemetry, opentelemetry_sdk (with rt-tokio), opentelemetry-otlp (with grpc-tonic, http-proto, http-json) — all from the same git rev as existing opentelemetry-proto
  • Added tracing-subscriber feature "registry"
  • Added [patch.crates-io] section to unify opentelemetry and opentelemetry_sdk types across all crates

2. src/telemetry.rs (NEW)

  • init_otel_tracer() -> Option<SdkTracerProvider> — reads OTEL_EXPORTER_OTLP_ENDPOINT env var; if unset returns None (OTel disabled). Supports gRPC, HTTP/protobuf, and HTTP/JSON (default) protocols via OTEL_EXPORTER_OTLP_PROTOCOL. Registers W3C TraceContextPropagator globally.

3. src/lib.rs

  • Added pub mod telemetry;

4. src/main.rs

  • init_logger() now returns Option<SdkTracerProvider> and wires the OTel tracing layer into the subscriber
  • main() captures the provider and calls provider.shutdown() before exit

5. src/handlers/http/modal/mod.rs

  • Replaced actix_web::middleware::Logger::default() with tracing_actix_web::TracingLogger::default() for automatic HTTP request tracing with W3C traceparent propagation

6. src/handlers/http/query.rs — 7 functions instrumented

  • query() — root span with query.sql and query.streaming fields
  • get_counts() — root span
  • handle_count_query() — child span with table field
  • handle_non_streaming_query() — child span
  • handle_streaming_query() — child span
  • into_query() — child span
  • get_records_and_fields() — child span
  • create_streams_for_distributed() — child span with stream_count field + Pattern 1 span propagation into JoinSet::spawn tasks

7. src/query/mod.rs — 4 functions instrumented

  • execute() — child span + Pattern 2 W3C TraceContext propagation across QUERY_RUNTIME (separate Runtime::new() — cross-OS-thread boundary). Injects context before spawn, extracts and sets parent inside the spawned closure.
  • Query::execute() — child span (query.datafusion_execute)
  • CountsRequest::get_bin_density() — child span with stream field
  • get_manifest_list() — child span with stream field

8. src/storage/field_stats.rs — 1 function instrumented

  • get_dataset_stats() — root span

9. src/handlers/http/cluster/mod.rs — 1 function instrumented

  • send_query_request() — child span

Co-authored-by: otex-dev dev@otex.dev

Summary by CodeRabbit

  • New Features
    • Integrated OpenTelemetry-based distributed tracing with selectable OTLP protocols (gRPC, HTTP/protobuf, HTTP/JSON) and opt-out via env var.
    • Switched request logging to a tracing-based logger for structured request traces.
    • Added rich instrumentation across query, streaming, and dataset endpoints, preserving trace context across spawned tasks.
    • Ensures tracer provider is properly shut down/flushed on server exit.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds OpenTelemetry tracing: new dependencies and patch overrides, a public telemetry module that initializes OTLP exporters and registers a tracer, replaces Actix logger with tracing middleware, instruments HTTP handlers and query execution with tracing spans, and propagates W3C TraceContext into spawned query runtime tasks.

Changes

Cohort / File(s) Summary
Cargo & crate export
Cargo.toml, src/lib.rs
Added tracing/opentelemetry crates, enabled tracing-subscriber registry feature, added [patch.crates-io] pinning for opentelemetry crates; exported new pub mod telemetry;.
Telemetry init & main
src/telemetry.rs, src/main.rs
New telemetry::init_otel_tracer() -> Option<SdkTracerProvider> builds OTLP exporter (grpc/http-proto/http-json), registers global tracer and W3C propagator; init_logger() now returns Option<SdkTracerProvider> and conditionally registers tracing_opentelemetry layer and shuts down provider on exit.
Actix middleware & handlers
src/handlers/http/modal/mod.rs, src/handlers/http/cluster/mod.rs, src/storage/field_stats.rs
Switched Actix request logger to tracing_actix_web::TracingLogger; added #[tracing::instrument(...)] to handlers (skip sensitive args, set otel.kind = "server" where applicable).
Query execution & instrumentation
src/handlers/http/query.rs, src/query/mod.rs
Added #[tracing::instrument(...)] spans with structured fields and otel.kind annotations; introduced QueryResult alias; propagate W3C TraceContext into QUERY_RUNTIME spawned tasks and instrument per-task spans, and instrument stream spawn tasks with .instrument(...).

Sequence Diagram(s)

sequenceDiagram
    participant App as Application\nStartup
    participant Main as main()
    participant Telemetry as telemetry::init_otel_tracer()
    participant OTEL as OpenTelemetry\nExporter/Provider
    participant Tracing as tracing-subscriber\nRegistry

    Main->>Main: call init_logger()
    Main->>Telemetry: init_otel_tracer()
    Telemetry->>Telemetry: check OTEL_EXPORTER_OTLP_ENDPOINT
    Telemetry->>OTEL: create SpanExporter (grpc/http-proto/http-json)
    OTEL-->>Telemetry: exporter
    Telemetry->>OTEL: build SdkTracerProvider + BatchSpanProcessor
    OTEL-->>Telemetry: SdkTracerProvider
    Telemetry->>OTEL: set_tracer_provider(global)
    Telemetry->>OTEL: set_text_map_propagator(TraceContext)
    Telemetry-->>Main: Some(provider) or None
    Main->>Tracing: register tracing_opentelemetry layer (if provider)
    Tracing-->>Main: registry ready
Loading
sequenceDiagram
    participant Client as Client
    participant Middleware as TracingLogger
    participant Handler as Query Handler\n(instrumented)
    participant Executor as Query Executor / QUERY_RUNTIME
    participant OTEL as OTEL Exporter

    Client->>Middleware: HTTP request
    Middleware->>Middleware: create request span
    Middleware->>Handler: invoke handler (with span)
    Handler->>Executor: call execute(query)
    Executor->>Executor: serialize span context into map
    Executor->>Executor: spawn thread in QUERY_RUNTIME with context
    Executor->>Executor: extract context in spawned task
    Executor->>Executor: set parent span and instrument future
    Executor->>OTEL: spans emitted to exporter
    Executor-->>Handler: query result
    Handler-->>Client: HTTP response
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • nikhilsinhaparseable

Poem

🐰 I hop through spans both near and far,

weaving trace-threads like a tiny star.
Context rides with me into each stream,
exporters hum to capture every gleam.
Hooray — more traces for observability!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add OpenTelemetry instrumentation' accurately reflects the main objective of the PR, which is to integrate OpenTelemetry tracing throughout the codebase with instrumentation spans, middleware updates, and initialization logic.
Description check ✅ Passed The PR description is comprehensive and well-structured, covering all major file modifications, the key changes made, and implementation patterns. However, it does not include a completed checklist section from the template, which includes testing and documentation requirements.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch otel/instrument-1775057001447

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/query/mod.rs (2)

81-98: Consider a more descriptive name for this type alias.

RB is cryptic. A name like QueryResultBatches or RecordBatchResult would be more self-documenting and convey that this represents either collected batches or a streaming result.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/query/mod.rs` around lines 81 - 98, The type alias RB is cryptic; rename
it to a descriptive identifier (e.g., QueryResultBatches or RecordBatchResult)
and update all references accordingly. Locate the type alias declaration "pub
type RB = Either<...>" and change the alias name, then update any usages
(functions, structs, trait impls) that reference RB to the new name (ensure
imports and re-exports are adjusted if RB was pub). Run cargo check/tests to
confirm no remaining references to RB.

520-520: Consider consistent span naming with a common prefix.

Other spans in this file use a query. prefix (e.g., query.execute, query.datafusion_execute). For consistency and easier filtering in trace backends, consider naming these query.get_bin_density and query.get_manifest_list.

🔧 Suggested naming for consistency
-#[tracing::instrument(name = "get_bin_density", skip_all, fields(stream = %self.stream))]
+#[tracing::instrument(name = "query.get_bin_density", skip_all, fields(stream = %self.stream))]
 pub async fn get_bin_density(
-#[tracing::instrument(name = "get_manifest_list", skip(time_range, tenant_id), fields(stream = %stream_name))]
+#[tracing::instrument(name = "query.get_manifest_list", skip(time_range, tenant_id), fields(stream = %stream_name))]
 pub async fn get_manifest_list(

Also applies to: 726-726

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/query/mod.rs` at line 520, The tracing span names lack the common
"query." prefix; update the tracing::instrument attributes for the functions
like get_bin_density and get_manifest_list to use names such as
"query.get_bin_density" and "query.get_manifest_list" (i.e., change
#[tracing::instrument(name = "get_bin_density", ...)] to
#[tracing::instrument(name = "query.get_bin_density", ...)] and similarly for
get_manifest_list) so span names match the existing query.* convention for
consistent filtering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/handlers/http/query.rs`:
- Line 120: The tracing span currently emits the full SQL via the attribute on
the "query" handler (the tracing::instrument on the function using
query_request), which can leak sensitive data; remove the raw query text from
the span fields and instead record a non-sensitive identifier such as a
hash/fingerprint or metadata (e.g., query_hash, param_count, or a boolean for
streaming) derived from query_request.query; update the tracing attribute and
any span field population to use that safe field (compute the hash/fingerprint
in the handler using query_request.query and expose only that value in the span,
or omit the SQL entirely and keep query.streaming if needed).

In `@src/telemetry.rs`:
- Around line 34-35: The OTEL initialization currently only checks for
OTEL_EXPORTER_OTLP_ENDPOINT so a signal-specific env var like
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is ignored; update the gate in the OTEL init
path (where OTEL is enabled/disabled, e.g., in the function that reads env vars
to initialize OTEL) to proceed if either OTEL_EXPORTER_OTLP_ENDPOINT or
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is set, and ensure later logic still prefers
the signal-specific OTLP_TRACES endpoint when constructing the traces exporter
URL; change the boolean check that references only OTEL_EXPORTER_OTLP_ENDPOINT
to an existence check for either env var and preserve existing resolution logic
for trace endpoint selection.

---

Nitpick comments:
In `@src/query/mod.rs`:
- Around line 81-98: The type alias RB is cryptic; rename it to a descriptive
identifier (e.g., QueryResultBatches or RecordBatchResult) and update all
references accordingly. Locate the type alias declaration "pub type RB =
Either<...>" and change the alias name, then update any usages (functions,
structs, trait impls) that reference RB to the new name (ensure imports and
re-exports are adjusted if RB was pub). Run cargo check/tests to confirm no
remaining references to RB.
- Line 520: The tracing span names lack the common "query." prefix; update the
tracing::instrument attributes for the functions like get_bin_density and
get_manifest_list to use names such as "query.get_bin_density" and
"query.get_manifest_list" (i.e., change #[tracing::instrument(name =
"get_bin_density", ...)] to #[tracing::instrument(name =
"query.get_bin_density", ...)] and similarly for get_manifest_list) so span
names match the existing query.* convention for consistent filtering.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7d2eb9b3-0884-4b11-9ba6-9582033fe14f

📥 Commits

Reviewing files that changed from the base of the PR and between 187a4e6 and 5a2b128.

📒 Files selected for processing (9)
  • Cargo.toml
  • src/handlers/http/cluster/mod.rs
  • src/handlers/http/modal/mod.rs
  • src/handlers/http/query.rs
  • src/lib.rs
  • src/main.rs
  • src/query/mod.rs
  • src/storage/field_stats.rs
  • src/telemetry.rs

@parmesant parmesant force-pushed the otel/instrument-1775057001447 branch from ef37f5f to 7701143 Compare April 3, 2026 07:15
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/query/mod.rs (1)

81-98: Consider adding documentation for the RB type alias.

The RB type alias encapsulates a complex nested type for query results. While functional, adding a doc comment explaining when each variant (batch vs. streaming) is used would improve maintainability.

📝 Suggested documentation
+/// Query result type: batch (`Left`) or streaming (`Right`).
+///
+/// - `Left(Vec<RecordBatch>)`: Used when `streaming = false` — all results collected.
+/// - `Right(...)`: Used when `streaming = true` — results streamed as they arrive.
 pub type RB = Either<
     Vec<RecordBatch>,
     Pin<
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/query/mod.rs` around lines 81 - 98, The RB type alias lacks
documentation: add a Rust doc comment above the pub type RB declaration
explaining that RB is an Either containing either a Vec<RecordBatch> (used for
fully-materialized in-memory results) or a streaming adapter
(Pin<Box<RecordBatchStreamAdapter<select_all::SelectAll<...>>>> used for
incremental/streamed results), and describe when each variant is produced and
consumed (e.g., small queries vs. large/streamed queries) and any relevant
Send/Pin constraints; reference the RB alias and the nested types (RecordBatch,
RecordBatchStreamAdapter, select_all::SelectAll, RecordBatchStream) in the
comment for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/query/mod.rs`:
- Around line 81-98: The RB type alias lacks documentation: add a Rust doc
comment above the pub type RB declaration explaining that RB is an Either
containing either a Vec<RecordBatch> (used for fully-materialized in-memory
results) or a streaming adapter
(Pin<Box<RecordBatchStreamAdapter<select_all::SelectAll<...>>>> used for
incremental/streamed results), and describe when each variant is produced and
consumed (e.g., small queries vs. large/streamed queries) and any relevant
Send/Pin constraints; reference the RB alias and the nested types (RecordBatch,
RecordBatchStreamAdapter, select_all::SelectAll, RecordBatchStream) in the
comment for clarity.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4e0961e6-d501-4552-8134-32054487afbd

📥 Commits

Reviewing files that changed from the base of the PR and between 5a2b128 and 7701143.

📒 Files selected for processing (9)
  • Cargo.toml
  • src/handlers/http/cluster/mod.rs
  • src/handlers/http/modal/mod.rs
  • src/handlers/http/query.rs
  • src/lib.rs
  • src/main.rs
  • src/query/mod.rs
  • src/storage/field_stats.rs
  • src/telemetry.rs
✅ Files skipped from review due to trivial changes (4)
  • src/lib.rs
  • src/handlers/http/modal/mod.rs
  • src/storage/field_stats.rs
  • src/handlers/http/cluster/mod.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • Cargo.toml
  • src/main.rs

coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 3, 2026
@parmesant parmesant force-pushed the otel/instrument-1775057001447 branch from 7701143 to 8e15a0d Compare April 3, 2026 08:16
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/query/mod.rs (1)

81-98: Consider a more descriptive name for the type alias.

RB is quite terse and doesn't convey the semantic meaning. A name like QueryResultBatches or RecordBatchOutput would improve readability, especially for developers unfamiliar with this module.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/query/mod.rs` around lines 81 - 98, The type alias currently named RB is
too terse; rename the alias RB to a more descriptive identifier such as
QueryResultBatches or RecordBatchOutput in src/query/mod.rs and update all
usages accordingly (e.g., replace RB with the new name wherever referenced in
functions, structs, or trait signatures) so the alias clearly conveys it
represents either Vec<RecordBatch> or the boxed RecordBatchStreamAdapter; ensure
imports/exports and any pub visibility remain unchanged after renaming.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/query/mod.rs`:
- Around line 81-98: The type alias currently named RB is too terse; rename the
alias RB to a more descriptive identifier such as QueryResultBatches or
RecordBatchOutput in src/query/mod.rs and update all usages accordingly (e.g.,
replace RB with the new name wherever referenced in functions, structs, or trait
signatures) so the alias clearly conveys it represents either Vec<RecordBatch>
or the boxed RecordBatchStreamAdapter; ensure imports/exports and any pub
visibility remain unchanged after renaming.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3a7f94d3-d818-4f1e-9d4e-61a3be5ea962

📥 Commits

Reviewing files that changed from the base of the PR and between 7701143 and 8e15a0d.

📒 Files selected for processing (9)
  • Cargo.toml
  • src/handlers/http/cluster/mod.rs
  • src/handlers/http/modal/mod.rs
  • src/handlers/http/query.rs
  • src/lib.rs
  • src/main.rs
  • src/query/mod.rs
  • src/storage/field_stats.rs
  • src/telemetry.rs
✅ Files skipped from review due to trivial changes (4)
  • src/lib.rs
  • src/handlers/http/cluster/mod.rs
  • src/handlers/http/modal/mod.rs
  • src/telemetry.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • Cargo.toml
  • src/main.rs
  • src/handlers/http/query.rs

@parmesant parmesant force-pushed the otel/instrument-1775057001447 branch 2 times, most recently from 1933219 to 5d3b658 Compare April 3, 2026 08:27
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/handlers/http/query.rs (1)

483-503: ⚠️ Potential issue | 🟠 Major

Propagate stream-load failures out of the JoinSet.

The spawned task returns (stream_name, result), but the join_next() loop only checks JoinError, so this helper still returns Ok(()) when a requested stream failed to load. Both callers use ?, so they are expecting a precise failure here rather than a later planning error. Match Ok((stream_name, Err(e))) in the join loop and return that failure to the caller.
Based on learnings: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern !PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?) ensures proper error handling in both modes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/query.rs` around lines 483 - 503, The join loop currently
only checks JoinError and ignores worker task results, so any stream-load
failure returned as Ok((stream_name, Err(e))) is swallowed; update the
join_set.join_next() handling to match Ok((stream_name, Err(e))) and propagate
that Err to the caller (return Err(e) or wrap with context) instead of
continuing, while still logging JoinError when result is Err(join_error). Also
ensure the spawn closure still returns (stream_name, result) from
create_stream_and_schema_from_storage and consider applying the Parseable
existence check pattern (!PARSEABLE.streams.contains(&stream_name) &&
(PARSEABLE.options.mode != Mode::Query ||
!PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)) in the
relevant logstream handlers so stream existence is validated in both Query and
standalone modes.
♻️ Duplicate comments (1)
src/handlers/http/query.rs (1)

120-120: ⚠️ Potential issue | 🟠 Major

Do not emit raw SQL into request spans.

query.sql = %query_request.query exports full statements, including literals and identifiers, into tracing backends. Record only a fingerprint, length, or statement kind here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/query.rs` at line 120, The tracing span currently emits raw
SQL via the attribute fields(…, query.sql = %query_request.query, …); instead
replace that with a safe derived value (e.g. a fingerprint, hash, statement
kind, or length) and expose that instead of the full text. Compute a
deterministic fingerprint or short hash from query_request.query (or extract the
statement kind/length) prior to the span and change the tracing field to
something like query.fingerprint, query.hash, query.kind or query.length
(referencing the computed value) while removing query.sql = %query_request.query
from the tracing attribute.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/handlers/http/query.rs`:
- Around line 483-503: The join loop currently only checks JoinError and ignores
worker task results, so any stream-load failure returned as Ok((stream_name,
Err(e))) is swallowed; update the join_set.join_next() handling to match
Ok((stream_name, Err(e))) and propagate that Err to the caller (return Err(e) or
wrap with context) instead of continuing, while still logging JoinError when
result is Err(join_error). Also ensure the spawn closure still returns
(stream_name, result) from create_stream_and_schema_from_storage and consider
applying the Parseable existence check pattern
(!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode !=
Mode::Query ||
!PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)) in the
relevant logstream handlers so stream existence is validated in both Query and
standalone modes.

---

Duplicate comments:
In `@src/handlers/http/query.rs`:
- Line 120: The tracing span currently emits raw SQL via the attribute fields(…,
query.sql = %query_request.query, …); instead replace that with a safe derived
value (e.g. a fingerprint, hash, statement kind, or length) and expose that
instead of the full text. Compute a deterministic fingerprint or short hash from
query_request.query (or extract the statement kind/length) prior to the span and
change the tracing field to something like query.fingerprint, query.hash,
query.kind or query.length (referencing the computed value) while removing
query.sql = %query_request.query from the tracing attribute.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1ec32b0a-0a06-4d26-b606-002329b8e92c

📥 Commits

Reviewing files that changed from the base of the PR and between 8e15a0d and 5d3b658.

📒 Files selected for processing (9)
  • Cargo.toml
  • src/handlers/http/cluster/mod.rs
  • src/handlers/http/modal/mod.rs
  • src/handlers/http/query.rs
  • src/lib.rs
  • src/main.rs
  • src/query/mod.rs
  • src/storage/field_stats.rs
  • src/telemetry.rs
✅ Files skipped from review due to trivial changes (3)
  • src/handlers/http/modal/mod.rs
  • Cargo.toml
  • src/handlers/http/cluster/mod.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/storage/field_stats.rs
  • src/main.rs
  • src/telemetry.rs

coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 3, 2026
@parmesant parmesant force-pushed the otel/instrument-1775057001447 branch 2 times, most recently from e771e29 to 71edd50 Compare April 5, 2026 12:23
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/handlers/http/query.rs (1)

477-503: ⚠️ Potential issue | 🟠 Major

Propagate stream-load failures instead of warning and continuing.

Each task logs create_stream_and_schema_from_storage(...) failures, but the JoinSet loop ignores the returned result, so this function still returns Ok(()) when one or more loads fail. The caller then proceeds with query planning against streams that were never materialized.

Based on learnings, stream existence checks in query/prism flows should propagate create_stream_and_schema_from_storage(...) failures instead of continuing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/query.rs` around lines 477 - 503, The loop over JoinSet
currently discards each task's returned Result so stream creation failures are
ignored; change the join_set.join_next() handling to unwrap the task's output
(expect Ok((stream_name, result)) or map Err(join_error) into an early return),
and if a task returned Err from
PARSEABLE.create_stream_and_schema_from_storage(...) propagate that error out of
this function (return Err with context mentioning stream_name) instead of just
logging a warning; reference the spawned task closure that returns (stream_name,
result), the create_stream_and_schema_from_storage call, and the
join_set.join_next() loop when making the change.
♻️ Duplicate comments (2)
src/telemetry.rs (1)

33-55: ⚠️ Potential issue | 🟠 Major

Honor OTEL_EXPORTER_OTLP_TRACES_ENDPOINT in the startup gate.

The docstring advertises the trace-specific endpoint, but initialization still returns None unless OTEL_EXPORTER_OTLP_ENDPOINT is set. A deployment that only sets OTEL_EXPORTER_OTLP_TRACES_ENDPOINT will silently disable tracing.

Suggested fix
-    std::env::var(EXPORTER_OTLP_ENDPOINT).ok()?;
+    let has_endpoint = std::env::var_os(EXPORTER_OTLP_ENDPOINT).is_some();
+    let has_traces_endpoint =
+        std::env::var_os("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT").is_some();
+    if !has_endpoint && !has_traces_endpoint {
+        return None;
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/telemetry.rs` around lines 33 - 55, The init_otel_tracer startup gate
currently only checks EXPORTER_OTLP_ENDPOINT and returns None if it's missing,
but the docstring says OTEL_EXPORTER_OTLP_TRACES_ENDPOINT should also enable
tracing; update init_otel_tracer so the presence check uses either
EXPORTER_OTLP_ENDPOINT or EXPORTER_OTLP_TRACES_ENDPOINT (i.e., treat
OTLP_TRACES_ENDPOINT as a valid enabling variable), returning None only when
both env vars are absent; keep the rest of the function unchanged since the SDK
will still read the envs to build the exporter.
src/handlers/http/query.rs (1)

120-120: ⚠️ Potential issue | 🟠 Major

Do not attach raw SQL to request spans.

This leaks literals/identifiers into logs and OTLP backends and creates a very high-cardinality attribute. Record a fingerprint or length instead.

Safer span fields
-#[tracing::instrument(name = "query", skip(req, query_request), fields(otel.kind = "server", query.sql = %query_request.query, query.streaming = query_request.streaming, query.start_time = query_request.start_time, query.end_time = query_request.end_time))]
+#[tracing::instrument(
+    name = "query",
+    skip(req, query_request),
+    fields(
+        otel.kind = "server",
+        query.streaming = query_request.streaming,
+        query.length = query_request.query.len() as u64,
+        query.start_time = %query_request.start_time,
+        query.end_time = %query_request.end_time
+    )
+)]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/query.rs` at line 120, The span currently attaches raw SQL
via the tracing::instrument attribute (name = "query", skip(req, query_request),
fields(query.sql = %query_request.query...)), which leaks high-cardinality data;
remove query_request.query from the fields and instead compute a safe identifier
(e.g., a stable fingerprint/hash or the query length) inside the query handler
before creating the span (e.g., compute fingerprint =
fingerprint_query(&query_request.query) or query_len =
query_request.query.len()), then add a non-sensitive field in the
instrument/span such as query.fingerprint = %fingerprint or query.length =
query_len; ensure the tracing::instrument attribute references the new safe
field names and that no raw SQL string remains in any span fields.
🧹 Nitpick comments (1)
src/query/mod.rs (1)

81-99: Keep QueryResult opaque at the query boundary.

This pub alias now exposes both actix_web::Either and the exact RecordBatchStreamAdapter<select_all<...>> pipeline from a core module. That makes non-HTTP consumers depend on Actix and turns future stream-plumbing changes into a signature break; a small local enum with Vec<RecordBatch> / SendableRecordBatchStream would keep the API stable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/query/mod.rs` around lines 81 - 99, The public QueryResult alias leaks
actix_web::Either and the concrete RecordBatchStreamAdapter/select_all pipeline;
replace it with an opaque public type (e.g., a small pub enum QueryResult with
variants like Batches(Vec<RecordBatch>) and Stream(SendableRecordBatchStream) or
a single boxed dyn RecordBatchStream type) so callers don't depend on actix_web
or the specific adapter; update uses of QueryResult, the constructor points that
currently produce Either or RecordBatchStreamAdapter/select_all to return the
new enum/boxed-dyn type, and keep the concrete adapter types private within this
module (reference QueryResult, Either, RecordBatchStreamAdapter, select_all, and
RecordBatchStream in your changes).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/telemetry.rs`:
- Around line 56-79: The match on protocol currently uses a catch-all `_` that
silently maps any invalid OTEL_EXPORTER_OTLP_PROTOCOL value to
Protocol::HttpJson; change the match to include an explicit "http/json" arm that
builds the HTTP-json exporter (using
SpanExporter::builder().with_http().with_protocol(Protocol::HttpJson).build()),
and add an explicit error arm (e.g., `other` pattern) that logs a warning
including the invalid `protocol` value and disables/returns no exporter (so OTEL
is effectively turned off) instead of defaulting silently; update any callers
that expect an exporter to handle the disabled/None case appropriately.

---

Outside diff comments:
In `@src/handlers/http/query.rs`:
- Around line 477-503: The loop over JoinSet currently discards each task's
returned Result so stream creation failures are ignored; change the
join_set.join_next() handling to unwrap the task's output (expect
Ok((stream_name, result)) or map Err(join_error) into an early return), and if a
task returned Err from PARSEABLE.create_stream_and_schema_from_storage(...)
propagate that error out of this function (return Err with context mentioning
stream_name) instead of just logging a warning; reference the spawned task
closure that returns (stream_name, result), the
create_stream_and_schema_from_storage call, and the join_set.join_next() loop
when making the change.

---

Duplicate comments:
In `@src/handlers/http/query.rs`:
- Line 120: The span currently attaches raw SQL via the tracing::instrument
attribute (name = "query", skip(req, query_request), fields(query.sql =
%query_request.query...)), which leaks high-cardinality data; remove
query_request.query from the fields and instead compute a safe identifier (e.g.,
a stable fingerprint/hash or the query length) inside the query handler before
creating the span (e.g., compute fingerprint =
fingerprint_query(&query_request.query) or query_len =
query_request.query.len()), then add a non-sensitive field in the
instrument/span such as query.fingerprint = %fingerprint or query.length =
query_len; ensure the tracing::instrument attribute references the new safe
field names and that no raw SQL string remains in any span fields.

In `@src/telemetry.rs`:
- Around line 33-55: The init_otel_tracer startup gate currently only checks
EXPORTER_OTLP_ENDPOINT and returns None if it's missing, but the docstring says
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT should also enable tracing; update
init_otel_tracer so the presence check uses either EXPORTER_OTLP_ENDPOINT or
EXPORTER_OTLP_TRACES_ENDPOINT (i.e., treat OTLP_TRACES_ENDPOINT as a valid
enabling variable), returning None only when both env vars are absent; keep the
rest of the function unchanged since the SDK will still read the envs to build
the exporter.

---

Nitpick comments:
In `@src/query/mod.rs`:
- Around line 81-99: The public QueryResult alias leaks actix_web::Either and
the concrete RecordBatchStreamAdapter/select_all pipeline; replace it with an
opaque public type (e.g., a small pub enum QueryResult with variants like
Batches(Vec<RecordBatch>) and Stream(SendableRecordBatchStream) or a single
boxed dyn RecordBatchStream type) so callers don't depend on actix_web or the
specific adapter; update uses of QueryResult, the constructor points that
currently produce Either or RecordBatchStreamAdapter/select_all to return the
new enum/boxed-dyn type, and keep the concrete adapter types private within this
module (reference QueryResult, Either, RecordBatchStreamAdapter, select_all, and
RecordBatchStream in your changes).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 762dcd34-6e9d-4527-8cb9-e5b553fa7734

📥 Commits

Reviewing files that changed from the base of the PR and between 5d3b658 and e771e29.

📒 Files selected for processing (3)
  • src/handlers/http/query.rs
  • src/query/mod.rs
  • src/telemetry.rs

@parmesant parmesant force-pushed the otel/instrument-1775057001447 branch from 71edd50 to 10f5e20 Compare April 5, 2026 15:01
parmesant and others added 2 commits April 5, 2026 08:01
Clean compilation with no warnings. Here's a summary of all changes made:

## Files Modified

### 1. `Cargo.toml`
- Added dependencies: `tracing-opentelemetry = "0.32"`, `tracing-actix-web = "0.7"`, `opentelemetry`, `opentelemetry_sdk` (with `rt-tokio`), `opentelemetry-otlp` (with `grpc-tonic`, `http-proto`, `http-json`) — all from the same git rev as existing `opentelemetry-proto`
- Added `tracing-subscriber` feature `"registry"`
- Added `[patch.crates-io]` section to unify `opentelemetry` and `opentelemetry_sdk` types across all crates

### 2. `src/telemetry.rs` (NEW)
- `init_otel_tracer() -> Option<SdkTracerProvider>` — reads `OTEL_EXPORTER_OTLP_ENDPOINT` env var; if unset returns `None` (OTel disabled). Supports gRPC, HTTP/protobuf, and HTTP/JSON (default) protocols via `OTEL_EXPORTER_OTLP_PROTOCOL`. Registers W3C `TraceContextPropagator` globally.

### 3. `src/lib.rs`
- Added `pub mod telemetry;`

### 4. `src/main.rs`
- `init_logger()` now returns `Option<SdkTracerProvider>` and wires the OTel tracing layer into the subscriber
- `main()` captures the provider and calls `provider.shutdown()` before exit

### 5. `src/handlers/http/modal/mod.rs`
- Replaced `actix_web::middleware::Logger::default()` with `tracing_actix_web::TracingLogger::default()` for automatic HTTP request tracing with W3C traceparent propagation

### 6. `src/handlers/http/query.rs` — 7 functions instrumented
- **`query()`** — root span with `query.sql` and `query.streaming` fields
- **`get_counts()`** — root span
- **`handle_count_query()`** — child span with `table` field
- **`handle_non_streaming_query()`** — child span
- **`handle_streaming_query()`** — child span
- **`into_query()`** — child span
- **`get_records_and_fields()`** — child span
- **`create_streams_for_distributed()`** — child span with `stream_count` field + Pattern 1 span propagation into `JoinSet::spawn` tasks

### 7. `src/query/mod.rs` — 4 functions instrumented
- **`execute()`** — child span + **Pattern 2 W3C TraceContext propagation** across `QUERY_RUNTIME` (separate `Runtime::new()` — cross-OS-thread boundary). Injects context before spawn, extracts and sets parent inside the spawned closure.
- **`Query::execute()`** — child span (`query.datafusion_execute`)
- **`CountsRequest::get_bin_density()`** — child span with `stream` field
- **`get_manifest_list()`** — child span with `stream` field

### 8. `src/storage/field_stats.rs` — 1 function instrumented
- **`get_dataset_stats()`** — root span

### 9. `src/handlers/http/cluster/mod.rs` — 1 function instrumented
- **`send_query_request()`** — child span

Co-authored-by: otex-dev <dev@otex.dev>
@parmesant parmesant force-pushed the otel/instrument-1775057001447 branch from 10f5e20 to 02f9fdd Compare April 5, 2026 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant