Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds OpenTelemetry tracing: new dependencies and patch overrides, a public Changes
Sequence Diagram(s)sequenceDiagram
participant App as Application\nStartup
participant Main as main()
participant Telemetry as telemetry::init_otel_tracer()
participant OTEL as OpenTelemetry\nExporter/Provider
participant Tracing as tracing-subscriber\nRegistry
Main->>Main: call init_logger()
Main->>Telemetry: init_otel_tracer()
Telemetry->>Telemetry: check OTEL_EXPORTER_OTLP_ENDPOINT
Telemetry->>OTEL: create SpanExporter (grpc/http-proto/http-json)
OTEL-->>Telemetry: exporter
Telemetry->>OTEL: build SdkTracerProvider + BatchSpanProcessor
OTEL-->>Telemetry: SdkTracerProvider
Telemetry->>OTEL: set_tracer_provider(global)
Telemetry->>OTEL: set_text_map_propagator(TraceContext)
Telemetry-->>Main: Some(provider) or None
Main->>Tracing: register tracing_opentelemetry layer (if provider)
Tracing-->>Main: registry ready
sequenceDiagram
participant Client as Client
participant Middleware as TracingLogger
participant Handler as Query Handler\n(instrumented)
participant Executor as Query Executor / QUERY_RUNTIME
participant OTEL as OTEL Exporter
Client->>Middleware: HTTP request
Middleware->>Middleware: create request span
Middleware->>Handler: invoke handler (with span)
Handler->>Executor: call execute(query)
Executor->>Executor: serialize span context into map
Executor->>Executor: spawn thread in QUERY_RUNTIME with context
Executor->>Executor: extract context in spawned task
Executor->>Executor: set parent span and instrument future
Executor->>OTEL: spans emitted to exporter
Executor-->>Handler: query result
Handler-->>Client: HTTP response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/query/mod.rs (2)
81-98: Consider a more descriptive name for this type alias.
RBis cryptic. A name likeQueryResultBatchesorRecordBatchResultwould be more self-documenting and convey that this represents either collected batches or a streaming result.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/query/mod.rs` around lines 81 - 98, The type alias RB is cryptic; rename it to a descriptive identifier (e.g., QueryResultBatches or RecordBatchResult) and update all references accordingly. Locate the type alias declaration "pub type RB = Either<...>" and change the alias name, then update any usages (functions, structs, trait impls) that reference RB to the new name (ensure imports and re-exports are adjusted if RB was pub). Run cargo check/tests to confirm no remaining references to RB.
520-520: Consider consistent span naming with a common prefix.Other spans in this file use a
query.prefix (e.g.,query.execute,query.datafusion_execute). For consistency and easier filtering in trace backends, consider naming thesequery.get_bin_densityandquery.get_manifest_list.🔧 Suggested naming for consistency
-#[tracing::instrument(name = "get_bin_density", skip_all, fields(stream = %self.stream))] +#[tracing::instrument(name = "query.get_bin_density", skip_all, fields(stream = %self.stream))] pub async fn get_bin_density(-#[tracing::instrument(name = "get_manifest_list", skip(time_range, tenant_id), fields(stream = %stream_name))] +#[tracing::instrument(name = "query.get_manifest_list", skip(time_range, tenant_id), fields(stream = %stream_name))] pub async fn get_manifest_list(Also applies to: 726-726
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/query/mod.rs` at line 520, The tracing span names lack the common "query." prefix; update the tracing::instrument attributes for the functions like get_bin_density and get_manifest_list to use names such as "query.get_bin_density" and "query.get_manifest_list" (i.e., change #[tracing::instrument(name = "get_bin_density", ...)] to #[tracing::instrument(name = "query.get_bin_density", ...)] and similarly for get_manifest_list) so span names match the existing query.* convention for consistent filtering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/handlers/http/query.rs`:
- Line 120: The tracing span currently emits the full SQL via the attribute on
the "query" handler (the tracing::instrument on the function using
query_request), which can leak sensitive data; remove the raw query text from
the span fields and instead record a non-sensitive identifier such as a
hash/fingerprint or metadata (e.g., query_hash, param_count, or a boolean for
streaming) derived from query_request.query; update the tracing attribute and
any span field population to use that safe field (compute the hash/fingerprint
in the handler using query_request.query and expose only that value in the span,
or omit the SQL entirely and keep query.streaming if needed).
In `@src/telemetry.rs`:
- Around line 34-35: The OTEL initialization currently only checks for
OTEL_EXPORTER_OTLP_ENDPOINT so a signal-specific env var like
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is ignored; update the gate in the OTEL init
path (where OTEL is enabled/disabled, e.g., in the function that reads env vars
to initialize OTEL) to proceed if either OTEL_EXPORTER_OTLP_ENDPOINT or
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is set, and ensure later logic still prefers
the signal-specific OTLP_TRACES endpoint when constructing the traces exporter
URL; change the boolean check that references only OTEL_EXPORTER_OTLP_ENDPOINT
to an existence check for either env var and preserve existing resolution logic
for trace endpoint selection.
---
Nitpick comments:
In `@src/query/mod.rs`:
- Around line 81-98: The type alias RB is cryptic; rename it to a descriptive
identifier (e.g., QueryResultBatches or RecordBatchResult) and update all
references accordingly. Locate the type alias declaration "pub type RB =
Either<...>" and change the alias name, then update any usages (functions,
structs, trait impls) that reference RB to the new name (ensure imports and
re-exports are adjusted if RB was pub). Run cargo check/tests to confirm no
remaining references to RB.
- Line 520: The tracing span names lack the common "query." prefix; update the
tracing::instrument attributes for the functions like get_bin_density and
get_manifest_list to use names such as "query.get_bin_density" and
"query.get_manifest_list" (i.e., change #[tracing::instrument(name =
"get_bin_density", ...)] to #[tracing::instrument(name =
"query.get_bin_density", ...)] and similarly for get_manifest_list) so span
names match the existing query.* convention for consistent filtering.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 7d2eb9b3-0884-4b11-9ba6-9582033fe14f
📒 Files selected for processing (9)
Cargo.tomlsrc/handlers/http/cluster/mod.rssrc/handlers/http/modal/mod.rssrc/handlers/http/query.rssrc/lib.rssrc/main.rssrc/query/mod.rssrc/storage/field_stats.rssrc/telemetry.rs
ef37f5f to
7701143
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/query/mod.rs (1)
81-98: Consider adding documentation for theRBtype alias.The
RBtype alias encapsulates a complex nested type for query results. While functional, adding a doc comment explaining when each variant (batch vs. streaming) is used would improve maintainability.📝 Suggested documentation
+/// Query result type: batch (`Left`) or streaming (`Right`). +/// +/// - `Left(Vec<RecordBatch>)`: Used when `streaming = false` — all results collected. +/// - `Right(...)`: Used when `streaming = true` — results streamed as they arrive. pub type RB = Either< Vec<RecordBatch>, Pin<🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/query/mod.rs` around lines 81 - 98, The RB type alias lacks documentation: add a Rust doc comment above the pub type RB declaration explaining that RB is an Either containing either a Vec<RecordBatch> (used for fully-materialized in-memory results) or a streaming adapter (Pin<Box<RecordBatchStreamAdapter<select_all::SelectAll<...>>>> used for incremental/streamed results), and describe when each variant is produced and consumed (e.g., small queries vs. large/streamed queries) and any relevant Send/Pin constraints; reference the RB alias and the nested types (RecordBatch, RecordBatchStreamAdapter, select_all::SelectAll, RecordBatchStream) in the comment for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/query/mod.rs`:
- Around line 81-98: The RB type alias lacks documentation: add a Rust doc
comment above the pub type RB declaration explaining that RB is an Either
containing either a Vec<RecordBatch> (used for fully-materialized in-memory
results) or a streaming adapter
(Pin<Box<RecordBatchStreamAdapter<select_all::SelectAll<...>>>> used for
incremental/streamed results), and describe when each variant is produced and
consumed (e.g., small queries vs. large/streamed queries) and any relevant
Send/Pin constraints; reference the RB alias and the nested types (RecordBatch,
RecordBatchStreamAdapter, select_all::SelectAll, RecordBatchStream) in the
comment for clarity.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 4e0961e6-d501-4552-8134-32054487afbd
📒 Files selected for processing (9)
Cargo.tomlsrc/handlers/http/cluster/mod.rssrc/handlers/http/modal/mod.rssrc/handlers/http/query.rssrc/lib.rssrc/main.rssrc/query/mod.rssrc/storage/field_stats.rssrc/telemetry.rs
✅ Files skipped from review due to trivial changes (4)
- src/lib.rs
- src/handlers/http/modal/mod.rs
- src/storage/field_stats.rs
- src/handlers/http/cluster/mod.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- Cargo.toml
- src/main.rs
7701143 to
8e15a0d
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/query/mod.rs (1)
81-98: Consider a more descriptive name for the type alias.
RBis quite terse and doesn't convey the semantic meaning. A name likeQueryResultBatchesorRecordBatchOutputwould improve readability, especially for developers unfamiliar with this module.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/query/mod.rs` around lines 81 - 98, The type alias currently named RB is too terse; rename the alias RB to a more descriptive identifier such as QueryResultBatches or RecordBatchOutput in src/query/mod.rs and update all usages accordingly (e.g., replace RB with the new name wherever referenced in functions, structs, or trait signatures) so the alias clearly conveys it represents either Vec<RecordBatch> or the boxed RecordBatchStreamAdapter; ensure imports/exports and any pub visibility remain unchanged after renaming.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/query/mod.rs`:
- Around line 81-98: The type alias currently named RB is too terse; rename the
alias RB to a more descriptive identifier such as QueryResultBatches or
RecordBatchOutput in src/query/mod.rs and update all usages accordingly (e.g.,
replace RB with the new name wherever referenced in functions, structs, or trait
signatures) so the alias clearly conveys it represents either Vec<RecordBatch>
or the boxed RecordBatchStreamAdapter; ensure imports/exports and any pub
visibility remain unchanged after renaming.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3a7f94d3-d818-4f1e-9d4e-61a3be5ea962
📒 Files selected for processing (9)
Cargo.tomlsrc/handlers/http/cluster/mod.rssrc/handlers/http/modal/mod.rssrc/handlers/http/query.rssrc/lib.rssrc/main.rssrc/query/mod.rssrc/storage/field_stats.rssrc/telemetry.rs
✅ Files skipped from review due to trivial changes (4)
- src/lib.rs
- src/handlers/http/cluster/mod.rs
- src/handlers/http/modal/mod.rs
- src/telemetry.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- Cargo.toml
- src/main.rs
- src/handlers/http/query.rs
1933219 to
5d3b658
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/handlers/http/query.rs (1)
483-503:⚠️ Potential issue | 🟠 MajorPropagate stream-load failures out of the
JoinSet.The spawned task returns
(stream_name, result), but thejoin_next()loop only checksJoinError, so this helper still returnsOk(())when a requested stream failed to load. Both callers use?, so they are expecting a precise failure here rather than a later planning error. MatchOk((stream_name, Err(e)))in the join loop and return that failure to the caller.
Based on learnings: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)ensures proper error handling in both modes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers/http/query.rs` around lines 483 - 503, The join loop currently only checks JoinError and ignores worker task results, so any stream-load failure returned as Ok((stream_name, Err(e))) is swallowed; update the join_set.join_next() handling to match Ok((stream_name, Err(e))) and propagate that Err to the caller (return Err(e) or wrap with context) instead of continuing, while still logging JoinError when result is Err(join_error). Also ensure the spawn closure still returns (stream_name, result) from create_stream_and_schema_from_storage and consider applying the Parseable existence check pattern (!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)) in the relevant logstream handlers so stream existence is validated in both Query and standalone modes.
♻️ Duplicate comments (1)
src/handlers/http/query.rs (1)
120-120:⚠️ Potential issue | 🟠 MajorDo not emit raw SQL into request spans.
query.sql = %query_request.queryexports full statements, including literals and identifiers, into tracing backends. Record only a fingerprint, length, or statement kind here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers/http/query.rs` at line 120, The tracing span currently emits raw SQL via the attribute fields(…, query.sql = %query_request.query, …); instead replace that with a safe derived value (e.g. a fingerprint, hash, statement kind, or length) and expose that instead of the full text. Compute a deterministic fingerprint or short hash from query_request.query (or extract the statement kind/length) prior to the span and change the tracing field to something like query.fingerprint, query.hash, query.kind or query.length (referencing the computed value) while removing query.sql = %query_request.query from the tracing attribute.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/handlers/http/query.rs`:
- Around line 483-503: The join loop currently only checks JoinError and ignores
worker task results, so any stream-load failure returned as Ok((stream_name,
Err(e))) is swallowed; update the join_set.join_next() handling to match
Ok((stream_name, Err(e))) and propagate that Err to the caller (return Err(e) or
wrap with context) instead of continuing, while still logging JoinError when
result is Err(join_error). Also ensure the spawn closure still returns
(stream_name, result) from create_stream_and_schema_from_storage and consider
applying the Parseable existence check pattern
(!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode !=
Mode::Query ||
!PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)) in the
relevant logstream handlers so stream existence is validated in both Query and
standalone modes.
---
Duplicate comments:
In `@src/handlers/http/query.rs`:
- Line 120: The tracing span currently emits raw SQL via the attribute fields(…,
query.sql = %query_request.query, …); instead replace that with a safe derived
value (e.g. a fingerprint, hash, statement kind, or length) and expose that
instead of the full text. Compute a deterministic fingerprint or short hash from
query_request.query (or extract the statement kind/length) prior to the span and
change the tracing field to something like query.fingerprint, query.hash,
query.kind or query.length (referencing the computed value) while removing
query.sql = %query_request.query from the tracing attribute.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 1ec32b0a-0a06-4d26-b606-002329b8e92c
📒 Files selected for processing (9)
Cargo.tomlsrc/handlers/http/cluster/mod.rssrc/handlers/http/modal/mod.rssrc/handlers/http/query.rssrc/lib.rssrc/main.rssrc/query/mod.rssrc/storage/field_stats.rssrc/telemetry.rs
✅ Files skipped from review due to trivial changes (3)
- src/handlers/http/modal/mod.rs
- Cargo.toml
- src/handlers/http/cluster/mod.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- src/storage/field_stats.rs
- src/main.rs
- src/telemetry.rs
e771e29 to
71edd50
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/handlers/http/query.rs (1)
477-503:⚠️ Potential issue | 🟠 MajorPropagate stream-load failures instead of warning and continuing.
Each task logs
create_stream_and_schema_from_storage(...)failures, but theJoinSetloop ignores the returnedresult, so this function still returnsOk(())when one or more loads fail. The caller then proceeds with query planning against streams that were never materialized.Based on learnings, stream existence checks in query/prism flows should propagate
create_stream_and_schema_from_storage(...)failures instead of continuing.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers/http/query.rs` around lines 477 - 503, The loop over JoinSet currently discards each task's returned Result so stream creation failures are ignored; change the join_set.join_next() handling to unwrap the task's output (expect Ok((stream_name, result)) or map Err(join_error) into an early return), and if a task returned Err from PARSEABLE.create_stream_and_schema_from_storage(...) propagate that error out of this function (return Err with context mentioning stream_name) instead of just logging a warning; reference the spawned task closure that returns (stream_name, result), the create_stream_and_schema_from_storage call, and the join_set.join_next() loop when making the change.
♻️ Duplicate comments (2)
src/telemetry.rs (1)
33-55:⚠️ Potential issue | 🟠 MajorHonor
OTEL_EXPORTER_OTLP_TRACES_ENDPOINTin the startup gate.The docstring advertises the trace-specific endpoint, but initialization still returns
NoneunlessOTEL_EXPORTER_OTLP_ENDPOINTis set. A deployment that only setsOTEL_EXPORTER_OTLP_TRACES_ENDPOINTwill silently disable tracing.Suggested fix
- std::env::var(EXPORTER_OTLP_ENDPOINT).ok()?; + let has_endpoint = std::env::var_os(EXPORTER_OTLP_ENDPOINT).is_some(); + let has_traces_endpoint = + std::env::var_os("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT").is_some(); + if !has_endpoint && !has_traces_endpoint { + return None; + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/telemetry.rs` around lines 33 - 55, The init_otel_tracer startup gate currently only checks EXPORTER_OTLP_ENDPOINT and returns None if it's missing, but the docstring says OTEL_EXPORTER_OTLP_TRACES_ENDPOINT should also enable tracing; update init_otel_tracer so the presence check uses either EXPORTER_OTLP_ENDPOINT or EXPORTER_OTLP_TRACES_ENDPOINT (i.e., treat OTLP_TRACES_ENDPOINT as a valid enabling variable), returning None only when both env vars are absent; keep the rest of the function unchanged since the SDK will still read the envs to build the exporter.src/handlers/http/query.rs (1)
120-120:⚠️ Potential issue | 🟠 MajorDo not attach raw SQL to request spans.
This leaks literals/identifiers into logs and OTLP backends and creates a very high-cardinality attribute. Record a fingerprint or length instead.
Safer span fields
-#[tracing::instrument(name = "query", skip(req, query_request), fields(otel.kind = "server", query.sql = %query_request.query, query.streaming = query_request.streaming, query.start_time = query_request.start_time, query.end_time = query_request.end_time))] +#[tracing::instrument( + name = "query", + skip(req, query_request), + fields( + otel.kind = "server", + query.streaming = query_request.streaming, + query.length = query_request.query.len() as u64, + query.start_time = %query_request.start_time, + query.end_time = %query_request.end_time + ) +)]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers/http/query.rs` at line 120, The span currently attaches raw SQL via the tracing::instrument attribute (name = "query", skip(req, query_request), fields(query.sql = %query_request.query...)), which leaks high-cardinality data; remove query_request.query from the fields and instead compute a safe identifier (e.g., a stable fingerprint/hash or the query length) inside the query handler before creating the span (e.g., compute fingerprint = fingerprint_query(&query_request.query) or query_len = query_request.query.len()), then add a non-sensitive field in the instrument/span such as query.fingerprint = %fingerprint or query.length = query_len; ensure the tracing::instrument attribute references the new safe field names and that no raw SQL string remains in any span fields.
🧹 Nitpick comments (1)
src/query/mod.rs (1)
81-99: KeepQueryResultopaque at the query boundary.This
pubalias now exposes bothactix_web::Eitherand the exactRecordBatchStreamAdapter<select_all<...>>pipeline from a core module. That makes non-HTTP consumers depend on Actix and turns future stream-plumbing changes into a signature break; a small local enum withVec<RecordBatch>/SendableRecordBatchStreamwould keep the API stable.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/query/mod.rs` around lines 81 - 99, The public QueryResult alias leaks actix_web::Either and the concrete RecordBatchStreamAdapter/select_all pipeline; replace it with an opaque public type (e.g., a small pub enum QueryResult with variants like Batches(Vec<RecordBatch>) and Stream(SendableRecordBatchStream) or a single boxed dyn RecordBatchStream type) so callers don't depend on actix_web or the specific adapter; update uses of QueryResult, the constructor points that currently produce Either or RecordBatchStreamAdapter/select_all to return the new enum/boxed-dyn type, and keep the concrete adapter types private within this module (reference QueryResult, Either, RecordBatchStreamAdapter, select_all, and RecordBatchStream in your changes).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/telemetry.rs`:
- Around line 56-79: The match on protocol currently uses a catch-all `_` that
silently maps any invalid OTEL_EXPORTER_OTLP_PROTOCOL value to
Protocol::HttpJson; change the match to include an explicit "http/json" arm that
builds the HTTP-json exporter (using
SpanExporter::builder().with_http().with_protocol(Protocol::HttpJson).build()),
and add an explicit error arm (e.g., `other` pattern) that logs a warning
including the invalid `protocol` value and disables/returns no exporter (so OTEL
is effectively turned off) instead of defaulting silently; update any callers
that expect an exporter to handle the disabled/None case appropriately.
---
Outside diff comments:
In `@src/handlers/http/query.rs`:
- Around line 477-503: The loop over JoinSet currently discards each task's
returned Result so stream creation failures are ignored; change the
join_set.join_next() handling to unwrap the task's output (expect
Ok((stream_name, result)) or map Err(join_error) into an early return), and if a
task returned Err from PARSEABLE.create_stream_and_schema_from_storage(...)
propagate that error out of this function (return Err with context mentioning
stream_name) instead of just logging a warning; reference the spawned task
closure that returns (stream_name, result), the
create_stream_and_schema_from_storage call, and the join_set.join_next() loop
when making the change.
---
Duplicate comments:
In `@src/handlers/http/query.rs`:
- Line 120: The span currently attaches raw SQL via the tracing::instrument
attribute (name = "query", skip(req, query_request), fields(query.sql =
%query_request.query...)), which leaks high-cardinality data; remove
query_request.query from the fields and instead compute a safe identifier (e.g.,
a stable fingerprint/hash or the query length) inside the query handler before
creating the span (e.g., compute fingerprint =
fingerprint_query(&query_request.query) or query_len =
query_request.query.len()), then add a non-sensitive field in the
instrument/span such as query.fingerprint = %fingerprint or query.length =
query_len; ensure the tracing::instrument attribute references the new safe
field names and that no raw SQL string remains in any span fields.
In `@src/telemetry.rs`:
- Around line 33-55: The init_otel_tracer startup gate currently only checks
EXPORTER_OTLP_ENDPOINT and returns None if it's missing, but the docstring says
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT should also enable tracing; update
init_otel_tracer so the presence check uses either EXPORTER_OTLP_ENDPOINT or
EXPORTER_OTLP_TRACES_ENDPOINT (i.e., treat OTLP_TRACES_ENDPOINT as a valid
enabling variable), returning None only when both env vars are absent; keep the
rest of the function unchanged since the SDK will still read the envs to build
the exporter.
---
Nitpick comments:
In `@src/query/mod.rs`:
- Around line 81-99: The public QueryResult alias leaks actix_web::Either and
the concrete RecordBatchStreamAdapter/select_all pipeline; replace it with an
opaque public type (e.g., a small pub enum QueryResult with variants like
Batches(Vec<RecordBatch>) and Stream(SendableRecordBatchStream) or a single
boxed dyn RecordBatchStream type) so callers don't depend on actix_web or the
specific adapter; update uses of QueryResult, the constructor points that
currently produce Either or RecordBatchStreamAdapter/select_all to return the
new enum/boxed-dyn type, and keep the concrete adapter types private within this
module (reference QueryResult, Either, RecordBatchStreamAdapter, select_all, and
RecordBatchStream in your changes).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 762dcd34-6e9d-4527-8cb9-e5b553fa7734
📒 Files selected for processing (3)
src/handlers/http/query.rssrc/query/mod.rssrc/telemetry.rs
71edd50 to
10f5e20
Compare
Clean compilation with no warnings. Here's a summary of all changes made: ## Files Modified ### 1. `Cargo.toml` - Added dependencies: `tracing-opentelemetry = "0.32"`, `tracing-actix-web = "0.7"`, `opentelemetry`, `opentelemetry_sdk` (with `rt-tokio`), `opentelemetry-otlp` (with `grpc-tonic`, `http-proto`, `http-json`) — all from the same git rev as existing `opentelemetry-proto` - Added `tracing-subscriber` feature `"registry"` - Added `[patch.crates-io]` section to unify `opentelemetry` and `opentelemetry_sdk` types across all crates ### 2. `src/telemetry.rs` (NEW) - `init_otel_tracer() -> Option<SdkTracerProvider>` — reads `OTEL_EXPORTER_OTLP_ENDPOINT` env var; if unset returns `None` (OTel disabled). Supports gRPC, HTTP/protobuf, and HTTP/JSON (default) protocols via `OTEL_EXPORTER_OTLP_PROTOCOL`. Registers W3C `TraceContextPropagator` globally. ### 3. `src/lib.rs` - Added `pub mod telemetry;` ### 4. `src/main.rs` - `init_logger()` now returns `Option<SdkTracerProvider>` and wires the OTel tracing layer into the subscriber - `main()` captures the provider and calls `provider.shutdown()` before exit ### 5. `src/handlers/http/modal/mod.rs` - Replaced `actix_web::middleware::Logger::default()` with `tracing_actix_web::TracingLogger::default()` for automatic HTTP request tracing with W3C traceparent propagation ### 6. `src/handlers/http/query.rs` — 7 functions instrumented - **`query()`** — root span with `query.sql` and `query.streaming` fields - **`get_counts()`** — root span - **`handle_count_query()`** — child span with `table` field - **`handle_non_streaming_query()`** — child span - **`handle_streaming_query()`** — child span - **`into_query()`** — child span - **`get_records_and_fields()`** — child span - **`create_streams_for_distributed()`** — child span with `stream_count` field + Pattern 1 span propagation into `JoinSet::spawn` tasks ### 7. `src/query/mod.rs` — 4 functions instrumented - **`execute()`** — child span + **Pattern 2 W3C TraceContext propagation** across `QUERY_RUNTIME` (separate `Runtime::new()` — cross-OS-thread boundary). Injects context before spawn, extracts and sets parent inside the spawned closure. - **`Query::execute()`** — child span (`query.datafusion_execute`) - **`CountsRequest::get_bin_density()`** — child span with `stream` field - **`get_manifest_list()`** — child span with `stream` field ### 8. `src/storage/field_stats.rs` — 1 function instrumented - **`get_dataset_stats()`** — root span ### 9. `src/handlers/http/cluster/mod.rs` — 1 function instrumented - **`send_query_request()`** — child span Co-authored-by: otex-dev <dev@otex.dev>
10f5e20 to
02f9fdd
Compare
Clean compilation with no warnings. Here's a summary of all changes made:
Files Modified
1.
Cargo.tomltracing-opentelemetry = "0.32",tracing-actix-web = "0.7",opentelemetry,opentelemetry_sdk(withrt-tokio),opentelemetry-otlp(withgrpc-tonic,http-proto,http-json) — all from the same git rev as existingopentelemetry-prototracing-subscriberfeature"registry"[patch.crates-io]section to unifyopentelemetryandopentelemetry_sdktypes across all crates2.
src/telemetry.rs(NEW)init_otel_tracer() -> Option<SdkTracerProvider>— readsOTEL_EXPORTER_OTLP_ENDPOINTenv var; if unset returnsNone(OTel disabled). Supports gRPC, HTTP/protobuf, and HTTP/JSON (default) protocols viaOTEL_EXPORTER_OTLP_PROTOCOL. Registers W3CTraceContextPropagatorglobally.3.
src/lib.rspub mod telemetry;4.
src/main.rsinit_logger()now returnsOption<SdkTracerProvider>and wires the OTel tracing layer into the subscribermain()captures the provider and callsprovider.shutdown()before exit5.
src/handlers/http/modal/mod.rsactix_web::middleware::Logger::default()withtracing_actix_web::TracingLogger::default()for automatic HTTP request tracing with W3C traceparent propagation6.
src/handlers/http/query.rs— 7 functions instrumentedquery()— root span withquery.sqlandquery.streamingfieldsget_counts()— root spanhandle_count_query()— child span withtablefieldhandle_non_streaming_query()— child spanhandle_streaming_query()— child spaninto_query()— child spanget_records_and_fields()— child spancreate_streams_for_distributed()— child span withstream_countfield + Pattern 1 span propagation intoJoinSet::spawntasks7.
src/query/mod.rs— 4 functions instrumentedexecute()— child span + Pattern 2 W3C TraceContext propagation acrossQUERY_RUNTIME(separateRuntime::new()— cross-OS-thread boundary). Injects context before spawn, extracts and sets parent inside the spawned closure.Query::execute()— child span (query.datafusion_execute)CountsRequest::get_bin_density()— child span withstreamfieldget_manifest_list()— child span withstreamfield8.
src/storage/field_stats.rs— 1 function instrumentedget_dataset_stats()— root span9.
src/handlers/http/cluster/mod.rs— 1 function instrumentedsend_query_request()— child spanCo-authored-by: otex-dev dev@otex.dev
Summary by CodeRabbit