-
-
Notifications
You must be signed in to change notification settings - Fork 135
update: add custom field p_format_verified #1303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update: add custom field p_format_verified #1303
Conversation
""" WalkthroughThis change refactors the event ingestion and schema extraction workflow to consistently use a mutable custom fields map ( Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant HTTPHandler
participant EventProcessor
participant SchemaDefinition
Client->>HTTPHandler: Send event with headers
HTTPHandler->>HTTPHandler: Extract custom fields via get_custom_fields_from_header(&req)
HTTPHandler->>EventProcessor: extract_from_inline_log(json, &mut p_custom_fields, ...)
EventProcessor->>SchemaDefinition: check_or_extract(...)
alt Pattern matches
SchemaDefinition->>EventProcessor: Return extracted fields
EventProcessor->>p_custom_fields: Insert FORMAT_VERIFY_KEY = "true"
else Pattern does not match
SchemaDefinition->>EventProcessor: Extraction fails
EventProcessor->>p_custom_fields: Insert FORMAT_VERIFY_KEY = "false"
end
HTTPHandler->>HTTPHandler: flatten_and_push_logs(..., p_custom_fields)
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (9)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/event/format/known_schema.rs (1)
125-224
: Suggest updating unit tests for verification statusThe test cases should be updated to verify that the verification status is correctly added to both the JSON objects and the custom fields map.
You could add assertions to check for the presence and value of the
p_format_verified
field in your existing test cases, particularly in tests liketest_apache_log_extraction
,test_no_match
, andtest_extract_from_inline_log_object
.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/event/format/known_schema.rs
(4 hunks)src/event/mod.rs
(1 hunks)src/handlers/http/ingest.rs
(7 hunks)src/handlers/http/modal/utils/ingest_utils.rs
(5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/handlers/http/ingest.rs (1)
src/handlers/http/modal/utils/ingest_utils.rs (1)
get_custom_fields_from_header
(148-206)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
🔇 Additional comments (16)
src/event/mod.rs (1)
41-41
: Clean implementation of the verification key constantThe new constant
FORMAT_VERIFY_KEY
with value"p_format_verified"
is well-named and properly aligned with other event-related key constants in this file. This is a good pattern for centralizing the constant definition to avoid string duplication.src/handlers/http/modal/utils/ingest_utils.rs (2)
148-148
: Good API improvement: Using reference instead of ownershipModifying the function to accept a reference to
HttpRequest
rather than taking ownership is a better practice since the function doesn't need to consume the request.
220-220
: Consistently updated test casesAll test calls have been properly updated to pass references instead of moving ownership of the request, maintaining consistency with the function signature change.
Also applies to: 233-233, 246-246, 258-258
src/handlers/http/ingest.rs (7)
84-85
: Good approach to initialize custom fields earlyInitializing the mutable
p_custom_fields
at this point allows the verification status to be tracked throughout the ingestion process. The mutability is necessary as the format verification will update this map.
87-92
: Consistent parameter passing for schema verificationThe updated parameter passing to
extract_from_inline_log
correctly passes a mutable reference top_custom_fields
, enabling the verification status to be updated during schema extraction.
206-206
: Unified parameter passing in OTEL logs handlerThe
get_custom_fields_from_header
call now consistently passes a reference to the request, matching the updated function signature.
264-264
: Unified parameter passing in OTEL metrics handlerSame improvement as in the logs handler, maintaining consistency across all ingestion handlers.
323-323
: Unified parameter passing in OTEL traces handlerSame improvement as in the other handlers, completing the consistent pattern across all ingestion routes.
371-371
: Mutable custom fields in post_event functionThe custom fields map is properly initialized as mutable to support the format verification updates.
377-382
: Improved error handling approachInstead of returning errors when schema extraction fails, the code now marks the verification status in the custom fields map. This allows the ingestion pipeline to continue processing events even when format extraction fails.
src/event/format/known_schema.rs (6)
27-27
: Properly imported verification key constantUsing the constant from the event module maintains consistency and centralizes the definition.
125-130
: Schema verification success trackingWhen pattern matching succeeds, the code correctly adds the verification key with value "true" to mark successful format verification.
135-139
: Schema verification failure trackingWhen pattern matching fails, the code properly adds the verification key with value "false" to indicate format verification failure.
195-195
: Updated function signature to support verificationThe
extract_from_inline_log
method now accepts a mutable reference to the custom fields map, allowing it to update the map with verification status.
213-215
: Format verification for array eventsFor events in an array where schema extraction fails, the verification status is appropriately recorded in the custom fields map.
222-224
: Format verification for object eventsSimilar to array events, object events that fail schema extraction are also properly marked with the verification status.
if server detects known format, add `p_format_verified=true` to the event if schema detection fails, add `p_format_verified=false` to the event
9a83fd0
to
1675057
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
resources/formats.json (1)
113-149
: Secondalb_log
alternative pattern is valid; consider simplifying alternations
This new, more permissive pattern broadens support for missing or variable-length fields and constrainstrack_id
to theTID_[a-f0-9]+
format. The fields array matches the named groups exactly. One optional refactor: the alternation[^ ]+|[^ ]*
can be simplified to[^ ]*
without changing semantics, reducing regex verbosity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
resources/formats.json
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (1)
resources/formats.json (1)
75-112
: Firstalb_log
regex refinement looks correct
The updated pattern correctly captures client and target IP/port pairs, introduces non-greedyuser_agent
matching, and aligns perfectly with the fields list. The named groups and order are consistent, and the anchoring remains proper.
if server detects known format, add
p_format_verified=true
to the eventif schema detection fails, add
p_format_verified=false
to the eventSummary by CodeRabbit
New Features
Refactor
Bug Fixes