update: add custom field p_format_verified by nikhilsinhaparseable · Pull Request #1303 · parseablehq/parseable

nikhilsinhaparseable · 2025-04-25T18:34:14Z

if server detects known format, add p_format_verified=true to the event
if schema detection fails, add p_format_verified=false to the event

Summary by CodeRabbit

New Features
- Added a verification status field to custom fields, indicating whether schema extraction was successful for each event.
Refactor
- Streamlined the handling and propagation of custom fields during event ingestion for improved consistency.
- Updated function signatures to use references for better performance and clarity.
Bug Fixes
- Improved pattern matching in log format parsing to enhance accuracy of extracted fields.
- Enhanced flexibility of log format patterns to support additional log variants.

coderabbitai · 2025-04-25T18:34:20Z

"""

Walkthrough

This change refactors the event ingestion and schema extraction workflow to consistently use a mutable custom fields map (p_custom_fields) for tracking schema verification status. Instead of returning errors when schema extraction fails, the code now inserts a verification key (FORMAT_VERIFY_KEY) with a value of "true" or "false" into the custom fields. The function signatures for extracting custom fields from HTTP headers and inline logs are updated to use references, ensuring custom fields are propagated and updated throughout the ingestion pipeline. A new constant for the verification key is introduced.

Changes

File(s)	Change Summary
src/event/format/known_schema.rs	Modified schema extraction to insert `FORMAT_VERIFY_KEY` with `"true"` or `"false"` into custom fields instead of returning errors. Updated `extract_from_inline_log` signature to accept a mutable custom fields map.
src/event/mod.rs	Added new public constant `FORMAT_VERIFY_KEY` for schema verification status.
src/handlers/http/ingest.rs	Updated ingestion functions to extract, propagate, and update mutable custom fields throughout the pipeline. Unified header extraction via reference.
src/handlers/http/modal/utils/ingest_utils.rs	Changed `get_custom_fields_from_header` to accept a reference to `HttpRequest`. Updated tests accordingly.
resources/formats.json	Modified regex pattern for the "alb_log" format, broadening matching criteria for `user_agent`, `ssl_cipher`, and `ssl_protocol` fields; added two new alternative regex patterns for "alb_log".

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HTTPHandler
    participant EventProcessor
    participant SchemaDefinition

    Client->>HTTPHandler: Send event with headers
    HTTPHandler->>HTTPHandler: Extract custom fields via get_custom_fields_from_header(&req)
    HTTPHandler->>EventProcessor: extract_from_inline_log(json, &mut p_custom_fields, ...)
    EventProcessor->>SchemaDefinition: check_or_extract(...)
    alt Pattern matches
        SchemaDefinition->>EventProcessor: Return extracted fields
        EventProcessor->>p_custom_fields: Insert FORMAT_VERIFY_KEY = "true"
    else Pattern does not match
        SchemaDefinition->>EventProcessor: Extraction fails
        EventProcessor->>p_custom_fields: Insert FORMAT_VERIFY_KEY = "false"
    end
    HTTPHandler->>HTTPHandler: flatten_and_push_logs(..., p_custom_fields)

Possibly related PRs

fix: add custom fields for pmeta stream #1280: Both PRs modify src/handlers/http/ingest.rs to enhance use of a mutable p_custom_fields map for passing custom fields during event ingestion.
feat: add custom fields to events #1228: Related through enhancements in handling custom fields from HTTP headers and schema verification keys in event processing.
feat: detect and extract json from log-lines #1248: The current PR extends the extract_from_inline_log method and schema extraction logic originally introduced in this PR.

Suggested labels

for next release

Suggested reviewers

de-sh

Poem

A bunny hopped through fields of code,
With custom keys in every load.
No more errors, just a sign,
"Verified!" or "Not this time."
Fields are checked and marked with care,
Now every log is well aware! 🐇
"""

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f217b3d and 226bb5f.

📒 Files selected for processing (1)

resources/formats.json (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

resources/formats.json

⏰ Context from checks skipped due to timeout of 90000ms (9)

GitHub Check: Build Default x86_64-pc-windows-msvc
GitHub Check: Build Default aarch64-apple-darwin
GitHub Check: Build Kafka aarch64-apple-darwin
GitHub Check: Build Kafka x86_64-unknown-linux-gnu
GitHub Check: Quest Smoke and Load Tests for Distributed deployments
GitHub Check: Build Default aarch64-unknown-linux-gnu
GitHub Check: Build Default x86_64-unknown-linux-gnu
GitHub Check: Quest Smoke and Load Tests for Standalone deployments
GitHub Check: coverage

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

src/event/format/known_schema.rs (1)

125-224: Suggest updating unit tests for verification status

The test cases should be updated to verify that the verification status is correctly added to both the JSON objects and the custom fields map.

You could add assertions to check for the presence and value of the p_format_verified field in your existing test cases, particularly in tests like test_apache_log_extraction, test_no_match, and test_extract_from_inline_log_object.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49ea3bd and 0a3ae7a.

📒 Files selected for processing (4)

src/event/format/known_schema.rs (4 hunks)
src/event/mod.rs (1 hunks)
src/handlers/http/ingest.rs (7 hunks)
src/handlers/http/modal/utils/ingest_utils.rs (5 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

src/handlers/http/ingest.rs (1)

src/handlers/http/modal/utils/ingest_utils.rs (1)

get_custom_fields_from_header (148-206)

⏰ Context from checks skipped due to timeout of 90000ms (10)

GitHub Check: Build Default x86_64-pc-windows-msvc
GitHub Check: Build Kafka aarch64-apple-darwin
GitHub Check: Build Default aarch64-apple-darwin
GitHub Check: Build Default x86_64-apple-darwin
GitHub Check: Build Kafka x86_64-unknown-linux-gnu
GitHub Check: Build Default aarch64-unknown-linux-gnu
GitHub Check: Quest Smoke and Load Tests for Standalone deployments
GitHub Check: Build Default x86_64-unknown-linux-gnu
GitHub Check: Quest Smoke and Load Tests for Distributed deployments
GitHub Check: coverage

🔇 Additional comments (16)

src/event/mod.rs (1)

41-41: Clean implementation of the verification key constant

The new constant FORMAT_VERIFY_KEY with value "p_format_verified" is well-named and properly aligned with other event-related key constants in this file. This is a good pattern for centralizing the constant definition to avoid string duplication.

src/handlers/http/modal/utils/ingest_utils.rs (2)

148-148: Good API improvement: Using reference instead of ownership

Modifying the function to accept a reference to HttpRequest rather than taking ownership is a better practice since the function doesn't need to consume the request.

220-220: Consistently updated test cases

All test calls have been properly updated to pass references instead of moving ownership of the request, maintaining consistency with the function signature change.

Also applies to: 233-233, 246-246, 258-258

src/handlers/http/ingest.rs (7)

84-85: Good approach to initialize custom fields early

Initializing the mutable p_custom_fields at this point allows the verification status to be tracked throughout the ingestion process. The mutability is necessary as the format verification will update this map.

87-92: Consistent parameter passing for schema verification

The updated parameter passing to extract_from_inline_log correctly passes a mutable reference to p_custom_fields, enabling the verification status to be updated during schema extraction.

206-206: Unified parameter passing in OTEL logs handler

The get_custom_fields_from_header call now consistently passes a reference to the request, matching the updated function signature.

264-264: Unified parameter passing in OTEL metrics handler

Same improvement as in the logs handler, maintaining consistency across all ingestion handlers.

323-323: Unified parameter passing in OTEL traces handler

Same improvement as in the other handlers, completing the consistent pattern across all ingestion routes.

371-371: Mutable custom fields in post_event function

The custom fields map is properly initialized as mutable to support the format verification updates.

377-382: Improved error handling approach

Instead of returning errors when schema extraction fails, the code now marks the verification status in the custom fields map. This allows the ingestion pipeline to continue processing events even when format extraction fails.

src/event/format/known_schema.rs (6)

27-27: Properly imported verification key constant

Using the constant from the event module maintains consistency and centralizes the definition.

125-130: Schema verification success tracking

When pattern matching succeeds, the code correctly adds the verification key with value "true" to mark successful format verification.

135-139: Schema verification failure tracking

When pattern matching fails, the code properly adds the verification key with value "false" to indicate format verification failure.

195-195: Updated function signature to support verification

The extract_from_inline_log method now accepts a mutable reference to the custom fields map, allowing it to update the map with verification status.

213-215: Format verification for array events

For events in an array where schema extraction fails, the verification status is appropriately recorded in the custom fields map.

222-224: Format verification for object events

Similar to array events, object events that fail schema extraction are also properly marked with the verification status.

if server detects known format, add `p_format_verified=true` to the event if schema detection fails, add `p_format_verified=false` to the event

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

resources/formats.json (1)

113-149: Second alb_log alternative pattern is valid; consider simplifying alternations
This new, more permissive pattern broadens support for missing or variable-length fields and constrains track_id to the TID_[a-f0-9]+ format. The fields array matches the named groups exactly. One optional refactor: the alternation [^ ]+|[^ ]* can be simplified to [^ ]* without changing semantics, reducing regex verbosity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1675057 and f217b3d.

📒 Files selected for processing (1)

resources/formats.json (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (10)

GitHub Check: Build Default x86_64-pc-windows-msvc
GitHub Check: Build Kafka aarch64-apple-darwin
GitHub Check: Build Default aarch64-apple-darwin
GitHub Check: Build Kafka x86_64-unknown-linux-gnu
GitHub Check: Build Default x86_64-apple-darwin
GitHub Check: Build Default aarch64-unknown-linux-gnu
GitHub Check: Quest Smoke and Load Tests for Standalone deployments
GitHub Check: coverage
GitHub Check: Build Default x86_64-unknown-linux-gnu
GitHub Check: Quest Smoke and Load Tests for Distributed deployments

🔇 Additional comments (1)

resources/formats.json (1)

75-112: First alb_log regex refinement looks correct
The updated pattern correctly captures client and target IP/port pairs, introduces non-greedy user_agent matching, and aligns perfectly with the fields list. The named groups and order are consistent, and the anchoring remains proper.

coderabbitai bot reviewed Apr 25, 2025

View reviewed changes

coderabbitai bot previously approved these changes Apr 25, 2025

View reviewed changes

nikhilsinhaparseable added 2 commits April 26, 2025 02:34

update: add p_format_verified for schema detection

7f65217

if server detects known format, add `p_format_verified=true` to the event if schema detection fails, add `p_format_verified=false` to the event

update regex for alb-log

1675057

nikhilsinhaparseable dismissed coderabbitai[bot]’s stale review via 1675057 April 26, 2025 06:35

nikhilsinhaparseable force-pushed the schema-unverified branch from 9a83fd0 to 1675057 Compare April 26, 2025 06:36

coderabbitai bot previously approved these changes Apr 26, 2025

View reviewed changes

add regex for alb-log

f217b3d

nikhilsinhaparseable dismissed coderabbitai[bot]’s stale review via f217b3d April 26, 2025 08:53

coderabbitai bot reviewed Apr 26, 2025

View reviewed changes

coderabbitai bot previously approved these changes Apr 26, 2025

View reviewed changes

add regex for alb-log

226bb5f

nikhilsinhaparseable dismissed coderabbitai[bot]’s stale review via 226bb5f April 26, 2025 13:13

coderabbitai bot approved these changes Apr 26, 2025

View reviewed changes

nitisht merged commit a90ed13 into parseablehq:main Apr 26, 2025
14 checks passed

nikhilsinhaparseable deleted the schema-unverified branch April 27, 2025 15:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

update: add custom field p_format_verified#1303

update: add custom field p_format_verified#1303
nitisht merged 4 commits intoparseablehq:mainfrom
nikhilsinhaparseable:schema-unverified

nikhilsinhaparseable commented Apr 25, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 25, 2025 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

nikhilsinhaparseable commented Apr 25, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

nikhilsinhaparseable commented Apr 25, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 25, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)