@tomekwilk tomekwilk commented Jan 2, 2025

This PR is based on PR #3668 but targets Azure Blob Storage. The azure_blob plugin was modified to accept a 'log_key' option. By default, the entire log record is sent to storage. When 'log_key' is specified in the output plugin configuration, only the value of that key is sent to the storage blob.

Addresses #9721

Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

Documentation

  • Documentation required for this feature

Doc PR fluent/fluent-bit-docs#1540


Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

By default, the entire record is sent to Azure Blob Storage. Here is a sample configuration and the default output:

Configuration

[SERVICE]
    flush     1
    log_level info

[INPUT]
    name      dummy
    dummy     {"name": "Fluent Bit", "year": 2020}
    samples   1
    tag       var.log.containers.app-default-96cbdef2340.log

[OUTPUT]
    name                  azure_blob
    match                 *
    account_name          twilk123
    shared_key            <snip>
    path                  kubernetes
    container_name        test-container
    auto_create_container on
    tls                   on

Record without log_key
{"@timestamp":"2025-01-02T16:56:02.906357Z","name":"Fluent Bit","year":2020}

If 'log_key' is specified, only the value of that key is sent to Azure Blob Storage.

Sample configuration with log_key

[SERVICE]
    flush     1
    log_level info

[INPUT]
    name      dummy
    dummy     {"name": "Fluent Bit", "year": 2020}
    samples   1
    tag       var.log.containers.app-default-96cbdef2340.log

[OUTPUT]
    name                  azure_blob
    match                 *
    account_name          twilk123
    shared_key            <snip>
    path                  kubernetes
    container_name        test-container
    auto_create_container on
    tls                   on
    log_key               name

Record with log_key set to name
Fluent Bit
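
For readers skimming the thread, here is a minimal, self-contained sketch of the two behaviors shown above. It is not the plugin's code: `struct field` and `render_record` are hypothetical names, the record is modeled as flat string pairs (so `year` becomes a string and `@timestamp` is omitted), no JSON escaping is done, and the trailing newline per record is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flat record: string keys mapped to string values. */
struct field {
    const char *key;
    const char *value;
};

/*
 * Write the payload for one record into out.
 *   log_key == NULL: the whole record as one JSON line (default behavior).
 *   log_key != NULL: only the value stored under that key.
 * Returns 0 on success, -1 if the key is missing or out is too small.
 * Simplification: values are not JSON-escaped.
 */
static int render_record(const struct field *rec, size_t n,
                         const char *log_key, char *out, size_t out_size)
{
    size_t used;
    size_t i;
    int w;

    if (log_key != NULL) {
        for (i = 0; i < n; i++) {
            if (strcmp(rec[i].key, log_key) == 0) {
                w = snprintf(out, out_size, "%s\n", rec[i].value);
                return (w < 0 || (size_t) w >= out_size) ? -1 : 0;
            }
        }
        return -1; /* key not present in this record */
    }

    w = snprintf(out, out_size, "{");
    if (w < 0 || (size_t) w >= out_size) {
        return -1;
    }
    used = (size_t) w;

    for (i = 0; i < n; i++) {
        w = snprintf(out + used, out_size - used, "%s\"%s\":\"%s\"",
                     i ? "," : "", rec[i].key, rec[i].value);
        if (w < 0 || (size_t) w >= out_size - used) {
            return -1;
        }
        used += (size_t) w;
    }

    w = snprintf(out + used, out_size - used, "}\n");
    return (w < 0 || (size_t) w >= out_size - used) ? -1 : 0;
}
```

Calling `render_record` with `log_key` set to `"name"` on the dummy record writes only `Fluent Bit`, while passing `NULL` writes the whole record as a JSON line.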

Example Valgrind output

root@fluent-bit:/tmp# valgrind ./fluent-bit -c azure.conf
==3022== Memcheck, a memory error detector
==3022== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==3022== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==3022== Command: ./fluent-bit -c azure.conf
==3022==
Fluent Bit v3.2.3
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  _____
|  ___| |                | |   | ___ (_) |         |____ |/ __  \
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`' / /'
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \  / /
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /./ /___
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)_____/


[2025/01/02 19:56:50] [ info] [fluent bit] version=3.2.3, commit=addf261e8c, pid=3022
[2025/01/02 19:56:50] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/01/02 19:56:50] [ info] [simd    ] disabled
[2025/01/02 19:56:50] [ info] [cmetrics] version=0.9.9
[2025/01/02 19:56:50] [ info] [ctraces ] version=0.5.7
[2025/01/02 19:56:51] [ info] [output:azure_blob:azure_blob.0] initializing worker
[2025/01/02 19:56:50] [ info] [input:dummy:dummy.0] initializing
[2025/01/02 19:56:51] [ info] [output:azure_blob:azure_blob.0] worker #0 started
[2025/01/02 19:56:50] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2025/01/02 19:56:51] [ info] [output:azure_blob:azure_blob.0] account_name=twilk123, container_name=test-container, blob_type=appendblob, emulator_mode=no, endpoint=twilk123.blob.core.windows.net, auth_type=key
[2025/01/02 19:56:51] [ info] [sp] stream processor started
[2025/01/02 19:56:54] [ info] [output:azure_blob:azure_blob.0] container 'test-container' already exists
[2025/01/02 19:56:54] [ info] [output:azure_blob:azure_blob.0] content uploaded successfully:
[2025/01/02 19:56:54] [ info] [output:azure_blob:azure_blob.0] blob id (null) committed successfully
^C[2025/01/02 19:57:03] [engine] caught signal (SIGINT)
[2025/01/02 19:57:03] [ warn] [engine] service will shutdown in max 5 seconds
[2025/01/02 19:57:03] [ info] [input] pausing dummy.0
[2025/01/02 19:57:03] [ info] [engine] service has stopped (0 pending tasks)
[2025/01/02 19:57:03] [ info] [input] pausing dummy.0
[2025/01/02 19:57:03] [ info] [output:azure_blob:azure_blob.0] thread worker #0 stopping...
[2025/01/02 19:57:03] [ info] [output:azure_blob:azure_blob.0] initializing worker
[2025/01/02 19:57:03] [ info] [output:azure_blob:azure_blob.0] thread worker #0 stopped
==3022==
==3022== HEAP SUMMARY:
==3022==     in use at exit: 0 bytes in 0 blocks
==3022==   total heap usage: 17,894 allocs, 17,894 frees, 2,471,158 bytes allocated
==3022==
==3022== All heap blocks were freed -- no leaks are possible
==3022==
==3022== For lists of detected and suppressed errors, rerun with: -s
==3022== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


Summary by CodeRabbit

  • New Features
    • Added a new "log_key" configuration to upload only a specific field from each log record.
    • Supports string, integer, and float values; missing or unsupported keys cause the record to be skipped with safe error handling.
    • If "log_key" is not set, behavior is unchanged — logs continue to be sent as JSON lines.
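
The type handling summarized above can be sketched roughly as follows. This is an illustration under assumptions, not the code in the PR: `struct value`, `enum value_type`, and `value_to_line` are hypothetical, and the `%f` float format and trailing newline are choices made for the sketch only.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical tagged value standing in for a decoded record field. */
enum value_type { VAL_STRING, VAL_INT, VAL_FLOAT, VAL_MAP };

struct value {
    enum value_type type;
    union {
        const char *s;
        int64_t i;
        double f;
    } as;
};

/*
 * Render a supported value (string, integer, float) as one text line.
 * Returns the number of characters written, or -1 when the type is
 * unsupported or the buffer is too small, so the caller can skip the record.
 */
static int value_to_line(const struct value *v, char *out, size_t out_size)
{
    int w;

    switch (v->type) {
    case VAL_STRING:
        w = snprintf(out, out_size, "%s\n", v->as.s);
        break;
    case VAL_INT:
        w = snprintf(out, out_size, "%lld\n", (long long) v->as.i);
        break;
    case VAL_FLOAT:
        w = snprintf(out, out_size, "%f\n", v->as.f);
        break;
    default:
        return -1; /* maps, arrays, etc. are not handled in this sketch */
    }

    if (w < 0 || (size_t) w >= out_size) {
        return -1;
    }
    return w;
}
```

Returning -1 instead of writing a partial line keeps the "skip the record" decision in the caller.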

@adrinaula

@edsiper Can you please give us an update?

@tomekwilk
Author

Memory leak test after the rewrite:

$ valgrind build/bin/fluent-bit -c fluentbit.cfg
==225827== Memcheck, a memory error detector
==225827== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==225827== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==225827== Command: build/bin/fluent-bit -c fluentbit.cfg
==225827==
Fluent Bit v4.0.3
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _             ___  _____
|  ___| |                | |   | ___ (_) |           /   ||  _  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __/ /| || |/' |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| ||  /| |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /\___  |\ |_/ /
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/     |_(_)___/


[2025/06/11 14:22:02] [ info] [fluent bit] version=4.0.3, commit=97285bdd2a, pid=225827
[2025/06/11 14:22:03] [ info] [storage] ver=1.5.3, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/06/11 14:22:03] [ info] [simd    ] disabled
[2025/06/11 14:22:03] [ info] [cmetrics] version=1.0.2
[2025/06/11 14:22:03] [ info] [ctraces ] version=0.6.6
[2025/06/11 14:22:03] [ info] [input:dummy:dummy.0] initializing
[2025/06/11 14:22:03] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2025/06/11 14:22:03] [ info] [output:azure_blob:azure_blob.0] account_name=devstoreaccount1, container_name=logs, blob_type=appendblob, emulator_mode=yes, endpoint=http://127.0.0.1:10000, auth_type=key
[2025/06/11 14:22:03] [ info] [sp] stream processor started
[2025/06/11 14:22:03] [ info] [output:azure_blob:azure_blob.0] initializing worker
[2025/06/11 14:22:03] [ info] [output:azure_blob:azure_blob.0] worker #0 started
[2025/06/11 14:22:05] [ info] [output:azure_blob:azure_blob.0] container 'logs' already exists
[2025/06/11 14:22:05] [ info] [output:azure_blob:azure_blob.0] content uploaded successfully:
[2025/06/11 14:22:05] [ info] [output:azure_blob:azure_blob.0] blob id (null) committed successfully
^C[2025/06/11 14:22:18] [engine] caught signal (SIGINT)
[2025/06/11 14:22:18] [ warn] [engine] service will shutdown in max 5 seconds
[2025/06/11 14:22:18] [ info] [input] pausing dummy.0
[2025/06/11 14:22:18] [ info] [engine] service has stopped (0 pending tasks)
[2025/06/11 14:22:18] [ info] [input] pausing dummy.0
[2025/06/11 14:22:18] [ info] [output:azure_blob:azure_blob.0] thread worker #0 stopping...
[2025/06/11 14:22:18] [ info] [output:azure_blob:azure_blob.0] initializing worker
==225827==
==225827== HEAP SUMMARY:
==225827==     in use at exit: 0 bytes in 0 blocks
==225827==   total heap usage: 7,292 allocs, 7,292 frees, 1,413,601 bytes allocated
==225827==
==225827== All heap blocks were freed -- no leaks are possible
==225827==
==225827== For lists of detected and suppressed errors, rerun with: -s
==225827== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@khalillilahk

Hi @tomekwilk @lockewritesdocs,
Just checking in, are there any blockers preventing this PR from being merged?
Let me know if there's anything I can do to help move it forward.

@coderabbitai

coderabbitai bot commented Sep 30, 2025

Walkthrough

Adds optional msgpack log_key extraction to send only a single field value, updates formatter/control-flow and function signature to support this, exposes a log_key config option and struct field, includes record accessor headers, and cleans up log_key on context destroy.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Formatter, extraction and API: plugins/out_azure_blob/azure_blob.c | Added cb_azb_msgpack_extract_log_key(...) to extract a single field via record accessor; updated azure_blob_format(...) signature and call path to accept flush/context/event metadata and return out_data/out_size; conditional path: use log_key extraction when set, otherwise JSON lines; added includes flb_record_accessor.h, flb_ra_key.h; added log_key config_map entries; minor whitespace adjustments. |
| Public struct change: plugins/out_azure_blob/azure_blob.h | Added flb_sds_t log_key to struct flb_azure_blob. |
| Config cleanup: plugins/out_azure_blob/azure_blob_conf.c | Free ctx->log_key in flb_azure_blob_conf_destroy (calls flb_sds_destroy and NULLs the pointer). |

Sequence Diagram(s)

sequenceDiagram
    participant In as Input
    participant AZB as AzureBlob Plugin
    participant Fmt as Formatter
    participant AZ as Azure Blob Service

    In->>AZB: Flush event (msgpack, tag, bytes)
    AZB->>Fmt: azure_blob_format(config, ins, ctx, flush_ctx, event_type, tag, data, bytes)
    alt log_key configured
        Fmt->>Fmt: cb_azb_msgpack_extract_log_key -> locate field via record accessor
        Fmt->>Fmt: Convert value to string/number, produce out_data/out_size
        Note right of Fmt: Errors if missing/unsupported types
    else
        Fmt->>Fmt: Format record(s) as JSON lines -> out_data/out_size
    end
    Fmt-->>AZB: out_data, out_size
    AZB->>AZ: Upload formatted payload
    AZ-->>AZB: Response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • leonardo-albertovich
  • koleini
  • fujimotos

Poem

I hop through bytes with whiskers keen,
A single key now trims the scene;
If log_key calls, I fetch that prize—
One tidy line beneath the skies.
Otherwise I hum JSON tunes, and send to Azure by the moon. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title "out_azure_blob: add log_key option" directly and accurately describes the primary objective of the changeset, which is to introduce a new configuration option called log_key to the azure_blob output plugin. The title is concise, specific, and uses a clear conventional format (plugin_name: description). It is entirely related to the main changes, which include adding the log_key field to the struct, implementing extraction functionality, updating configuration maps, and adding necessary cleanup. A developer reviewing the commit history would immediately understand that this change adds a new feature to the azure_blob plugin. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e3339c5 and 1116ebc.

📒 Files selected for processing (3)
  • plugins/out_azure_blob/azure_blob.c (6 hunks)
  • plugins/out_azure_blob/azure_blob.h (1 hunks)
  • plugins/out_azure_blob/azure_blob_conf.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/out_azure_blob/azure_blob.h
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/out_azure_blob/azure_blob.c (4)
src/flb_record_accessor.c (3)
  • flb_ra_create (271-358)
  • flb_ra_get_value_object (803-814)
  • flb_ra_destroy (232-248)
src/flb_sds.c (4)
  • flb_sds_create_size (92-95)
  • flb_sds_copy (260-281)
  • flb_sds_cat (120-141)
  • flb_sds_printf (336-387)
src/flb_ra_key.c (1)
  • flb_ra_key_value_destroy (842-851)
src/flb_pack.c (1)
  • flb_pack_msgpack_to_json_format (1169-1450)
plugins/out_azure_blob/azure_blob_conf.c (1)
src/flb_sds.c (1)
  • flb_sds_destroy (389-399)
🔇 Additional comments (4)
plugins/out_azure_blob/azure_blob_conf.c (1)

804-807: LGTM! Cleanup follows established patterns.

The log_key field cleanup is correctly implemented and consistent with other cleanup blocks in this function.

plugins/out_azure_blob/azure_blob.c (3)

34-35: LGTM! Required headers for record accessor functionality.

The includes are necessary to support the log_key extraction feature.


177-186: LGTM! Conditional logic correctly dispatches based on log_key configuration.

The formatting path properly selects the log_key extractor when configured, otherwise falls back to full JSON output. Note that the extractor itself has a critical data loss bug (flagged separately).


1897-1903: LGTM! Configuration entry is well-documented.

The log_key option is clearly described and properly mapped to the context structure.



@tomekwilk
Author

I rebased the PR to resolve the merge conflicts after recent master changes. This PR is waiting to be re-reviewed and merged. Not sure if there is anything else for me to do.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
plugins/out_azure_blob/azure_blob.c (2)

34-35: Good: using Record Accessor APIs.

Including flb_record_accessor.h and flb_ra_key.h aligns with prior guidance to avoid manual map walking.


70-75: Call flb_errno() before flb_plg_error() on RA creation failure.

Swap the calls so errno is captured before logging.

Apply:

-    if (!ra) {
-        flb_plg_error(ctx->ins, "invalid record accessor pattern '%s'", ctx->log_key);
-        flb_errno();
-        return NULL;
-    }
+    if (!ra) {
+        flb_errno();
+        flb_plg_error(ctx->ins, "invalid record accessor pattern '%s'", ctx->log_key);
+        return NULL;
+    }
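
The reasoning behind the swap is that any intervening call may overwrite errno, so it should be read before anything else runs. A standalone sketch of that ordering follows. It uses only the C standard library: `log_error` and `report_failure` are hypothetical stand-ins, not Fluent Bit functions, and the explicit `errno = 0` simulates a logger clobbering errno.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical logger standing in for a plugin log call. Like any function
 * that performs I/O, it may change errno as a side effect; the assignment
 * below simulates that.
 */
static void log_error(const char *msg)
{
    fprintf(stderr, "[error] %s\n", msg);
    errno = 0; /* simulate a library call clobbering errno */
}

/*
 * Capture errno first, report it, then emit the contextual message.
 * Returns the captured errno value so the caller can still act on it.
 */
static int report_failure(const char *pattern)
{
    int saved = errno; /* read before any other call can overwrite it */
    char msg[128];

    fprintf(stderr, "[errno] %s\n", strerror(saved));
    snprintf(msg, sizeof(msg),
             "invalid record accessor pattern '%s'", pattern);
    log_error(msg);

    return saved;
}
```

If the two steps were reversed, the errno report would describe whatever the logger last did rather than the original failure.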
🧹 Nitpick comments (2)
plugins/out_azure_blob/azure_blob.c (2)

177-186: Safer behavior: fallback to JSON when extraction yields no output.

Avoid dropping data if log_key is missing/unsupported; gracefully fall back.

Apply:

-    if (ctx->log_key) {
-        out_buf = cb_azb_msgpack_extract_log_key(ctx, data, bytes);
-    }
-    else {
+    if (ctx->log_key) {
+        out_buf = cb_azb_msgpack_extract_log_key(ctx, data, bytes);
+        if (!out_buf) {
+            flb_plg_warn(ctx->ins, "log_key='%s' yielded no data; falling back to JSON lines", ctx->log_key);
+        }
+    }
+    if (!out_buf) {
         out_buf = flb_pack_msgpack_to_json_format(data, bytes,
                                                   FLB_PACK_JSON_FORMAT_LINES,
                                                   FLB_PACK_JSON_DATE_ISO8601,
                                                   ctx->date_key,
                                                   config->json_escape_unicode);
     }
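
As a reading aid for the suggestion above, the control flow can be modeled in isolation. The sketch below is hypothetical (`choose_payload` is not part of the plugin) and takes the extraction result as an argument instead of computing it. Note that the suggested diff only works if out_buf starts out as NULL when log_key is unset; whether the declaration initializes it is not visible in this thread, so treat that as an assumption to verify.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Pick the payload to upload.
 *   log_key   - configured key, or NULL when the option is unset
 *   extracted - result of the key extraction, or NULL if it yielded nothing
 *   json_line - the full record formatted as a JSON line
 * Sets *fell_back to 1 only when a configured log_key produced no data.
 */
static const char *choose_payload(const char *log_key, const char *extracted,
                                  const char *json_line, int *fell_back)
{
    *fell_back = 0;

    if (log_key != NULL) {
        if (extracted != NULL) {
            return extracted;
        }
        fprintf(stderr,
                "[warn] log_key='%s' yielded no data; "
                "falling back to JSON lines\n", log_key);
        *fell_back = 1;
    }

    return json_line;
}
```

The trade-off is that a blob may then mix bare values and JSON lines, which consumers of the output would need to tolerate.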

1897-1904: Clarify that log_key uses Record Accessor syntax.

Config text says “key name,” but code uses record accessor. Recommend noting RA path examples (e.g., log, kubernetes['labels']['app']) to set user expectations. Also document newline-delimited output when multiple records are present.

I can update the docs snippet accordingly if desired.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d985e8e and e3339c5.

📒 Files selected for processing (2)
  • plugins/out_azure_blob/azure_blob.c (6 hunks)
  • plugins/out_azure_blob/azure_blob.h (1 hunks)

@SamerJ

SamerJ commented Oct 15, 2025

Hello @edsiper , @adrinaula ,

This PR tackles an issue that we've also recently faced.
Any idea if there is anything preventing/blocking the merge?

Would be interested to contribute if need be :) .

Thanks in Advance,

@eschabell

@tomekwilk Eduardo requested a change. Can you take a look at fixing it?
