-
Notifications
You must be signed in to change notification settings - Fork 1.8k
docker: add arrow/parquet support #11315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdds an Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
dockerfiles/Dockerfile (1)
137-139: DRY concern: Arrow repository setup is duplicated across stages.The Arrow apt repository setup logic (curl the deb, install it, update) is repeated in
builder-base,deb-extractor, anddebugstages (lines 61-67, 146-152, 303-309). Consider extracting this into a shared script or documenting why duplication is necessary (e.g., multi-stage isolation requirements).Also applies to: 144-152
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
dockerfiles/Dockerfiledockerfiles/Dockerfile.windowsdockerfiles/README.md
🧰 Additional context used
🧠 Learnings (10)
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
dockerfiles/Dockerfiledockerfiles/README.md
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
dockerfiles/Dockerfiledockerfiles/README.md
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.
Applied to files:
dockerfiles/Dockerfile.windows
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.
Applied to files:
dockerfiles/README.md
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.
Applied to files:
dockerfiles/README.md
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
dockerfiles/README.md
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
dockerfiles/README.md
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
dockerfiles/README.md
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
dockerfiles/README.md
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
dockerfiles/README.md
🪛 LanguageTool
dockerfiles/README.md
[style] ~117-~117: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...ectures staging images used for testing prior to release - `ghcr.io/fluent/fluent-bit/ma...
(EN_WORDINESS_PREMIUM_PRIOR_TO)
[style] ~156-~156: Consider using a less common alternative to make your writing sound more unique and professional.
Context: ...Arrow with static linking. ## Contact Feel free to join us on our Mailing List or IRC: - ...
(FEEL_FREE_TO_STYLE_ME)
🪛 markdownlint-cli2 (0.18.1)
dockerfiles/README.md
25-25: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
30-30: Bare URL used
(MD034, no-bare-urls)
32-32: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
38-38: Bare URL used
(MD034, no-bare-urls)
40-40: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
58-58: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
59-59: Dollar signs used before commands without showing output
(MD014, commands-show-output)
66-66: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
159-159: Bare URL used
(MD034, no-bare-urls)
160-160: Bare URL used
(MD034, no-bare-urls)
162-162: Bare URL used
(MD034, no-bare-urls)
🔇 Additional comments (8)
dockerfiles/README.md (2)
62-72: Good documentation for the optional Parquet feature.The instructions for enabling Parquet encoder support are clear and include the useful note about image size impact. Consider adding a language specifier to the code block for syntax highlighting consistency:
-``` +```bash $ docker build -t fluent/fluent-bit --target=production \[approve_code_changes, suggest_optional_refactor]
142-154: Windows Parquet documentation is consistent with Linux instructions.The Windows-specific section appropriately mirrors the Linux documentation while noting the vcpkg/static linking approach. Consider adding the
powershelllanguage specifier to the code block:-```powershell +```powershell docker build --no-cache `(Note: The block already has
powershell- this is correct. LGTM.)dockerfiles/Dockerfile.windows (2)
160-168: Conditional Arrow installation logic is well-structured.The approach to conditionally install Arrow/Parquet only when explicitly requested via
EXTRA_CMAKE_FLAGSkeeps the default image lean. The cleanup steps after installation help reduce layer size.
197-210: CMake invocation correctly includes Arrow paths and extra flags.The
%ARROW_PATHS%and%EXTRA_CMAKE_FLAGS%variables are properly interpolated in the cmd shell context. When Parquet is not enabled,ARROW_PATHSwill be empty, which is safe as CMake ignores empty arguments.dockerfiles/Dockerfile (4)
61-68: Conditional Arrow/Parquet installation logic is sound.The approach using
grep -qto check for the Parquet flag is straightforward and effective. The installation fetches the Apache Arrow apt source dynamically based on the distribution, which is a good practice for portability.
172-172: Good addition of libsnappy1v5 for Parquet/Arrow support.Snappy is a compression library used by Parquet. Adding it to the runtime dependencies ensures Parquet files using Snappy compression can be read.
83-84: EXTRA_CMAKE_FLAGS correctly propagated to CMake invocation.The ARG/ENV pattern and usage in the cmake command via
$EXTRA_CMAKE_FLAGS(line 108) follows the established pattern in this Dockerfile. The hadolint ignore for SC2086 at line 91 correctly documents that word splitting is intentional for multiple flags.Also applies to: 108-108
268-271: Debug image correctly includes runtime Arrow libraries when Parquet is enabled.The debug stage appropriately installs runtime libraries (
libarrow2200,libparquet2200) rather than development libraries, which is correct for a debug/runtime image. The conditional logic mirrors the other stages.Also applies to: 303-310
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dockerfiles/Dockerfiledockerfiles/Dockerfile.windows
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.
Applied to files:
dockerfiles/Dockerfiledockerfiles/Dockerfile.windows
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
dockerfiles/Dockerfile
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
dockerfiles/Dockerfile
🔇 Additional comments (8)
dockerfiles/Dockerfile.windows (3)
19-20: LGTM!The
EXTRA_CMAKE_FLAGSbuild argument is correctly declared with an empty default value in the builder-base stage. This allows optional feature flags to be passed during build time.
160-168: LGTM!The conditional Arrow installation logic is well-structured. The
-likeoperator with the wildcard pattern correctly performs case-insensitive matching for the CMake flag, and the cleanup of temporary files helps reduce image size.
183-204: LGTM!The builder stage correctly:
- Re-declares
EXTRA_CMAKE_FLAGS(required for multi-stage builds to inherit the value)- Uses
findstrin cmd shell to detect the Parquet encoder flag- Conditionally sets Arrow/Parquet directory paths for CMake
- The
|| cd .idiom correctly acts as a no-op whenfindstrfails (flag not present)dockerfiles/Dockerfile (5)
24-26: LGTM!The
EXTRA_CMAKE_FLAGSargument and environment variable are correctly declared in the builder-base stage, enabling optional features like Parquet to be toggled at build time.
137-138: LGTM!The deb-extractor stage correctly re-declares
EXTRA_CMAKE_FLAGSand conditionally sets up the Arrow apt source before downloading packages. This ensures the Arrow/Parquet runtime packages are available when the feature is enabled.Also applies to: 144-152
172-172: LGTM!Adding
libsnappy1v5is correct—Snappy compression is commonly used by Parquet for data compression.
194-205: LGTM!The maintainer comment documenting the Arrow SONAME versioning scheme (major100 + minor10) is helpful for future maintenance. The conditional download of Arrow runtime packages and their dependencies is correctly structured.
270-273: LGTM!The
EXTRA_CMAKE_FLAGSis correctly re-declared in the debug stage to enable conditional Arrow/Parquet package installation.
470ad07 to
0536f4e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
dockerfiles/Dockerfile (3)
61-67: The "latest" apt source creates non-deterministic builds.Using
apache-arrow-apt-source-latest-trixie.debmakes builds non-reproducible. Apache Arrow's APT repository is a rolling feed with known metadata and checksum changes. Consider pinning to a specific Arrow version (e.g.,apache-arrow-apt-source-18.0.0-trixie.deb) and explicitly versioning the-devpackages (e.g.,libarrow-dev=18.0.0-1).Note: This concern was previously raised but remains unaddressed in the current implementation.
145-151: Same "latest" apt source issue in deb-extractor stage.This stage also uses
apache-arrow-apt-source-latest-trixie.deb, creating the same non-deterministic build concern. Ensure the Arrow apt source is pinned to a specific version here as well to maintain consistency.
304-318: "Latest" apt source issue persists in debug stage; package consistency is good.The debug stage also uses
apache-arrow-apt-source-latest-trixie.deb(same non-deterministic concern as other stages). However, the maintainer note correctly reminds to keep packages synchronized with deb-extractor, and the package list (lines 310-316) matches the deb-extractor list (lines 197-203), ensuring consistency.
🧹 Nitpick comments (1)
dockerfiles/Dockerfile (1)
171-171: Move libsnappy1v5 to the conditional Parquet block.
libsnappy1v5is a Parquet-specific dependency and is downloaded unconditionally, adding unnecessary bloat when Parquet support is disabled. Move it to the conditional block at lines 195-204 alongside the other Arrow/Parquet packages.🔎 Suggested adjustment
Remove from line 171:
libzstd1 \ liblz4-1 \ - libsnappy1v5 \ libgssapi-krb5-2 \Add to the conditional block:
&& if echo "${EXTRA_CMAKE_FLAGS}" | grep -q "FLB_PARQUET_ENCODER=On"; then \ apt-get download \ libarrow2200 \ libparquet2200 \ + libsnappy1v5 \ libabsl20240722 \
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dockerfiles/Dockerfile
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.
Applied to files:
dockerfiles/Dockerfile
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
dockerfiles/Dockerfile
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
dockerfiles/Dockerfile
🔇 Additional comments (4)
dockerfiles/Dockerfile (4)
24-26: LGTM! Clear documentation of the build argument.The ARG/ENV pattern correctly propagates the build-time flag into the environment for use in conditional logic and child stages.
136-138: LGTM! Consistent propagation across stages.The ARG/ENV pattern is correctly repeated in the deb-extractor stage to enable conditional package selection.
193-204: Well-documented maintainer guidance.The maintainer note clearly explains the Arrow SONAME versioning scheme and when to update, addressing previous review feedback. The dependency list (abseil, protobuf, thrift, xml2) aligns with typical Arrow runtime requirements.
269-271: LGTM! Consistent across all stages.The ARG/ENV pattern is correctly applied to the debug stage for conditional Arrow/Parquet installation.
|
@patrick-stephens for your review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
dockerfiles/Dockerfile (2)
145-151: Arrow apt source version consistency concern.Similar to the builder-base stage (lines 61-67), using
apache-arrow-apt-source-latest-trixie.debhere makes the runtime package selection non-deterministic. Consider using the same pinned Arrow version across both build and runtime stages to ensure ABI compatibility between build-time and runtime Arrow libraries.
171-171: Consider making libsnappy1v5 conditional on Parquet support.The
libsnappy1v5package is added unconditionally but is primarily needed for Parquet/Arrow compression. To minimize dependencies in the base image, consider moving this to the conditional block at lines 195-204 where other Arrow-related packages are downloaded.🔎 Proposed change
liblzma5 \ libyaml-0-2 \ libcap2 \ libldap2 \ - libsnappy1v5 \ # MAINTAINER: Arrow SONAME format: libarrow{major*100+minor*10} (e.g., 22.0.0 -> libarrow2200) # Only update when upgrading Arrow version or if packages become unavailable in apt source. && if echo "${EXTRA_CMAKE_FLAGS}" | grep -q "FLB_PARQUET_ENCODER=On"; then \ apt-get download \ libarrow2200 \ libparquet2200 \ + libsnappy1v5 \ libabsl20240722 \dockerfiles/README.md (2)
62-72: Documentation clearly explains Parquet support for Linux builds.The instructions accurately document the EXTRA_CMAKE_FLAGS usage and provide helpful information about the image size impact. Consider adding a language identifier to the code block for better syntax highlighting.
🔎 Optional: Add language identifier to code block
-``` +```bash $ docker build -t fluent/fluent-bit --target=production \ --build-arg EXTRA_CMAKE_FLAGS="-DFLB_PARQUET_ENCODER=On" \ -f dockerfiles/Dockerfile .
142-153: Windows Parquet documentation is clear and accurate.The PowerShell build command correctly demonstrates EXTRA_CMAKE_FLAGS usage for Windows, and the vcpkg note provides useful implementation context. Consider adding a language identifier to the code block.
🔎 Optional: Add language identifier to code block
-```powershell +```powershell docker build --no-cache `(Note: The code block already has
powershellspecified at line 146, so this is already correct. No change needed.)
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
dockerfiles/Dockerfiledockerfiles/Dockerfile.windowsdockerfiles/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- dockerfiles/Dockerfile.windows
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.
Applied to files:
dockerfiles/README.md
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.
Applied to files:
dockerfiles/README.md
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
dockerfiles/README.mddockerfiles/Dockerfile
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
dockerfiles/README.md
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
dockerfiles/README.md
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
dockerfiles/README.md
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
dockerfiles/README.md
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
dockerfiles/README.md
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
dockerfiles/README.mddockerfiles/Dockerfile
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.
Applied to files:
dockerfiles/Dockerfile
🪛 LanguageTool
dockerfiles/README.md
[style] ~117-~117: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...ectures staging images used for testing prior to release - `ghcr.io/fluent/fluent-bit/ma...
(EN_WORDINESS_PREMIUM_PRIOR_TO)
[style] ~156-~156: Consider using a less common alternative to make your writing sound more unique and professional.
Context: ...Arrow with static linking. ## Contact Feel free to join us on our Mailing List or IRC: - ...
(FEEL_FREE_TO_STYLE_ME)
🪛 markdownlint-cli2 (0.18.1)
dockerfiles/README.md
25-25: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
30-30: Bare URL used
(MD034, no-bare-urls)
32-32: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
38-38: Bare URL used
(MD034, no-bare-urls)
40-40: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
58-58: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
59-59: Dollar signs used before commands without showing output
(MD014, commands-show-output)
66-66: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
159-159: Bare URL used
(MD034, no-bare-urls)
160-160: Bare URL used
(MD034, no-bare-urls)
162-162: Bare URL used
(MD034, no-bare-urls)
🔇 Additional comments (9)
dockerfiles/Dockerfile (7)
24-26: LGTM!The EXTRA_CMAKE_FLAGS build argument and environment variable are correctly declared, enabling optional features like Parquet support to be controlled at build time.
82-83: LGTM!The EXTRA_CMAKE_FLAGS is correctly redeclared in the builder stage, as Docker multi-stage builds require ARG redeclaration in each stage where it's used.
136-138: LGTM!Correctly redeclares EXTRA_CMAKE_FLAGS for use in the deb-extractor stage's conditional Arrow/Parquet package provisioning.
193-204: Good maintainer documentation for Arrow SONAME versioning.The maintainer comment clearly explains the Arrow SONAME naming convention and when updates are needed. This addresses the concern raised in earlier reviews about hardcoded package versions. The dependency list (libabsl20240722, libbz2-1.0, libprotobuf32t64, libthrift-0.19.0t64, libxml2) is comprehensive.
270-272: LGTM!Correctly redeclares EXTRA_CMAKE_FLAGS for the debug stage's conditional Arrow/Parquet installation.
304-318: Package list correctly synchronized with deb-extractor.The Arrow/Parquet package list matches the deb-extractor stage (lines 195-204), including all runtime dependencies. The maintainer comment helps ensure future updates maintain this synchronization. Note that the
apache-arrow-apt-source-latest-trixie.debusage here has the same reproducibility concern mentioned for earlier stages.
61-67: The proposed version pinning approach is not feasible; Apache Arrow does not publish versioned apt-source packages for Debian.The
apache-arrow-apt-source-latest-trixie.debpackage is the only available release from Apache Arrow for this distribution. This package simply configures the Arrow repository; individual packages likelibarrow-devandlibparquet-devare versioned separately within that repository. To achieve reproducible builds, pin specific Arrow versions at the install stage instead (e.g.,apt-get install -y libarrow-dev=14.0.0-1 libparquet-dev=14.0.0-1), or accept that this optional feature receives the latest available packages from the configured repository.Likely an incorrect or invalid review comment.
dockerfiles/README.md (2)
115-119: LGTM!Formatting change to use prefix bullets for consistency with other sections. No functional impact.
159-162: LGTM!Formatting update to contact section. The consistent prefix style improves readability.
Signed-off-by: Arbin <[email protected]>
0536f4e to
adbe478
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
dockerfiles/README.md (2)
66-70: Consider adding language identifier to the code block.The fenced code block at line 66 would benefit from a
bashlanguage identifier for improved syntax highlighting and consistency with Markdown best practices.🔎 Proposed fix
-``` +```bash $ docker build -t fluent/fluent-bit --target=production \ --build-arg EXTRA_CMAKE_FLAGS="-DFLB_PARQUET_ENCODER=On" \ -f dockerfiles/Dockerfile .
146-151: Consider adding language identifier to the code block.The fenced code block at line 146 would benefit from a
powershelllanguage identifier for improved syntax highlighting and consistency.🔎 Proposed fix
-```powershell +```powershell docker build --no-cache ` --build-arg WINDOWS_VERSION=ltsc2019 ` --build-arg EXTRA_CMAKE_FLAGS="-DFLB_PARQUET_ENCODER=On" ` -t fluent/fluent-bit:master-windows -f ./dockerfiles/Dockerfile.windows .Wait, looking at line 146, it already has
powershellidentifier. Let me re-check the annotated code... Line 146 shows:```powershellSo the language identifier IS present. This means static analysis is wrong or I misread. Let me skip this comment.
Actually, wait - let me reread the code more carefully. The code shows:
136~ ```powershell ...So lines 136-140 already have powershell identifier. Then lines 146-151 are new:146~
docker build --no-cache ` --build-arg WINDOWS_VERSION=ltsc2019 ` --build-arg EXTRA_CMAKE_FLAGS="-DFLB_PARQUET_ENCODER=On" ` -t fluent/fluent-bit:master-windows -f ./dockerfiles/Dockerfile.windows .So line 146 DOES have the powershell identifier. No issue here. Skip this comment. [scratchpad_end] --> <!-- [scratchpad_start] [task review_file_1/3 segment_8: lines 159-162] Review contact section formatting. [observations] - Lines 159-162 have tilde markers indicating changes. - These are formatting/whitespace changes to contact links. [analysis] - Cosmetic changes. - No functional impact. - Skip. [actions] Skip comment. [done] README.md review complete. [scratchpad_end] --> </blockquote></details> </blockquote></details> <details> <summary>📜 Review details</summary> **Configuration used**: defaults **Review profile**: CHILL **Plan**: Pro <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 0536f4e3c00fcd8a2c37a69d275bd68c323b2f0b and adbe4783d9a235081448b0612480f22b535e3385. </details> <details> <summary>📒 Files selected for processing (3)</summary> * `dockerfiles/Dockerfile` * `dockerfiles/Dockerfile.windows` * `dockerfiles/README.md` </details> <details> <summary>🧰 Additional context used</summary> <details> <summary>🧠 Learnings (11)</summary> <details> <summary>📓 Common learnings</summary>Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka isFLB_PREFER_SYSTEM_LIB_KAFKA=ON.Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.</details> <details> <summary>📚 Learning: 2025-08-07T10:15:46.187Z</summary>Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.**Applied to files:** - `dockerfiles/Dockerfile` - `dockerfiles/Dockerfile.windows` </details> <details> <summary>📚 Learning: 2025-08-31T12:46:11.940Z</summary>Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka isFLB_PREFER_SYSTEM_LIB_KAFKA=ON.**Applied to files:** - `dockerfiles/Dockerfile` - `dockerfiles/README.md` </details> <details> <summary>📚 Learning: 2025-08-31T12:46:11.940Z</summary>Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.**Applied to files:** - `dockerfiles/Dockerfile` - `dockerfiles/README.md` </details> <details> <summary>📚 Learning: 2025-09-14T09:46:09.531Z</summary>Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.**Applied to files:** - `dockerfiles/README.md` </details> <details> <summary>📚 Learning: 2025-09-04T07:28:37.083Z</summary>Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.**Applied to files:** - `dockerfiles/README.md` </details> <details> <summary>📚 Learning: 2025-08-29T06:24:44.797Z</summary>Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.**Applied to files:** - `dockerfiles/README.md` </details> <details> <summary>📚 Learning: 2025-08-29T06:25:27.250Z</summary>Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like#ifdef FLB_HAVE_ZSTD. Unlike some other optional components such as ARROW/PARQUET (which use#ifdef FLB_HAVE_ARROWguards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.**Applied to files:** - `dockerfiles/README.md` </details> <details> <summary>📚 Learning: 2025-08-29T06:24:55.855Z</summary>Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.**Applied to files:** - `dockerfiles/README.md` </details> <details> <summary>📚 Learning: 2025-09-08T11:21:33.975Z</summary>Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.**Applied to files:** - `dockerfiles/README.md` </details> <details> <summary>📚 Learning: 2025-08-29T06:25:27.250Z</summary>Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like#ifdef FLB_HAVE_ZSTD. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.**Applied to files:** - `dockerfiles/README.md` </details> </details><details> <summary>🪛 LanguageTool</summary> <details> <summary>dockerfiles/README.md</summary> [style] ~117-~117: ‘prior to’ might be wordy. Consider a shorter alternative. Context: ...ectures staging images used for testing prior to release - `ghcr.io/fluent/fluent-bit/ma... (EN_WORDINESS_PREMIUM_PRIOR_TO) --- [style] ~156-~156: Consider using a less common alternative to make your writing sound more unique and professional. Context: ...Arrow with static linking. ## Contact Feel free to join us on our Mailing List or IRC: - ... (FEEL_FREE_TO_STYLE_ME) </details> </details> <details> <summary>🪛 markdownlint-cli2 (0.18.1)</summary> <details> <summary>dockerfiles/README.md</summary> 25-25: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 30-30: Bare URL used (MD034, no-bare-urls) --- 32-32: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 38-38: Bare URL used (MD034, no-bare-urls) --- 40-40: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 58-58: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 59-59: Dollar signs used before commands without showing output (MD014, commands-show-output) --- 66-66: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 159-159: Bare URL used (MD034, no-bare-urls) --- 160-160: Bare URL used (MD034, no-bare-urls) --- 162-162: Bare URL used (MD034, no-bare-urls) </details> </details> </details> <details> <summary>🔇 Additional comments (10)</summary><blockquote> <details> <summary>dockerfiles/README.md (1)</summary><blockquote> `115-119`: **LGTM—improved bullet formatting.** The consistent bullet formatting enhances readability and maintains a cleaner structure for the ghcr.io topology documentation. </blockquote></details> <details> <summary>dockerfiles/Dockerfile.windows (3)</summary><blockquote> `19-21`: **LGTM—unified build argument added correctly.** The `EXTRA_CMAKE_FLAGS` build argument is properly declared with a clear comment and sensible default value, enabling optional feature configuration. --- `160-169`: **LGTM—conditional Arrow/Parquet installation implemented correctly.** The conditional vcpkg installation uses appropriate PowerShell pattern matching and follows best practices with cleanup steps. The `-like` operator is case-insensitive, which correctly handles CMake flag variations. --- `186-204`: **LGTM—CMake configuration correctly derives Arrow/Parquet paths.** The conditional path derivation correctly uses cmd shell environment variables within a single RUN instruction. The `findstr` pattern matching and `|| cd .` fallback ensure the build chain succeeds whether or not Parquet support is enabled. The vcpkg package paths for Arrow and Parquet are correctly specified. </blockquote></details> <details> <summary>dockerfiles/Dockerfile (6)</summary><blockquote> `24-27`: **LGTM—build argument properly declared.** The `EXTRA_CMAKE_FLAGS` argument is correctly declared as both ARG and ENV, with a clear comment explaining its purpose. --- `61-67`: **LGTM—conditional Arrow package installation implemented correctly.** The conditional logic correctly checks for the Parquet encoder flag and installs the necessary development packages. The apt source is added, updated, and cleaned up appropriately. --- `82-83`: **LGTM—build argument correctly re-declared for builder stage.** Multi-stage Dockerfile builds require ARG re-declaration in each stage that needs the value. This is correctly implemented. --- `136-138`: **LGTM—build argument correctly declared in deb-extractor stage.** Consistent with the other stages, the build argument is properly declared for the package extraction stage. --- `192-204`: **LGTM—maintainer notes and conditional package downloads implemented correctly.** The maintainer comment clearly documents the Arrow SONAME versioning scheme, addressing the maintenance concern from previous reviews. The conditional package downloads include all necessary Arrow/Parquet runtime dependencies. --- `304-319`: **LGTM—debug stage Arrow packages correctly synchronized.** The conditional Arrow package installation includes all dependencies from the deb-extractor stage, ensuring consistency between the production and debug images. The maintainer note provides clear guidance for future maintenance. </blockquote></details> </blockquote></details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
Description
Add optional Apache Arrow/Parquet encoder support to Docker images via unified
EXTRA_CMAKE_FLAGSbuild argument.Changes
Usage
Testing
Example configuration: See above usage section
Build test output:
+49MB (172MB vs 123MB)Valgrind: N/A (Docker packaging change only)
Packaging test:
Documentation
Backporting
N/A - New optional feature
Summary by CodeRabbit
New Features
Bug Fixes / Improvements
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.