Bug Report
Describe the bug
After upgrading fluent-bit from 4.2.2 to 5.0.7 (via the fluent/fluent-bit Helm chart, chart version 0.57.7), some fluent-bit containers restart after a few hours with [engine] caught signal (SIGSEGV).
This keeps happening as long as the tail input is in the configuration; when the tail input is removed, the containers stop crashing.
Error from container log:
[2026/06/15 08:58:01] [engine] caught signal (SIGSEGV)
#0 0x56035aa43e83 in flb_lib_worker() at src/flb_lib.c:909
#1 0x7f4adcebab7a in ???() at ???:0
#2 0x7f4adcf387f7 in ???() at ???:0
#3 0xffffffffffffffff in ???() at ???:0
This is happening across 3 Kubernetes clusters.
There are 3 inputs configured - tail, http and kubernetes_events.
http is used for receiving MinIO AIStor audit logs.
I tried keeping only 1 input on each cluster to see whether any of them causes the error, and found that the crashes no longer occur when the tail input is removed. If I keep it in the configuration, the first crash appears within about 3 hours.
Usually only 1 container crashes at a time and the crashes repeat, but it is not always the same container, and the crashes do not happen at regular intervals.
The log messages preceding the SIGSEGV in the crashed fluent-bit containers also differ between containers.
For example, this is the state of the fluent-bit pods in cluster A after 1 day of running version 5.0.7:
NAME                       READY   STATUS    RESTARTS       AGE
logging-fluent-bit-4dj4c   1/1     Running   0              25h
logging-fluent-bit-6v26q   1/1     Running   0              25h
logging-fluent-bit-764x4   1/1     Running   0              25h
logging-fluent-bit-c47j4   1/1     Running   2 (19h ago)    25h
logging-fluent-bit-c646t   1/1     Running   0              25h
logging-fluent-bit-jq549   1/1     Running   1 (150m ago)   25h
logging-fluent-bit-zmk76   1/1     Running   5 (152m ago)   25h
...
...
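To see at a glance which pods have restarted, pod listings like the one above can be summarized with a small script. This is only a minimal sketch and not part of the deployment: `restart_counts` and `restarted_pods` are hypothetical helper names, and the code assumes the default `kubectl get pods` column order (NAME, READY, STATUS, RESTARTS, AGE) with a single-word STATUS.

```python
def restart_counts(kubectl_output: str) -> dict[str, int]:
    """Map pod name -> restart count from `kubectl get pods` text output.

    Assumes the default columns: NAME READY STATUS RESTARTS AGE.
    """
    counts: dict[str, int] = {}
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip the header row, blank lines, and elision markers such as "...".
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, restarts = fields[0], fields[3]
        # RESTARTS looks like "0" or "2 (19h ago)"; the count is the first token.
        if restarts.isdigit():
            counts[name] = int(restarts)
    return counts


def restarted_pods(kubectl_output: str) -> list[tuple[str, int]]:
    """Return (pod, restarts) pairs for pods with at least one restart,
    ordered from most to fewest restarts."""
    pairs = [(n, c) for n, c in restart_counts(kubectl_output).items() if c > 0]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

Running `restarted_pods` on the listing above would report only the three pods with a non-zero RESTARTS value, which makes it easier to compare snapshots taken at different times.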
I downgraded version by version, keeping the tail input in the configuration, to find out which version introduces the crash:
5.0.7 - containers crash within a few hours, crashes keep repeating
5.0.6 - containers crash within a few hours, crashes keep repeating
5.0.5 - NO crash, runs stable for over 24 hours
Expected behavior
I expect fluent-bit containers not to crash after updating to the newest version. With the same configuration, fluent-bit ran for months on the older version without crashes.
Your Environment
- Version used: 5.0.7 (crash), 5.0.6 (crash), 5.0.5 (no crash), deployed via the fluent/fluent-bit Helm chart
- k8s nodes version: v1.32.9 running with Garden Linux images
- Filters and plugins: kubernetes, lua, modify, grep, nest, record_modifier, parser, multiline, log_to_metrics
Configuration - here is the configuration of the inputs; I will also attach the full non-default values.yaml used with the Helm chart deployments.
config:
  service: |
    [SERVICE]
        Daemon Off
        Flush 1
        Grace 60
        Log_Level info
        Parsers_File /fluent-bit/etc/parsers.conf
        Parsers_File /fluent-bit/etc/conf/custom_parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port 2020
        Health_Check On
        storage.path /var/log/fluent-bit
        storage.sync normal
        storage.checksums off
        storage.max_chunks_up 128
        storage.backlog.mem_limit 10M
        storage.metrics on
        storage.delete_irrecoverable_chunks on
  inputs: |
    [INPUT]
        Name tail
        Tag_Regex (?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$
        Tag k8s_containers.<namespace_name>.<container_name>.<pod_name>.<docker_id>-
        Path /var/log/containers/*.log
        Exclude_Path /var/log/containers/mailhog*
        multiline.parser docker, cri
        DB /var/log/flb_kube.db
        Mem_Buf_Limit 4MB
        Skip_Long_Lines On
        Refresh_Interval 10
        storage.type filesystem
        storage.pause_on_chunks_overlimit On
    [INPUT]
        Name http
        Listen 0.0.0.0
        Port 9880
        Tag cluster-A-minio-audit
    [INPUT]
        Name kubernetes_events
        Tag k8s_events
        Interval_Sec 10
Full values example: fluentbit_values.yaml