caught signal (SIGSEGV) in EKS 1.30 #880

rsantacreu-ust · 2024-12-16T12:37:30Z

Describe the question/issue

We're experiencing several restarts of the aws-for-fluent-bit component on EKS.

Configuration

Name: aws-for-fluent-bit-pro-ms
Namespace: monitoring
Labels: app.kubernetes.io/instance=aws-for-fluent-bit-pro-ms
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=aws-for-fluent-bit
app.kubernetes.io/version=2.32.2.20240516
argocd.argoproj.io/instance=aws-for-fluent-bit-pro-ms
helm.sh/chart=aws-for-fluent-bit-0.1.34
Annotations:

Data

fluent-bit.conf:

[SERVICE]
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_PORT 2020
Health_Check On
HC_Errors_Count 5
HC_Retry_Failure_Count 5
HC_Period 5

Parsers_File /fluent-bit/parsers/parsers.conf

[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
DB /var/log/flb_kube.db
multiline.parser docker, cri
Mem_Buf_Limit 100MB
Skip_Long_Lines On
Refresh_Interval 10
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc.cluster.local:443
Merge_Log On
Merge_Log_Key data
Keep_Log On
K8S-Logging.Parser On
K8S-Logging.Exclude On
Buffer_Size 128k
[OUTPUT]
Name cloudwatch_logs
Match *
region eu-south-2
log_group_name /mylogroupname
log_stream_prefix fluentbit-
auto_create_group true
log_retention_days 7
[OUTPUT]
Name s3
Match *
bucket my-bucket
region eu-south-2
json_date_key date
json_date_format iso8601
total_file_size 100M
upload_chunk_size 6M
upload_timeout 10m
store_dir /tmp/fluent-bit/s3
s3_key_format /prod/pro-ms-001/$TAG/%Y-%m-%d/%H-%M-%S

auto_retry_requests           true
preserve_data_ordering        true
retry_limit                   1
external_id                   095761336548

BinaryData

Events:

Fluent Bit Log Output

[2024/12/16 12:23:55] [engine] caught signal (SIGSEGV)
[2024/12/16 12:23:54] [ info] [output:s3:s3.1] Successfully uploaded part
[2024/12/16 12:23:54] [ info] [output:s3:s3.1] UploadPart http status=200
[2024/12/16 12:23:53] [ info] [output:s3:s3.1] Successfully uploaded object /mylogroup/kube.var.log.containers.
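
The last lines logged before the crash come from the S3 multipart upload, so it may help to know how many parts each file is split into. The following is only a rough sketch in Python, using the total_file_size (100M) and upload_chunk_size (6M) values from the [OUTPUT] s3 section above. The helper names `parse_size` and `parts_per_file` are hypothetical, and the decimal unit multipliers are an assumption; the part count does not depend on them because both values use the same suffix.

```python
import re

# Assumption: suffixes are treated as decimal multiples. The multiplier that
# Fluent Bit actually applies may differ, but the ratio below is unaffected
# when both sizes use the same suffix.
_UNITS = {"K": 10**3, "M": 10**6, "G": 10**9}


def parse_size(text: str) -> int:
    """Convert a size string such as '6M' into a byte count."""
    match = re.fullmatch(r"(\d+)([KMG])?", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit, 1)


def parts_per_file(total_file_size: str, upload_chunk_size: str) -> int:
    """Number of multipart parts needed to upload one full-size file."""
    total = parse_size(total_file_size)
    chunk = parse_size(upload_chunk_size)
    # Integer ceiling division avoids floating-point rounding.
    return -(-total // chunk)


if __name__ == "__main__":
    # Values taken from the s3 output section of the config in this issue.
    print(parts_per_file("100M", "6M"))
```

With the values in this issue, each full-size file would be uploaded in 17 parts, so the segfault could land after any of those UploadPart calls and not only at the end of a file.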

Fluent Bit Version Info

Cluster Details

Is throttling from the destination part of the problem? Please note that occasional transient network connection errors are often caused by exceeding limits. For example, CW API can block/drop Fluent Bit connections when throttling is triggered.

We checked GetLogEvents and PutLogEvents, and we are well below the AWS limits.

EKS
EC2
DaemonSet

Application Details

Using Helm chart aws-for-fluent-bit 0.1.34

https://github.com/aws/eks-charts/tree/master/stable/aws-for-fluent-bit

Steps to reproduce issue

Related Issues
