
Kafka scaler: excludePersistentLag not working for partitions with invalid offset (that is -1) #5906

Closed
rakesh-ism opened this issue Jun 24, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@rakesh-ism

When I set the excludePersistentLag flag to true, it does not exclude persistent lag for partitions with an invalid offset (that is, -1).

Expected Behavior

If the lag for these partitions with an invalid offset is persistent, it should be excluded from the custom metric.

Actual Behavior

In my setup, this caused the number of Kafka consumer pods to scale to the maximum.

Steps to Reproduce the Problem

In the setup, have some partitions that have never been consumed (invalid offset). The old messages are no longer in the Kafka log, as they have already expired.

Use the Kafka scaler to scale the consumer deployment.
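
For reference, a ScaledObject along these lines reproduces the configuration described (a minimal sketch; the deployment name, topic, and broker address are placeholders, and the trigger metadata keys follow the KEDA Kafka scaler documentation):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: consumer-scaledobject        # placeholder name
spec:
  scaleTargetRef:
    name: consumer-deployment        # placeholder deployment
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092 # placeholder broker
        consumerGroup: group1
        topic: topic1
        lagThreshold: "10"
        excludePersistentLag: "true" # flag this issue is about
```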

Logs from KEDA operator

example

KEDA Version

2.14.0

Kubernetes Version

None

Platform

Amazon Web Services

Scaler Details

Kafka

Anything else?

Hi
Refer to #5274

The issue got reproduced in my system again. I did some analysis by adding debug logging in the Kafka scaler and found that we have a topic in the consumer group where the lag is as below. The following is output from kafka-consumer-groups.sh...

GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
group1 topic1 0 80 80 0 rdkafka-xxx-xxx-xxx-xxx-xxx /xx.xx.xx.xx rdkafka
group1 topic1 1 - 60 - rdkafka-xxx-xxx-xxx-xxx-xxy /100.100.17.69 rdkafka
group1 topic1 2 - 96 - rdkafka-xxx-xxx-xxx-xxx-xxz /100.100.14.23 rdkafka

Because of this, the lag is reported as 156 (60 + 96).
In our case, topic1 receives messages only once every 2-3 months or more. So if we create a new consumer (new group), the lag will be '-' for all partitions for some time. Then suddenly one message arrives, the lag becomes 0 on one partition while remaining '-' on the others, and after that the lag for those other partitions starts being counted.
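
The 156 figure above is consistent with treating a partition with no committed offset as fully lagged. A minimal sketch of that arithmetic (an illustration of the observed behavior, not the actual KEDA implementation; `partitionLag` and `totalLag` are hypothetical helpers):

```go
package main

import "fmt"

// partitionLag sketches the behavior observed in the report: when the
// committed offset is invalid (-1, i.e. no commit yet), the entire
// log-end offset is counted as lag.
func partitionLag(committed, logEnd int64) int64 {
	if committed < 0 {
		return logEnd
	}
	return logEnd - committed
}

// totalLag sums partition lags for a consumer group; each entry is
// {committed offset, log-end offset}.
func totalLag(offsets [][2]int64) int64 {
	var total int64
	for _, o := range offsets {
		total += partitionLag(o[0], o[1])
	}
	return total
}

func main() {
	// Offsets from the kafka-consumer-groups.sh output above:
	// partition 0: committed 80, log-end 80; partitions 1 and 2: no commit.
	offsets := [][2]int64{{80, 80}, {-1, 60}, {-1, 96}}
	fmt.Println(totalLag(offsets)) // 0 + 60 + 96 = 156
}
```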
Can you please check?

@rakesh-ism rakesh-ism added the bug Something isn't working label Jun 24, 2024
@rakesh-ism rakesh-ism changed the title Kafka scaler: excludePersistentLag not working for partitions with invalid offset (that is -1) (Refer to #5274 that was closed due to inactivity) Kafka scaler: excludePersistentLag not working for partitions with invalid offset (that is -1) Jun 25, 2024
@zroubalik
Member

Duplicate of #5274

@zroubalik zroubalik marked this as a duplicate of #5274 Jun 25, 2024

2 participants