Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Alarm for notification queue overpassing a given threshold (#4113) #4375

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

Anjali-NEC
Copy link
Contributor

@Anjali-NEC Anjali-NEC commented Jun 14, 2023

Fix issue #4113

src/lib/common/limits.h Outdated Show resolved Hide resolved
CHANGES_NEXT_RELEASE Outdated Show resolved Hide resolved
@fgalan
Copy link
Member

fgalan commented Jul 10, 2023

I have provided some extra comments that I hope may help.

In addition note that for a PR to be ready for merging, all the tests should pass (at the present moment, they don't pass).

@Anjali-NEC
Copy link
Contributor Author

In addition note that for a PR to be ready for merging, all the tests should pass (at the present moment, they don't pass).

@fgalan All test cases are passed except 4113_alarm_for_notification_queue_overpassing_a_given_threshold/alarm_for_notification_queue_overpassing_a_given_threshold.test but this test case is passed on my local environment

CHANGES_NEXT_RELEASE Outdated Show resolved Hide resolved
doc/manuals/admin/logs.md Outdated Show resolved Hide resolved
@fgalan
Copy link
Member

fgalan commented Sep 6, 2023

The massive fails in CI are due to the changes done in docker CI image (see #4417 (comment)). Once PR #4417, this PR should be updated with master and test will be passing again.

@fgalan
Copy link
Member

fgalan commented Sep 6, 2023

The massive fails in CI are due to the changes done in docker CI image (see #4417 (comment)). Once PR #4417, this PR should be updated with master and test will be passing again.

PR #4417 has been merged. @Anjali-NEC please upgrade this PR's branch with master.

Comment on lines +194 to +202
std::string details = ("notification queue reached maximum threshold");

long unsigned int threshold = queueSize(service)*notifAlarmThreshold/100;

if (threshold >= queueSize(service))
{
alarmMgr.notificationQueue(queueName.c_str(), details.c_str());
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code can be simplified this way:

Suggested change
std::string details = ("notification queue reached maximum threshold");
long unsigned int threshold = queueSize(service)*notifAlarmThreshold/100;
if (threshold >= queueSize(service))
{
alarmMgr.notificationQueue(queueName.c_str(), details.c_str());
}
}
long unsigned int threshold = queueSize(service)*notifAlarmThreshold/100;
if (threshold >= queueSize(service))
{
alarmMgr.notificationQueue(queueName.c_str(), "notification queue reached maximum threshold");
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover, the detail message could provide information about the particular threshold for this case. For instance, if we have a queue of size 6 and the threshold is 50%, something like this:

notification queue reached maximum threshold (3)

# VALGRIND_READY - to mark the test ready for valgrindTestSuite.sh

--NAME--
alarm for notification queue overpassing a given threshold
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
alarm for notification queue overpassing a given threshold
alarm for notification queue overpassing a given threshold (relog variant)

Comment on lines +228 to +233
Raising alarm NotificationQueue serv1: notification queue reached maximum threshold
Repeated NotificationQueue serv1: notification queue reached maximum threshold
Raising alarm NotificationQueue serv2: notification queue reached maximum threshold
Repeated NotificationQueue serv2: notification queue reached maximum threshold
Raising alarm NotificationQueue default: notification queue reached maximum threshold
Repeated NotificationQueue default: notification queue reached maximum threshold
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this result doesn't follow expectations.

According to:

# 04. Create/update entity in serv1 5 times (update #3 raises alarm, update #4 and #5 cause repeated log)
# 05. Create/update entity in serv2 3 times (update #2 raises alarm, update #3 cause repeated log)
# 06. Create/update entity in serv3 (default) 7 times (update #4 raises alarm, updates #5, #6 and #7 cause repeated log)

Se should see 2 repeated logs for serv1, 1 repeated log for serv2 (that's is ok in the above output) and 3 repeated logs for default queue. Something like this:

Raising alarm NotificationQueue serv1: notification queue reached maximum threshold
Repeated NotificationQueue serv1: notification queue reached maximum threshold
Repeated NotificationQueue serv1: notification queue reached maximum threshold
Raising alarm NotificationQueue serv2: notification queue reached maximum threshold
Repeated NotificationQueue serv2: notification queue reached maximum threshold
Raising alarm NotificationQueue default: notification queue reached maximum threshold
Repeated NotificationQueue default: notification queue reached maximum threshold
Repeated NotificationQueue default: notification queue reached maximum threshold
Repeated NotificationQueue default: notification queue reached maximum threshold

@fgalan
Copy link
Member

fgalan commented Sep 7, 2023

In addition to the current two .test files (which are nice :), I'd suggest to add one to check the releasing of the new alarm. Something like this:

# 01. Subscribe serv1 to the accumulator endpoint that responses in 10 seconds
# 02. Subscribe serv2 to the accumulator endpoint that responses in 10 seconds
# 03. Subscribe serv3 to the accumulator endpoint that responses in 10 seconds
# 04. Create/update entity in serv1 5 times (update #3 raises alarm)
# 05. Create/update entity in serv2 3 times (update #2 raises alarm)
# 06. Create/update entity in serv3 (default) 7 times (update #4 raises alarm)
# 07. Wait 11 seconds
# 08. Grep log for notificationQueue alarm

The endpoint that responses in 10 seconds is /noresponse (can be seen here https://github.com/telefonicaid/fiware-orion/blob/master/scripts/accumulator-server.py#L233)

As result of step 08, we should see some releasing alarm messages.

@fgalan
Copy link
Member

fgalan commented Dec 19, 2023

After the merging of PR #4332 this PR needs to be upgrades with master branch. Otherwise, functional test will fail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants