influxdb2 does not send notifications #18769
Comments
Same issue. The check works well: I can see the row in the check's history, but I don't see the correlated row in the notification history. |
Same issue.
|
The very same issue here. 😞 # influxd2 version
InfluxDB 2.0.0-beta.13 (git: 86796ddf2d) build_date: 2020-07-08T11:07:50Z I don't see anything in the notification tab (but checks seem to work). It also seems that alerts are being scheduled somehow (at least the last running time seems to get updated regularly) and perhaps even run (?), but nothing really happens. I am trying the HTTP PUSH method, just to clarify it more. And of course, nothing special gets logged... 😢 How can I debug it to learn more, please? 😁 |
Any feedback from influx team? |
@omers No. |
I encountered the same problem during the past week. The notification rules seem to work only for checks created before the rule was created. Newly made checks don't fire notifications. To "fix" it, you have to do a PUT (PATCH doesn't work) request on the rule with the same data, effectively overwriting it with itself, and that seems to refresh the list of checks that the rule is looking for. |
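The re-save workaround described above can be sketched in Python. This is only a sketch under assumptions: it assumes the v2 HTTP API exposes the rule at `/api/v2/notificationRules/{ruleID}` with `Token` authorization, and that the server accepts the body returned by GET unchanged on PUT, which may not hold for every beta build. The function name, the `opener` parameter, and the example values in the usage note are made up for illustration.

```python
import json
import urllib.request


def resave_notification_rule(base_url, token, rule_id, opener=urllib.request.urlopen):
    """GET a notification rule, then PUT the same JSON body back unchanged."""
    # Assumed endpoint path; check it against the API docs for your version.
    url = f"{base_url}/api/v2/notificationRules/{rule_id}"
    headers = {
        "Authorization": f"Token {token}",
        "Content-Type": "application/json",
    }

    # Fetch the rule exactly as the server currently stores it.
    get_req = urllib.request.Request(url, headers=headers, method="GET")
    with opener(get_req) as resp:
        rule = json.loads(resp.read())

    # Overwrite the rule with itself (PUT, since PATCH reportedly does not help).
    put_req = urllib.request.Request(
        url, data=json.dumps(rule).encode("utf-8"), headers=headers, method="PUT"
    )
    with opener(put_req) as resp:
        return resp.status
```

A call might look like `resave_notification_rule("http://127.0.0.1:9999", "<token>", "<rule id>")`, with the token and rule ID taken from your own instance.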
+1 |
I see the same error on my end as I've reported here https://community.influxdata.com/t/notification-when-status-changes-from-ok-crit/13847/9
|
+1. We're running version=2.0.0-beta.15. Statuses are being written by checks and the statuses are changing, but notifications using the 'changes from' type are not firing. Notifications for 'is equal to' work as expected. I tried 're-putting' the rule as per @Pupix's suggestion (via the UI); it makes no difference. It seems this is the exact same problem as #17809, so it seems to me that issue should be reopened... -ivan |
adding to my previous comment, after playing with different intervals for the notification rule and check interval i believe that this is related to: indeed setting the notification rule to ~1m9s, with a check interval of 30s, ensures that there is 'always' (almost) a check that has run in the notification window, and i am receiving state change notifs (via http). this setup seems fragile and i'm not sure we're able to rely on it for production use however... will continue experimenting. |
following on from this, we have now modified our config so that the checks run every 10s, and the notification every 30s, but there are still notifications being missed. executing the following:
results in the 'correct' list of state changes, but not all of these changes result in a notification. is there a way to have more debug information for the notification sending, to try to understand where the ones that are not being sent are failing ? |
@ivanpricewaycom I couldn't find any option to debug alerts and rules. |
I've observed the same issue in my setup with the 'changes from' notification rule. For 'is equal to', notifications are sent to the http endpoint (however, interestingly, I always receive the same event 3 times with the same timestamp). I am still trying to find out whether @ivanpricewaycom's workaround works. Unfortunately, without much success so far :(. I am using the beta.16 version. |
@mhall119 Any ETA on fixing this issue? I have confirmed and the alerts and notifications are working on 2.0.0-beta.9 branch. Is there a way that I can build a docker image for that tag? |
@abhi1693 right now all efforts are on finishing the storage engine change. After that, I think there is a new version of Flux that's ready to be added which might have a fix for this. |
@mhall119 Thanks for replying so soon. Is there a timeline on this? |
The storage engine change is currently in the works; I think that's supposed to land in the next week or so. After that I'm not as sure on the schedule, but you can ask in our Slack in the #influxdb-v2 channel |
I am able to confirm the following in the current beta-16 build:
We will have to investigate what is going on. |
we merged a fix for this recently: #19392 Summary: when a check's data is perfectly aligned on a boundary with the notification rules' schedule, we had a bug that trimmed off both the starting and ending points of the alerting time range. The fix makes sure that one side of the rule is always accepted, thus assuring that no rows are missed. Unfortunately, it requires opening/saving each notification rule in order to regenerate the correct code. We are looking into migration solutions for users with a large number of notification rules. |
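A toy model of the boundary problem summarized above, for readers who want to see why aligned data is the bad case. This is not the Flux code generated for notification rules; the function name, the integer timestamps, and the choice of which side is inclusive are all assumptions made for illustration.

```python
def rows_seen(status_times, rule_every, horizon, include_start, include_stop):
    """Return the status timestamps picked up by the rule runs up to `horizon`."""
    seen = []
    # One rule run per interval; each run looks at the window (start, stop).
    for stop in range(rule_every, horizon + 1, rule_every):
        start = stop - rule_every
        for t in status_times:
            after_start = t >= start if include_start else t > start
            before_stop = t <= stop if include_stop else t < stop
            if after_start and before_stop:
                seen.append(t)
    return seen


# The check writes a status every 10s, perfectly aligned with a 10s rule.
statuses = [0, 10, 20, 30]

# Both ends trimmed: every aligned row falls on a boundary and is dropped.
print(rows_seen(statuses, 10, 30, include_start=False, include_stop=False))  # []

# One side accepted: each row is seen by exactly one run.
print(rows_seen(statuses, 10, 30, include_start=True, include_stop=False))  # [0, 10, 20]
```

In the second case the row at 30 is not lost; it would be picked up by the next run, whose window starts at 30.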
ok great news, as soon as beta.17 (docker image) is released we'll be testing this. |
I was receiving alerts via slack in beta 16
|
@aanthony1243 Any ETA on closing this bug and releasing another beta version? |
This bug has been fixed in the latest beta. I have been able to receive alerts via PagerDuty and Slack. |
@tiny6996 There hasn't been a release since Aug 8 for v2. Can you please confirm which version are you referring to? |
Given that the fix that @aanthony1243 refers to was merged 24/8 it is clearly not included in the beta 16, @tiny6996 must be referring to a different problem.. I'm still waiting for beta 17 to see if the fix addresses the problem we're experiencing. |
yeah sorry i don't have any other suggestions for you @pavleec , what we need here is better debug visibility on the task / notif system as a whole to understand where the blockages are. a sandbox environment on a publicly-available influx instance would be useful also to help share the problem with influx devs. |
After some more debugging the issue seemed to be that we had configured a notification rule that triggered upon:
and
Where filling empty results only works with |
I believe this influxdata/flux#1877 is related to this. |
Hi, this is still a problem. I have a few rules that rely on the status change as reporting when a status equals |
|
Just upgraded from 1.x to |
same behavior with 2.0.6: |
Well this has become a real blocker. I think that version 2.0 is not ready at all for production usage. Anyone can advise? |
I've been trying for the past two hours trying different combinations, and alerts are just broken. I can get equals conditions to fire (e.g state = CRIT), but any "change" condition (e.g ANY to CRIT) just does not want to send a notification. This means that the alerting is basically useless as you need another interim system in-between to filter notifications that have already been sent. |
Totally useless.
Their managed cloud has the same problem and therefore it is useless too.
Rolled back to 1.8
|
hi all, thank you for your comments. we are very aware that our checks and alerting UI needs some improvement, and we are in the process of making those changes. if you haven't already, check out this blog post for a detailed description of what's going on behind the scenes: https://www.influxdata.com/blog/influxdbs-checks-and-notifications-system/ long story short, alerts are just customized tasks behind the scenes, and you can customize them however you like. Today, our UI is limited in what you can build, but that should be changing soon. we have documentation for building custom alerting as well which can also help troubleshoot alerts not firing: https://docs.influxdata.com/influxdb/cloud/monitor-alert/custom-checks/ i understand the frustration with the current setup and we are taking steps to make the process easier. thank you! |
Hi contributors, and everyone else bothered by this problem. I was surprised to find that the problem I've encountered is more than a year old. After digging in Firstly, my conclusion is: when the interval of a check is >= that of a notification rule, its status transition might be discarded. After reading the comment above But when it comes to the same interval (the = in >=), things become tricky. Consider a check and a rule, both with a 5s interval. We will have statuses at 0 / 5 / 10 / 15 ...s. In this case, the rule fires almost simultaneously with the check. If at 10s the rule queries the db before the check's status is written, the system loses the point at 10s. But will it get both 0s and 5s? After looking into It has taken me several hours of investigating weird behaviors of the monitor system to come up with these ideas. The hypothesis held up in a few tests, but it's late in my local time, so I might just have gotten myself into chaos. I'm not familiar with Go; my apologies for wasting your time if I misunderstood the code. Thank you all. <3 |
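The hypothesis above can be illustrated with a toy model. It assumes (this is an assumption, not taken from the InfluxDB source) that a "changes from" rule can only report a transition when both the old and the new status fall inside the window the rule queries. The function name and the data are hypothetical.

```python
def transitions(statuses, start, stop):
    """Level changes visible using only statuses with start <= t < stop."""
    window = [(t, level) for t, level in statuses if start <= t < stop]
    # A change needs two neighbouring statuses inside the same window.
    return [
        (t, old, new)
        for (_, old), (t, new) in zip(window, window[1:])
        if old != new
    ]


# Check interval 5s: ok at 0s, crit from 5s onward.
statuses = [(0, "ok"), (5, "crit"), (10, "crit")]

# Rule window as long as the check interval: only one status is visible,
# so the ok -> crit change cannot be observed.
print(transitions(statuses, 5, 10))  # []

# Rule window covering two check runs: the change is visible.
print(transitions(statuses, 0, 10))  # [(5, 'ok', 'crit')]
```

Under this model, a rule interval that covers at least two check runs is what makes the transition observable, which matches the earlier report that a ~1m9s rule over a 30s check delivered state-change notifications.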
@TechCiel Please take a look at influxdata/flux#3807 where I'm proposing a patch for the issue. If you could test that out on your side, it would be useful feedback. |
any updates on this? |
@lukasvida influxdata/flux#3807 was fixed in v2.0.9 so I believe that should resolve some of the problems observed here |
I have a problem with notifications. Whenever status changes to I'm using two notification rules, one is EDIT: I'm using version |
I'm getting pretty much the same issue with 2.5, I can see states changing, but notifications are rarely sent out. On a test alert that is firing every couple of minutes, the history shows the last notification event over an hour ago. This is a new install, only one event configured. |
This BUG is 2 years old already. |
@omers I don't think there is going to be a fix soon, unfortunately. Seems like this could be a serious design flaw related to locking/blocking which could be causing this problem. That is why they've got offset pretty much everywhere - to avoid locking. This is unreliable at best, and random-level success at worst. We've got 100 Telegraf agents logging data in the same database and so far I can't seem to find a way to make global alerts work. If you specify data from a single host, then it mostly works. But as soon as you try to query larger datasets, it just doesn't work. For example, I've set CPU check with 100VMs logging every 30 seconds, then run a benchmark on one of the VMs. I can clearly see on the check graph that the CPU for the particular VM is at 99-100% CPU, yet there is no critical status on the log (critical is configured to be >95%). In another instance, I get the status to change, but no alert is issued. Or I get the alert to fire, but no actual data is sent to the endpoint. |
This was the cause of my problems. After I changed the status rule to |
how many clients do you have logging data there? |
About 5 at any given time. |
I'm experiencing this issue. Has anyone got it working? If so, please share the fix. See my open issue here: #24319 |
Notifications sometimes work for me, sometimes they don’t, it’s a nightmare ((( |
This will never be fixed at all. Await Version 3 and pay a totally unreal subscription fee and it might work. v2 remains an experiment. |
I set up influxdb2 to send alerts to our slack channel.
I see only alerts that were sent 4 days ago and no alerts received to slack channel
for debugging, I created a flask app to listen as an http endpoint, and created an http notification rule to route all the alerts to the flask app. I see no requests
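For anyone who wants to repeat this debugging step, here is a minimal stand-in for such a listener. It is a sketch using only the Python standard library rather than the Flask app described above, whose code is not shown in this issue; the handler name, the `received` list, and the port choice are made up for illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # every POST seen by the listener, oldest first


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8", errors="replace")
        received.append({"path": self.path, "body": body})
        print(f"POST {self.path}: {body}")
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the console limited to the alert bodies printed above


def start_listener(host="127.0.0.1", port=0):
    """Start the listener in a background thread; port=0 picks a free port."""
    server = HTTPServer((host, port), AlertHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Point an HTTP notification endpoint at `http://127.0.0.1:<port>/` (the port is available as `server.server_address[1]`). If nothing is printed while the check history shows status changes, the request is not leaving InfluxDB at all.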
Steps to reproduce:
Expected behavior:
Alerts will be delivered
Actual behavior:
Describe What actually happened.
Environment info:
Config:
/usr/sbin/influxd run --engine-path /influx/engine --bolt-path /influx/boltdb.db --http-bind-address 127.0.0.1:9999 --log-level info
The last time alerts were sent was 4 days ago: