Query is only working with "Service" command in Alert page #195

Open
shahriarbro opened this issue Aug 17, 2023 · 4 comments
Labels
datasource/X-Ray type/bug Something isn't working

Comments

@shahriarbro

It seems the only command that works in the Query box of an alert definition is the service command.

What happened:
I'm trying to create an alert which fires every time the "http status 403 error rate" increases.

I have a service called grafana-poc-test-service that sends its traces to AWS X-Ray. I can confirm that the query service("grafana-poc-test-service") returns data and can be used in alerts, but service seems to be the only keyword accepted in the query box. Any other query, simple or complex, raises a "Failed to evaluate the query" error. I tested the queries below, and all of them raise the same error:

  • service("grafana-poc-test-service") AND (http.status = 403)
  • http.status = 403
  • fault

This query will work though:

service("grafana-poc-test-service")

How to reproduce it (as minimally and precisely as possible):

  1. Go to "Alert Rules"
  2. Click on 'Create Alert Rules'
  3. Add a query with AWS X-Ray datasource
  4. Choose "Trace statistics" as "Query Type"
  5. Add one of the above queries
  6. Push the "Preview" button
  7. See the error

Screenshots
[three screenshots]

Working scenario
[one screenshot]

Anything else we need to know?:

Environment:

  • Grafana version: 9.4
  • Plugin version: 2.6.1
  • OS Grafana is installed on: Amazon Managed Grafana
@iwysiu
Contributor

iwysiu commented Aug 17, 2023

I was able to reproduce this, and from my initial investigation it seems the AWS API we're using only supports the ID, service, and edge functions. I'll take another look tomorrow and see if there's something else we can use.
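If that limitation holds, unsupported expressions could be rejected with a clearer message before they ever reach the API. A minimal Python sketch, assuming (from the comment above, not from AWS documentation) that only id(), service(), and edge() are accepted; the function name is hypothetical, and the regex heuristic is not a full X-Ray filter-expression parser:

```python
import re

# Assumption: the only filter functions the statistics API accepts.
SUPPORTED_FUNCTIONS = ("id", "service", "edge")

# Quoted strings such as "grafana-poc-test-service" (escaped quotes allowed).
_STRING = re.compile(r'"(?:[^"\\]|\\.)*"')
# A supported call whose arguments contain no nested parentheses.
_CALL = re.compile(r"\b(?:%s)\s*\([^()]*\)" % "|".join(SUPPORTED_FUNCTIONS))
# Boolean connectives, negation, grouping parentheses, and whitespace.
_GLUE = re.compile(r"\b(?:AND|OR|NOT)\b|[()!\s]", re.IGNORECASE)


def only_supported_functions(expression: str) -> bool:
    """Heuristic: True if the expression uses nothing but supported function calls."""
    if not expression.strip():
        return False
    # Blank out string contents so quoted text cannot look like a call.
    text = _STRING.sub('""', expression)
    # Strip innermost calls until nothing changes, so service(id(...)) reduces away.
    previous = None
    while previous != text:
        previous = text
        text = _CALL.sub("", text)
    # Anything left after removing connectives (e.g. http.status = 403, fault)
    # is a term the statistics API would reject under the assumption above.
    return _GLUE.sub("", text) == ""
```

Applied to the queries from the report, only service("grafana-poc-test-service") passes; the other three fail the check, matching the observed behavior.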

@kevinwcyu kevinwcyu moved this from Incoming to Waiting in AWS Datasources Aug 22, 2023
@sarahzinger sarahzinger moved this from Waiting to Incoming in AWS Datasources Oct 31, 2023
@sarahzinger
Member

I took a look at this one and am also hitting some interesting API limitations.

I can use Trace List queries to build a list of traces filtered by status and service, like so:
[Screenshot, 2023-11-03]

But I'm unable to create an alert on them because we don't have a Time field.

I noticed that Trace List uses GetTraceSummaries, which as far as I can tell should return a MatchedEventTime field. I'd expect we could surface that through Grafana and use it as a Time field, but when I add that code, the field is always nil, at least with my test data. I'm not sure whether that's a setting or something I'm misunderstanding.
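Surfacing MatchedEventTime as a Time field might look like the sketch below. It assumes trace summaries are dicts shaped like the TraceSummaries entries of boto3's get_trace_summaries response (keys Id, Http.HttpStatus, MatchedEventTime); the helper names are hypothetical, and this is Python rather than the plugin's own code.

```python
from datetime import datetime
from typing import Any


def to_rows(summaries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten trace summaries into rows, taking Time from MatchedEventTime."""
    rows = []
    for summary in summaries:
        http = summary.get("Http") or {}
        rows.append({
            # May be absent or None, which is what the comment above observed.
            "Time": summary.get("MatchedEventTime"),
            "Id": summary.get("Id"),
            "HttpStatus": http.get("HttpStatus"),
        })
    return rows


def has_usable_time(rows: list[dict[str, Any]]) -> bool:
    """An alert query needs a real timestamp on every row."""
    return bool(rows) and all(isinstance(row["Time"], datetime) for row in rows)
```

When MatchedEventTime comes back nil, has_usable_time returns False, which is the situation that blocks alerting on Trace List results.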

I'll follow up with AWS to see if they have other recommendations for querying this kind of data. And if you find other ways of using their API to get traces by service and status with some kind of time field, let us know!

My only other thought, as an odd workaround, would be to use transformations to create some kind of Time column for this data. Since we generally only get back 6 hours' worth of traces, the value of that time wouldn't be particularly meaningful, but it might still be enough to get alerting working. That seems a bit of a hack, though.
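The workaround amounts to pinning every row to one timestamp and reducing the rows to a count. A minimal Python sketch of that idea, assuming rows are dicts with optional Time and HttpStatus keys and that the end of the query range is an acceptable timestamp; both function names are hypothetical, and in Grafana this would be done with transformations rather than code.

```python
from datetime import datetime
from typing import Any


def stamp_with_query_end(rows: list[dict[str, Any]], query_end: datetime) -> list[dict[str, Any]]:
    """Fill in a missing Time on each row with the end of the query range."""
    return [{**row, "Time": row.get("Time") or query_end} for row in rows]


def to_alert_point(rows: list[dict[str, Any]], status: int, query_end: datetime) -> dict[str, Any]:
    """Collapse the rows into one (Time, Value) point an alert rule can evaluate."""
    matches = sum(1 for row in rows if row.get("HttpStatus") == status)
    return {"Time": query_end, "Value": matches}
```

Because every row shares the same synthetic timestamp, each alert evaluation only sees "how many matching traces were in the queried window", which is why the result is coarse.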

@sarahzinger sarahzinger moved this from Incoming to Waiting in AWS Datasources Nov 3, 2023
@sarahzinger
Member

I spoke briefly with AWS, and their suggestion was to use GetTraceSummaries rather than GetTimeSeriesServiceStatistics to support querying by service and status code. That's interesting, because I thought we already supported querying with both (I at least see references to both calls in our codebase), but I haven't looked into it more deeply yet. It seems worth investigating.
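One way to turn GetTraceSummaries into something alertable without a per-trace timestamp is to query it once per time window and use the number of matching traces as that window's value. A minimal Python sketch of that idea, not the plugin's implementation: fetch_page is a hypothetical callable standing in for the API call, and the request keys follow boto3's get_trace_summaries parameters (StartTime, EndTime, FilterExpression, NextToken).

```python
from datetime import datetime, timedelta
from typing import Any, Callable, Iterator


def windows(start: datetime, end: datetime, step: timedelta) -> Iterator[tuple[datetime, datetime]]:
    """Split [start, end) into consecutive windows of at most `step`."""
    t = start
    while t < end:
        yield t, min(t + step, end)
        t += step


def count_series(
    fetch_page: Callable[..., dict[str, Any]],
    start: datetime,
    end: datetime,
    step: timedelta,
    filter_expression: str,
) -> list[tuple[datetime, int]]:
    """Return (window start, matching trace count) pairs for a time series."""
    series = []
    for w_start, w_end in windows(start, end, step):
        kwargs = {"StartTime": w_start, "EndTime": w_end, "FilterExpression": filter_expression}
        total = 0
        while True:  # follow NextToken until the window is exhausted
            page = fetch_page(**kwargs)
            total += len(page.get("TraceSummaries", []))
            token = page.get("NextToken")
            if not token:
                break
            kwargs["NextToken"] = token
        series.append((w_start, total))
    return series
```

With boto3, fetch_page could be boto3.client("xray").get_trace_summaries and the expression something like service("grafana-poc-test-service") AND http.status = 403. Issuing one call per window multiplies API requests, and how traces near a window boundary are assigned depends on the time range type the API applies, which this sketch leaves at its default.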

I'll backlog this issue for now, since we don't have a specific timeline for fixing it yet. Whenever we plan work for next quarter, we'll have a record of this along with our findings to guide improvements.

@sarahzinger sarahzinger moved this from Waiting to Backlog in AWS Datasources Dec 5, 2023
@davidjungermann

I'm also running into this issue, do you have any timeline on when we can expect this to be looked at? 😄

Projects
Status: Backlog
Development

No branches or pull requests

4 participants