Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PerfCi Grafana Dashboard: add grafonnet support #641

Merged
merged 1 commit into from
Aug 28, 2023

Conversation

ebattat
Copy link
Collaborator

@ebattat ebattat commented Aug 7, 2023

Type of change

Note: Fill x in []

  • bug
  • enhancement
  • documentation
  • dependencies

Description

Adding grafonnet support:

  1. Dockerfile: using terraform for generating dashboard.json from grafonnet (main.libsonnet)
  2. Update Grafana dashboard using generated dashboard.json
  3. Updating automatically new product versions from ElasticSearch and update main.libsonnet
  4. Update Build and Nightly workflows with update_grafana_dashboard job that update Grafana dashboard
  5. Add test for fetching all dashboards from Grafana for testing connectivity.
  6. jsonnet/jsonnetfile.lock.json: This file configure specific grafonnet version and not with latest, that will reduce grafonnet generator failures.

[ Important: Currently the implementation is against testing dashboard, once all working update PerfCi dashboard ]

For security reasons, all pull requests need to be approved first before running any automated CI

@ebattat ebattat added the enhancement New feature or request label Aug 7, 2023
@ebattat ebattat self-assigned this Aug 7, 2023
@ebattat ebattat changed the title add grafonnet support PerfCi Grafana Dashboard: add grafonnet support Aug 7, 2023
@ebattat ebattat added the ok-to-test PR ok to test label Aug 7, 2023
Copy link
Member

@RobertKrawitz RobertKrawitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want Thiru and/or Kelly to review this too.

.github/workflows/Nightly_Perf_Env_CI.yml Outdated Show resolved Hide resolved
.github/workflows/Nightly_Perf_Env_CI.yml Outdated Show resolved Hide resolved
.github/workflows/Nightly_Perf_Env_CI.yml Outdated Show resolved Hide resolved
.github/workflows/Nightly_Perf_Env_CI.yml Outdated Show resolved Hide resolved
.github/workflows/Nightly_Perf_Env_CI.yml Outdated Show resolved Hide resolved
benchmark_runner/grafana/perf/jsonnet/jsonnetfile.json Outdated Show resolved Hide resolved
benchmark_runner/grafana/update_grafana_dashboard.py Outdated Show resolved Hide resolved
Copy link
Member

@RobertKrawitz RobertKrawitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a number of changes as specified.

@ebattat ebattat force-pushed the add_grafonnet_support branch 2 times, most recently from 90079f5 to 4be1b8d Compare August 7, 2023 18:03
@ebattat ebattat force-pushed the add_grafonnet_support branch 3 times, most recently from 6e0b783 to 8adf676 Compare August 7, 2023 19:06
Copy link
Member

@RobertKrawitz RobertKrawitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still have some comments to address.

.github/workflows/Nightly_Perf_Env_CI.yml Outdated Show resolved Hide resolved
# Pull the latest changes from the remote main branch
git pull origin main
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment still applies.

.github/workflows/Perf_Env_Build_Test_CI.yml Outdated Show resolved Hide resolved

class GrafanaOperations:
"""
This class responsible for Grafana operations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment still applies.

:return:
"""
dashboard_list = []
headers = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment still applies.

benchmark_runner/grafana/perf/jsonnet/g.libsonnet Outdated Show resolved Hide resolved
Copy link
Member

@RobertKrawitz RobertKrawitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more quoting changes and one Content-Type:

.github/workflows/Nightly_Perf_Env_CI.yml Outdated Show resolved Hide resolved
:return:
"""
dashboard_list = []
headers = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the http request have a Content-Type: header since you expect JSON?

import re
import argparse

parser = argparse.ArgumentParser(description="Extract text between <<EOT and EOT in input file.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if EOT is somehow present inside a line as opposed to being part of a here document.

@ebattat ebattat force-pushed the add_grafonnet_support branch 2 times, most recently from 7d69b94 to 20ac6a6 Compare August 8, 2023 14:16
@ebattat
Copy link
Collaborator Author

ebattat commented Aug 8, 2023

@RobertKrawitz, The parser works fine and parse it without EOT

@RobertKrawitz
Copy link
Member

@RobertKrawitz, The parser works fine and parse it without EOT

Just because it works in a test doesn't mean that it's robust. You need to handle edge cases and unexpected input too.

@ebattat
Copy link
Collaborator Author

ebattat commented Aug 8, 2023

@RobertKrawitz here is an example for the code we extract:

Before extracting:

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

dashboard = <<EOT
{
   "description": "Dashboard for Faro",
   "graphTooltip": 1,
   "panels": [
      {
         "datasource": {
            "type": "datasource",
            "uid": "-- Mixed --"
         },
         "fieldConfig": {
            "defaults": {
               "unit": "reqps"
            }
         },
         "gridPos": {
            "h": 8,
            "w": 24
         },
         "id": 1,
         "targets": [
            {
               "datasource": {
                  "type": "prometheus",
                  "uid": "ops-cortex"
               },
               "expr": "sum by (status_code) (rate(request_duration_seconds_count{job=~\".*/faro-api\"}[$__rate_interval]))"
            }
         ],
         "title": "Requests / sec",
         "type": "timeseries"
      }
   ],
   "schemaVersion": 36,
   "time": {
      "from": "now-6h",
      "to": "now"
   },
   "timezone": "utc",
   "title": "Faro dashboard"
}

EOT

After extracting:

{
   "description": "Dashboard for Faro",
   "graphTooltip": 1,
   "panels": [
      {
         "datasource": {
            "type": "datasource",
            "uid": "-- Mixed --"
         },
         "fieldConfig": {
            "defaults": {
               "unit": "reqps"
            }
         },
         "gridPos": {
            "h": 8,
            "w": 24
         },
         "id": 1,
         "targets": [
            {
               "datasource": {
                  "type": "prometheus",
                  "uid": "ops-cortex"
               },
               "expr": "sum by (status_code) (rate(request_duration_seconds_count{job=~\".*/faro-api\"}[$__rate_interval]))"
            }
         ],
         "title": "Requests / sec",
         "type": "timeseries"
      }
   ],
   "schemaVersion": 36,
   "time": {
      "from": "now-6h",
      "to": "now"
   },
   "timezone": "utc",
   "title": "Faro dashboard"
}

@RobertKrawitz
Copy link
Member

What happens if the original is:

{
   "description": "Dashboard for Faro",
   "graphTooltip": 1,
   "panels": [
      {
         "datasource": {
            "type": "datasource",
            "uid": "-- Mixed --"
         },
         "fieldConfig": {
            "defaults": {
               "unit": "reqps"
            }
         },
         "gridPos": {
            "h": 8,
            "w": 24
         },
         "id": 1,
         "targets": [
            {
               "datasource": {
                  "type": "prometheus",
                  "uid": "EOT-cortex"
               },
               "expr": "sum by (status_code) (rate(request_duration_seconds_count{job=~\".*/faro-api\"}[$__rate_interval]))"
            }
         ],
         "title": "Requests / sec",
         "type": "timeseries"
      }
   ],
   "schemaVersion": 36,
   "time": {
      "from": "now-6h",
      "to": "now"
   },
   "timezone": "utc",
   "title": "Faro dashboard"
}

You'll then get

{
   "description": "Dashboard for Faro",
   "graphTooltip": 1,
   "panels": [
      {
         "datasource": {
            "type": "datasource",
            "uid": "-- Mixed --"
         },
         "fieldConfig": {
            "defaults": {
               "unit": "reqps"
            }
         },
         "gridPos": {
            "h": 8,
            "w": 24
         },
         "id": 1,
         "targets": [
            {
               "datasource": {
                  "type": "prometheus",
                  "uid": "

as your output.

I know this is unlikely, but the code needs to be written defensively.

@ebattat
Copy link
Collaborator Author

ebattat commented Aug 8, 2023

@RobertKrawitz. so what do you recommend ?

Copy link
Member

@RobertKrawitz RobertKrawitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more quotes, and still need to fix the EOT regex.

ELASTICSEARCH_USER: ${{ secrets.PERF_ELASTICSEARCH_USER }}
ELASTICSEARCH_PASSWORD: ${{ secrets.PERF_ELASTICSEARCH_PASSWORD }}
run: |
python $GITHUB_WORKSPACE/grafonnet_generator/grafana/update_versions_main_libsonnet.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be quoted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

.github/workflows/Perf_Env_Build_Test_CI.yml Outdated Show resolved Hide resolved
@ebattat ebattat force-pushed the add_grafonnet_support branch 2 times, most recently from c22c934 to 6350059 Compare August 9, 2023 08:39
@ebattat
Copy link
Collaborator Author

ebattat commented Aug 9, 2023

@RobertKrawitz

  1. Http content type:
    The Content-Type header is typically used when sending data in a request (e.g., in a POST or PUT request) to specify the format of the data being sent. Since I am only fetching data using a GET request and not sending any data, I don't need to include the Content-Type header in this case.
    You can see that I use content type json={"dashboard": self.dashboard_data} in override_dashboard() method below

  2. EOT
    I have added dictionary validation by load it to json, meaning if we have the following example, we will get the following error: Extracted content is not valid JSON.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

dashboard = <<EOT
{
   "description": "Dashboard for Faro",
   "graphTooltip": "EOT",
   "title": "Faro dashboard"
}

EOT
"""

@RobertKrawitz
Copy link
Member

@RobertKrawitz.

The only way to handle it is to inject a special string instead of EOT, e.g. @@EOT@@. We are using existing terraform jsonnet parser code that generate auto the following output syntax:

dashboard = <<EOT
{
   "description": "Dashboard for Faro",
   "graphTooltip": "aaa",
   "title": "Faro dashboard"
}

EOT

So the current solution is the best we can do, if there will be any parsing issue, exception will be raised.

No, the regular expression can be more intelligent, to only recognize the second EOT on a line by itself. Rather than

r'dashboard = <<EOT(.*?)EOT'

it can be something like

r`dashboard = <<EOT(.*\n)EOT($|\n)'

to pick up the final EOT on a line by itself, either at the very end or between two newlines.

@RobertKrawitz
Copy link
Member

The other way to do it is to pass it to a shell to parse, but then you have to trust that the code isn't going to do something it shouldn't. Parsing the here doc is reasonable enough, just do it correctly.


ENV PATH="/root/go/bin:${PATH}"

CMD ["/bin/sh", "-c", "terraform init; make > dashboard.txt; python3 extract_dashboard.py dashboard.txt dashboard.json; terraform destroy -auto-approve > /dev/null 2>&1"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a much easier way you can do this:

CMD ["/bin/sh", "-c", "terraform init; make |/bin/sh > dashboard.json; terraform destroy -auto-approve > /dev/null 2>&1"]

and then you won't need extract_dashboard.py at all.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried, but its not working:

[root@6e8e49594c69 app]# make |/bin/sh > dashboard.json
/bin/sh: line 1: $'\E[0m\E[1mdata.jsonnet_file.dashboard:': command not found

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the extract code for suffix: \nEOT\n
r'dashboard = <<EOT(.*?)\nEOT\n'
That should solve the EOT issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to run terraform init --no-color.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://developer.hashicorp.com/terraform/cli/commands/init

-no-color Disable color codes in the command output.

You're much better off not trying to fight what terraform init is trying to do here.

Copy link
Collaborator Author

@ebattat ebattat Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

terraform apply -auto-approve -json => generates corrupted json output
Generates a different json format

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/root/grafonnet/ => You should put your local path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local path to what?

Can you include the generated JSON?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dashboard.json
I have pushed the changes, think its good enough for now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix it up so that extract_dashboard.py doesn't need an input and an output file, and you don't have to create a temporary file in the makefile. Yes, this will work, but it's one more place that something might go wrong in the future.

@ebattat ebattat force-pushed the add_grafonnet_support branch 3 times, most recently from 11fb3a9 to 3ec88b4 Compare August 21, 2023 14:36
import re, sys, json

if len(sys.argv) != 3:
raise ValueError('Usage: python extract_dictionary.py input_file output_file')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's called extract_dashboard. But you really should have it read from stdin and write to stdout, without any temporary files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a few minutes to write this correctly to not need input and output files. It's very little work and it's one less piece of technical debt that builds up.

Copy link
Collaborator Author

@ebattat ebattat Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RobertKrawitz You right, remove temporary file, read directly from stdin

terraform apply -auto-approve | python3 extract_dashboard.py dashboard.json

@ebattat ebattat force-pushed the add_grafonnet_support branch 2 times, most recently from 277e1e2 to 413fbf2 Compare August 22, 2023 10:25
import re, sys, json

if len(sys.argv) != 3:
raise ValueError('Usage: python extract_dictionary.py input_file output_file')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a few minutes to write this correctly to not need input and output files. It's very little work and it's one less piece of technical debt that builds up.


ENV PATH="/root/go/bin:${PATH}"

CMD ["/bin/sh", "-c", "terraform init; make > dashboard.txt; python3 extract_dashboard.py dashboard.txt dashboard.json; terraform destroy -auto-approve > /dev/null 2>&1"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix it up so that extract_dashboard.py doesn't need an input and an output file, and you don't have to create a temporary file in the makefile. Yes, this will work, but it's one more place that something might go wrong in the future.

@cd jsonnet/ && \
jb install && \
cd ../ && \
terraform apply -auto-approve | python3 extract_dashboard.py dashboard.json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be best for extract_dashboard.py to simply send its output to stdout (so you could easily e. g. pipe it through jq if desired), but I'll accept this, as it avoids the temporary file.

Copy link
Collaborator Author

@ebattat ebattat Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RobertKrawitz, no temporary file, pls see latest fix.
terraform apply -auto-approve | python3 extract_dashboard.py dashboard.json

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I approved it as such. I still think that extract_dashboard should write to stdout rather than to a separate file; that would allow you to e. g. process it further through jq, but I've accepted it as it avoids the temporary file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to dashboard.json file?
We must have this file because we use it to update Grafana dashboard in case of changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, but you could write it as

terraform apply -auto-approve | python3 extract_dashboard.py > dashboard.json

Then if you want at some point to process the JSON with jq downstream of extract_dashboard.py you can simply add it to the pipeline.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You right, updated it !!!

"""
This class updates grafana dashboard with last value mappings from ElasticSearch
"""
MAX_VERSION_LEN = 20
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this for (please document it).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RobertKrawitz, add document.
Prevent error messages from being displayed instead of versions in case of a connection issue

@ebattat ebattat force-pushed the add_grafonnet_support branch 2 times, most recently from 25e6c44 to 2bdfe38 Compare August 24, 2023 08:55
@ebattat ebattat force-pushed the add_grafonnet_support branch 3 times, most recently from bea25ee to 216b7bf Compare August 27, 2023 09:36
@ebattat
Copy link
Collaborator Author

ebattat commented Aug 27, 2023

@RobertKrawitz,

I have added jsonnetfile.lock.json file to point on specific grafonnet version instead of latest.
That should reduce failure while using grafonnet generator terraform in case of grafonnet code changes.
I have added it due to recent grafonnet code fix that cause a failure when generating dashboard.json.
BTW, I have fixed it in latest main.libsonnet [+ stateTimeline.withPluginVersion() - by removing '8.4.0-pre' parameter]
WDYT ?

Copy link
Member

@RobertKrawitz RobertKrawitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless this is required by jsonnet, please change the version file name from jsonnetfile.lock.json to jsonnetfile.version.json or similar. lock implies concurrency control of some kind, which isn't what this is about.

@ebattat ebattat merged commit b3cf076 into redhat-performance:main Aug 28, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request ok-to-test PR ok to test
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants