Add script for checking running Monasca health #118

Open
wants to merge 22 commits into
base: master

matrixik
Member

Checks are still missing for most Monasca services that don't have proper health check endpoints.

Comments welcome.

@matrixik
Member Author

To run:

python3 cmm-check-health.py

@mattibf mattibf left a comment

Minor issue:
In many places the messages read "...did not returned properly". Pls. correct to "...did not return properly".

mattibf commented Jan 23, 2020

For those components that can't be accessed directly:
What about checking the status of the services?
E.g., are they up and healthy, and not restarting all the time?
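
The restart check suggested above could be sketched roughly as follows. This is only an illustration, not the script's actual implementation: it assumes the events are supplied as JSON lines shaped like the output of `docker events --format '{{json .}}'`, and the function names and the threshold of 3 are hypothetical.

```python
import json
from collections import Counter


def count_restarts(event_lines, actions=("die", "restart")):
    """Count die/restart events per container name from JSON event lines."""
    counts = Counter()
    for line in event_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except ValueError:
            # Ignore lines that are not valid JSON instead of crashing.
            continue
        if not isinstance(event, dict) or event.get("Type") != "container":
            continue
        if event.get("Action") not in actions:
            continue
        attributes = event.get("Actor", {}).get("Attributes", {})
        counts[attributes.get("name", "unknown")] += 1
    return counts


def flapping_services(event_lines, threshold=3):
    """Return sorted names of containers with at least `threshold` die/restart events."""
    counts = count_restarts(event_lines)
    return sorted(name for name, seen in counts.items() if seen >= threshold)
```

A caller would feed it the captured event lines for some time window and report every name returned by `flapping_services` as a service that keeps restarting.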

mattibf commented Jan 23, 2020

The script will be used by operators, so a README.md is required.
I suggest moving the script into a sub-directory of tools.

Signed-off-by: Dobroslaw Zybort <[email protected]>
Some more fixes to script output.

Signed-off-by: Dobroslaw Zybort <[email protected]>
Signed-off-by: Dobroslaw Zybort <[email protected]>
Signed-off-by: Dobroslaw Zybort <[email protected]>
Signed-off-by: Dobroslaw Zybort <[email protected]>
Check this number only if the user explicitly requests it; too many false
positives would scare the operator.

Signed-off-by: Dobroslaw Zybort <[email protected]>
mattibf commented Mar 25, 2020

The script currently assumes that docker-compose files are located in a directory two levels above the location of the script. This is certainly a good default, but doesn't fit all situations.
docker-compose files could be located in a different directory.
Can you pls. add a parameter to specify the directory, where docker-compose files are located?
Default would then be as currently implemented.
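
A minimal sketch of such a parameter, assuming `argparse` is used. The short flag `-f` mirrors the invocation that appears later in this thread; the long name `--folder` and the helper names are hypothetical.

```python
import argparse
import os


def default_config_dir(script_path):
    """Return the directory two levels above the directory holding the script."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.normpath(os.path.join(script_dir, os.pardir, os.pardir))


def build_parser(script_path):
    """Build a parser whose directory option defaults to the current behaviour."""
    parser = argparse.ArgumentParser(
        description="Check health of running Monasca services")
    parser.add_argument(
        "-f", "--folder",  # the long option name is an assumption
        default=default_config_dir(script_path),
        help="directory containing the docker-compose files and .env")
    return parser
```

In the script itself, `build_parser(__file__).parse_args()` would then yield the directory in `args.folder`, falling back to the location two levels above the script when the option is omitted.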

mattibf commented Mar 25, 2020

When executing in an environment that is not OK (not yet tested), I got the following error (see below).
Of course, errors can occur in this untested environment. But the script shouldn't "crash".

Error:
Checking 'Grafana'
Traceback (most recent call last):
  File "cmm-check-health.py", line 521, in <module>
    print_info("Grafana", test_grafana)
  File "cmm-check-health.py", line 69, in print_info
    if test_function() is not None:
  File "cmm-check-health.py", line 233, in test_grafana
    jresp = json.loads(resp)
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
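
One way to avoid this kind of crash is to wrap the decoding step, as in the sketch below. The helper name `parse_json_response` is hypothetical, and how the script should report the failure to the operator is left open here.

```python
import json


def parse_json_response(resp):
    """Return the decoded JSON value, or None if resp is not valid JSON."""
    try:
        return json.loads(resp)
    except (TypeError, ValueError) as exc:
        # ValueError covers undecodable text on both Python 2 and Python 3
        # (json.JSONDecodeError is a subclass of ValueError);
        # TypeError covers a missing response such as None.
        print("Response is not valid JSON: {}".format(exc))
        return None
```

A check such as `test_grafana` could call this helper instead of `json.loads(resp)` directly and treat a `None` result as a failed check rather than letting the exception propagate.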

@matrixik
Member Author

The script currently assumes that docker-compose files are located in a directory two levels above the location of the script. This is certainly a good default, but doesn't fit all situations.
docker-compose files could be located in a different directory.
Can you pls. add a parameter to specify the directory, where docker-compose files are located?
Default would then be as currently implemented.

Done

Signed-off-by: Dobroslaw Zybort <[email protected]>
Signed-off-by: Dobroslaw Zybort <[email protected]>
Signed-off-by: Dobroslaw Zybort <[email protected]>
mattibf commented Mar 26, 2020

As discussed:

  • Pls. add information in readme, which services are handled, and which are not handled

  • Pls. check if non-default values, e.g. for port numbers, are considered

mattibf commented Mar 26, 2020

Pls. add a completion message when the script finishes.
Something like:
cmm-health-check successfully checked 10 services without issues @20200326, 12:13:37
cmm-health-check completed check of 10 services, 3 issues found @20200326, 12:13:37
This is just a draft!
I think the message should contain:

  • end timestamp
  • success or not
  • # of services checked
  • # of issues found
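
The draft messages above could be produced by a small helper like this one. It is only a sketch: the function name and parameters are hypothetical, and the wording simply follows the draft.

```python
from datetime import datetime


def completion_message(checked, issues, finished_at=None):
    """Build the final summary line: timestamp, outcome, services checked, issues found."""
    finished_at = finished_at or datetime.now()
    stamp = finished_at.strftime("%Y%m%d, %H:%M:%S")
    if issues == 0:
        return ("cmm-health-check successfully checked {} services "
                "without issues @{}".format(checked, stamp))
    return ("cmm-health-check completed check of {} services, "
            "{} issues found @{}".format(checked, issues, stamp))
```

Passing `finished_at` explicitly keeps the helper easy to test; in the script it would normally be left out so that the current local time is used.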

mattibf commented Mar 26, 2020

Pls. add documentation for "-h" in readme.md

mattibf commented Mar 26, 2020

On a system not properly set up, I called:

  • cmm-check-health.py:
    Status for kafka: "log" not found
    Judgement: That's OK - log pipeline not enabled

  • cmm-check-health.py -m:
    Checking 'Kafka'
    Traceback (most recent call last):
      File "cmm-check-health.py", line 598, in <module>
        print_info("Kafka", test_kafka)
      File "cmm-check-health.py", line 75, in print_info
        if test_function() is not None:
      File "cmm-check-health.py", line 440, in test_kafka
        biggest_lag = sorted(lags, reverse=True)[0]
    IndexError: list index out of range

Two points:

  • The exception should be caught.
  • Why does it behave differently when called with or without option "-m"?
    I would expect the Kafka check to continue even if topic "log" isn't there.
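
A sketch of how the lag check could tolerate an empty list and a missing topic. The function name and the shape of `lags_by_topic` (a dict mapping topic name to a list of lag values) are assumptions made for illustration, not the script's actual data structures.

```python
def check_topic_lags(lags_by_topic, topics):
    """Return (biggest lag or None, topics for which no lag data was found)."""
    missing = [topic for topic in topics if not lags_by_topic.get(topic)]
    lags = [lag for topic in topics for lag in lags_by_topic.get(topic, [])]
    # max() on an empty list would raise ValueError, just as indexing an
    # empty sorted list raises IndexError, so guard the empty case explicitly.
    biggest = max(lags) if lags else None
    return biggest, missing
```

With this shape, a missing "log" topic is reported in the second element while the lag of the remaining topics is still evaluated.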

mattibf commented Mar 26, 2020

When specifying -l (log pipeline), only the following services are checked:

  • Elasticsearch
  • Elasticsearch curator
  • kibana
  • kafka

However, other central services, like MySQL, should be checked as well.
What about monasca-log-api? It doesn't seem to be covered at all.

mattibf commented Mar 26, 2020

Minor error:
When e.g. ".env" doesn't exist in the directory specified, an error msg is displayed:
"File does not exists: ..."
Pls. change to "File does not exist: ..."

mattibf commented Mar 26, 2020

One more minor issue:
called in ST environment:

  • /cmm_2012 contains docker-compose files and .env file
  • script is located in /cmm_2012/tools/check-health and called from there

First call: python cmm-check-health.py -f "../.."
Output:

Running simple tests of running Monasca services
Local time 2020-03-26 15:30:44
UTC time 2020-03-26 14:30:44
docker-compose version 1.15.0, build e12f3b9

Looking for .env and configuration files in: ../..
Checking 'Memcached'
.IOError: [Errno 2] No such file or directory: u'./../../docker-compose-metric.yml'

Command '['docker-compose', '--project-directory', '../..', '--file', '../../docker-compose-metric.yml', '--file', '../../docker-compose-log.yml', 'exec', 'memcached', 'ash', '-c', 'echo stats | nc -w 1 127.0.0.1 11211']' returned non-zero exit status 1

❌ There is problem with Memcached

Same error is reported for all services.

With 2nd call:
python cmm-check-health.py -f "../../../cmm_2012"
everything works as expected.
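
The thread does not establish the cause of this difference. One possible explanation is that the relative `--file` paths are resolved against the relative `--project-directory` a second time. Under that assumption, converting the directory to an absolute path before building the command would sidestep the problem. The helper name below is hypothetical; `--project-directory` and `--file` are the options already visible in the output above.

```python
import os


def compose_command(config_dir, compose_files, *args):
    """Build a docker-compose command line that uses absolute paths only."""
    base = os.path.abspath(config_dir)
    cmd = ["docker-compose", "--project-directory", base]
    for name in compose_files:
        cmd += ["--file", os.path.join(base, name)]
    cmd += list(args)
    return cmd
```

Called with `"../.."` from `/cmm_2012/tools/check-health`, this yields `/cmm_2012` for the project directory and `/cmm_2012/docker-compose-metric.yml` for the first compose file, independent of how the directory was spelled on the command line.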

@matrixik
Member Author

When specifying -l (log pipeline), only the following services are checked:

* Elasticsearch

* Elasticsearch curator

* kibana

* kafka

However, other central services, like MySQL, should be checked as well.
What about monasca-log-api? It doesn't seem to be covered at all.

MySQL is not used by the log pipeline.

Not sure how I omitted monasca-log-api, looking into this.

Signed-off-by: Dobroslaw Zybort <[email protected]>
Signed-off-by: Dobroslaw Zybort <[email protected]>
Signed-off-by: Dobroslaw Zybort <[email protected]>
@matrixik
Member Author

OK, I think that besides the completion message, everything else is addressed.

@matrixik
Member Author

Oh, and checking events for restarts has still not been re-enabled.

@matrixik
Member Author

Checking docker events for restarts is enabled again, with improved messaging for the operator.

3 participants