Commit

Check swarm (#23)
Add swarm and service commands
Updated documentation
Added check_swarm to various test configurations
timdaman authored Oct 29, 2017
1 parent 1ae45a9 commit 5069b9e
Showing 7 changed files with 110 additions and 30 deletions.
3 changes: 2 additions & 1 deletion .codeclimate.yml
@@ -4,4 +4,5 @@ languages:
PHP: true
Python: true
exclude_paths:
- "test_check_docker.py"
- "test_check_docker.py"
- "test_check_swarm.py"
4 changes: 3 additions & 1 deletion .travis.yml
@@ -8,5 +8,7 @@ install:
- pip install codeclimate-test-reporter coverage==4.3.4 pyfakefs
# command to run tests
script:
- coverage run ./test_check_docker.py
- COVERAGE_FILE=.coverage.check_docker coverage run ./test_check_docker.py
- COVERAGE_FILE=.coverage.check_swarm coverage run ./test_check_swarm.py
- coverage combine .coverage.check_*
- codeclimate-test-reporter || echo "Ignoring Code Climate reporter upload failure"
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,4 +1,5 @@
include license.txt
include README.txt
include check_docker
include check_swarm
include setup.py
66 changes: 51 additions & 15 deletions README.md
@@ -2,24 +2,32 @@
[![Code Climate](https://codeclimate.com/github/timdaman/check_docker/badges/gpa.svg)](https://codeclimate.com/github/timdaman/check_docker)
[![Test Coverage](https://codeclimate.com/github/timdaman/check_docker/badges/coverage.svg)](https://codeclimate.com/github/timdaman/check_docker/coverage)
# check_docker
This a a nagios/NRPE compatible plugin for checking docker containers. So far you can use it to check

- memory consumption in absolute units (bytes, kb, mb, gb) and as a percentage (0-100%)
of the container limit.
Nagios/NRPE compatible plugins for checking docker based services. Currently there are two nagios checks

- check_docker which checks docker container health
- check_swarm which checks health of swarm nodes and services

With check_docker you can check and alert on

- memory consumption in absolute units (bytes, kb, mb, gb) and as a percentage (0-100%) of the container limit.
- CPU usage as a percentage (0-100%) of container limit.
- automatic restarts performed by the docker daemon
- container status, i.e. is it running?
- container health checks are passing?
- uptime, i.e. is it able to stay running for a long enough time?
- the presence of a container or containers matching specified names
- image version (experimental!), does the running image match that in
the remote registry?
- image version (experimental!), does the running image match that in the remote registry?

With check_swarm you can alert

This check can communicate with a local docker daemon socket file (default) or with local
- if a node is not joined to a docker swarm
- if a service is running in a swarm

These checks can communicate with a local docker daemon socket file (default) or with local
or remote docker daemons using secure and non-secure TCP connections.

This plugin requires python 3. It is tested on 3.3 and greater but may work on older
These plugins require python 3. They are tested on 3.3 and greater but may work on older
versions of 3.

## Installation
@@ -33,15 +41,17 @@ With pip
With curl

curl -o /usr/local/bin/check_docker https://raw.githubusercontent.com/timdaman/check_docker/master/check_docker
chmod a+rx /usr/local/bin/check_docker
curl -o /usr/local/bin/check_swarm https://raw.githubusercontent.com/timdaman/check_docker/master/check_swarm
chmod a+rx /usr/local/bin/check_docker /usr/local/bin/check_swarm

With wget

wget -O /usr/local/bin/check_docker https://raw.githubusercontent.com/timdaman/check_docker/master/check_docker
chmod a+rx /usr/local/bin/check_docker
wget -O /usr/local/bin/check_swarm https://raw.githubusercontent.com/timdaman/check_docker/master/check_swarm
chmod a+rx /usr/local/bin/check_docker /usr/local/bin/check_swarm


## Usage
## check_docker Usage

usage: check_docker [-h]
[--connection [/<path to>/docker.socket|<ip/host address>:<port>]
@@ -83,9 +93,35 @@ With wget
images. Only works with public registry.
--restarts WARN:CRIT Container restart thresholds.

Gotchas:
## check_swarm Usage

* When using this with older versions of docker (I have seen 1.4 and
1.5) –status only supports ‘running’, ‘restarting’, and ‘paused’.
* When no container is specified all containers are checked. Some containers will return critcal status because the
selected check(s) require a running container.
usage: check_swarm [-h]
[--connection [/<path to>/docker.socket|<ip/host address>:<port>]
| --secure-connection [<ip/host address>:<port>]]
[--timeout TIMEOUT]
(--swarm | --service SERVICE [SERVICE ...])

Check docker swarm.

optional arguments:
-h, --help show this help message and exit
--connection [/<path to>/docker.socket|<ip/host address>:<port>]
Where to find docker daemon socket. (default:
/var/run/docker.sock)
--secure-connection [<ip/host address>:<port>]
Where to find TLS protected docker daemon socket.
--timeout TIMEOUT Connection timeout in seconds. (default: 10.0)
--swarm Check swarm status
--service SERVICE [SERVICE ...]
One or more RegEx that match the names of the
services(s) to check.
usage: check_swarm [-h]
[--connection [/<path to>/docker.socket|<ip/host address>:<port>]
| --secure-connection [<ip/host address>:<port>]]
[--timeout TIMEOUT]
(--swarm | --service SERVICE [SERVICE ...])

Gotchas:

* When using check_docker with older versions of docker (I have seen 1.4 and 1.5) --status only supports 'running', 'restarting', and 'paused'.
* When using check_docker, if no container is specified, all containers are checked. Some containers may return critical status if the selected check(s) require a running container.
56 changes: 48 additions & 8 deletions README.txt
@@ -5,8 +5,12 @@
check_docker
============

This a nagios/NRPE compatible plugin for checking docker containers. So far you
can use it to check and alert on
Nagios/NRPE compatible plugins for checking docker based services. Currently there are two nagios checks

- `check_docker` which checks docker container health
- `check_swarm` which checks health of swarm nodes and services

With check_docker you can check and alert on

- memory consumption in absolute units (bytes, kb, mb, gb) and as a percentage (0-100%)
of the container limit.
@@ -19,14 +23,19 @@ can use it to check and alert on
- image version (experimental!), does the running image match that in
the remote registry?

This check can communicate with a local docker daemon socket file (default) or with local
With check_swarm you can alert

- if a node is not joined to a docker swarm
- if a service is running in a swarm

These checks can communicate with a local docker daemon socket file (default) or with local
or remote docker daemons using secure and non-secure TCP connections.

This plugin requires python 3. It is tested on 3.3 and greater but may work on older
These plugins require python 3. They are tested on 3.3 and greater but may work on older
versions of 3.

Usage
-----
check_docker Usage
------------------

::

@@ -70,10 +79,41 @@ Usage
images. Only works with public registry.
--restarts WARN:CRIT Container restart thresholds.

check_swarm Usage
-----------------

::

usage: check_swarm [-h]
[--connection [/<path to>/docker.socket|<ip/host address>:<port>]
| --secure-connection [<ip/host address>:<port>]]
[--timeout TIMEOUT]
(--swarm | --service SERVICE [SERVICE ...])

Check docker swarm.

optional arguments:
-h, --help show this help message and exit
--connection [/<path to>/docker.socket|<ip/host address>:<port>]
Where to find docker daemon socket. (default:
/var/run/docker.sock)
--secure-connection [<ip/host address>:<port>]
Where to find TLS protected docker daemon socket.
--timeout TIMEOUT Connection timeout in seconds. (default: 10.0)
--swarm Check swarm status
--service SERVICE [SERVICE ...]
One or more RegEx that match the names of the
services(s) to check.
usage: check_swarm [-h]
[--connection [/<path to>/docker.socket|<ip/host address>:<port>]
| --secure-connection [<ip/host address>:<port>]]
[--timeout TIMEOUT]
(--swarm | --service SERVICE [SERVICE ...])

Gotchas:

- When using this with older versions of docker (I have seen 1.4 and 1.5) –status only supports ‘running’, ‘restarting’, and ‘paused’.
- When no container is specified all containers are checked. Some containers will return critcal status because the selected check(s) require a running container.
- When using check_docker with older versions of docker (I have seen 1.4 and 1.5) --status only supports 'running', 'restarting', and 'paused'.
- When using check_docker, if no container is specified, all containers are checked. Some containers may return critical status if the selected check(s) require a running container.

.. |Build Status| image:: https://travis-ci.org/timdaman/check_docker.svg?branch=master
:target: https://travis-ci.org/timdaman/check_docker
4 changes: 2 additions & 2 deletions check_swarm
@@ -130,7 +130,7 @@ def get_services(names):
if re.match("^{}$".format(matcher), candidate):
filtered.add(candidate)
found = True
# If we don't find a container that matches out regex
# If we don't find a service that matches our regex
if not found:
critical("No services match {}".format(matcher))
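
The hunk above shows only a fragment of `get_services`. As a rough, self-contained sketch of the same matching idea (each pattern is wrapped in `^...$` so it must match a whole service name, and a pattern that matches nothing is an error), something like the following could be used. The function name `filter_services` and the `LookupError` are assumptions made for illustration; the plugin itself calls `critical(...)` instead of raising.

```python
import re


def filter_services(candidates, patterns):
    """Return the set of candidate names fully matched by any pattern.

    Hypothetical helper: raises LookupError for the first pattern that
    matches nothing, where the plugin would report a CRITICAL status.
    """
    filtered = set()
    for matcher in patterns:
        found = False
        for candidate in candidates:
            # Anchoring with ^...$ means the whole name must match,
            # not just a prefix of it.
            if re.match("^{}$".format(matcher), candidate):
                filtered.add(candidate)
                found = True
        # A pattern that matched no service is treated as a failure.
        if not found:
            raise LookupError("No services match {}".format(matcher))
    return filtered


if __name__ == "__main__":
    print(sorted(filter_services(["web", "web-admin", "db"], ["web.*"])))
    # prints ['web', 'web-admin']
```

Because of the anchoring, a pattern such as `web` selects only the service named exactly `web`; matching `web-admin` as well requires an explicit wildcard such as `web.*`.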

@@ -186,7 +186,7 @@ def process_url_status(status, ok_msg=None, critical_msg=None, unknown_msg=None)


def process_args(args):
parser = argparse.ArgumentParser(description='Check docker containers.')
parser = argparse.ArgumentParser(description='Check docker swarm.')

# Connect to local socket or ip address
connection_group = parser.add_mutually_exclusive_group()
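
For orientation, here is a minimal sketch of a parser that would accept the command line documented in the check_swarm usage text: either `--connection` or `--secure-connection`, an optional `--timeout`, and exactly one of `--swarm` or `--service`. It is an approximation built from the README usage, not the plugin's actual `process_args`; the function name `build_parser` is hypothetical, and the optional-value form of the connection flags shown in the usage is simplified here to a plain required value.

```python
import argparse


def build_parser():
    """Hypothetical parser approximating the documented check_swarm CLI."""
    parser = argparse.ArgumentParser(prog="check_swarm",
                                     description="Check docker swarm.")

    # Connect to a local socket or an ip address, but not both.
    connection_group = parser.add_mutually_exclusive_group()
    connection_group.add_argument("--connection",
                                  default="/var/run/docker.sock",
                                  help="Where to find docker daemon socket.")
    connection_group.add_argument("--secure-connection",
                                  help="Where to find TLS protected docker daemon socket.")

    parser.add_argument("--timeout", type=float, default=10.0,
                        help="Connection timeout in seconds.")

    # Exactly one kind of check must be requested.
    check_group = parser.add_mutually_exclusive_group(required=True)
    check_group.add_argument("--swarm", action="store_true",
                             help="Check swarm status")
    check_group.add_argument("--service", nargs="+",
                             help="One or more RegEx matching service names.")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["--service", "FOO", "BAR"]))
```

Marking the second group as `required=True` is what produces the parenthesised `(--swarm | --service SERVICE [SERVICE ...])` form in the generated usage line.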
6 changes: 3 additions & 3 deletions test_check_swarm.py
@@ -251,20 +251,20 @@ def test_check_service_called(self):

def test_check_service_results_OK(self):
args = ['--service', 'FOO']
with patch('check_swarm.get_services', return_value=(['FOO','BAR'], 200)):
with patch('check_swarm.get_services', return_value=['FOO','BAR']):
with patch('check_swarm.get_service_info', return_value=(self.service, 200)):
check_swarm.perform_checks(args)
self.assertEqual(check_swarm.rc, check_swarm.OK_RC)

def test_check_service_results_FAIL_missing(self):
args = ['--service', 'missing1']
with patch('check_swarm.get_service_info', return_value=('', 404)):
with patch('check_swarm.get_url', return_value=(self.services, 200)):
check_swarm.perform_checks(args)
self.assertEqual(check_swarm.rc, check_swarm.CRITICAL_RC)

def test_check_service_results_FAIL_unknown(self):
args = ['--service', 'FOO']
with patch('check_swarm.get_services', return_value=(['FOO','BAR'], 200)):
with patch('check_swarm.get_services', return_value=['FOO','BAR']):
with patch('check_swarm.get_service_info', return_value=('', 500)):
check_swarm.perform_checks(args)
self.assertEqual(check_swarm.rc, check_swarm.UNKNOWN_RC)