Merge pull request #6 from Naudit/develop-qol-improvements
Adding quality of life improvements
ralequi authored Nov 20, 2024
2 parents a6da931 + 092a622 commit 8ae5e2d
Showing 4 changed files with 185 additions and 41 deletions.
136 changes: 112 additions & 24 deletions README.md
@@ -1,5 +1,5 @@
pySMART-exporter
===========
# pySMART-exporter

![](https://img.shields.io/pypi/v/pySMART-exporter?label=release)
![](https://img.shields.io/pypi/pyversions/pySMART-exporter)
@@ -8,29 +8,73 @@ pySMART-exporter
![](https://img.shields.io/github/issues-pr/Naudit/pySMART-exporter)
![](https://img.shields.io/pypi/dm/pysmart-exporter)

Copyright (C) 2021 Naudit HPCN S.L.
Copyright (C) 2021 Rafael Leira, Naudit HPCN S.L.

PySMART-exporter is a simple Python prometheus exporter built on top of [PySMART library](https://github.com/truenas/py-SMART).
`pySMART-exporter` is a Python Prometheus exporter for collecting and exposing S.M.A.R.T. metrics of storage devices. It leverages the [pySMART library](https://github.com/truenas/py-SMART) and integrates Prometheus client library functionalities for HTTP publication or file-based metric exports.

---

Usage
=====
## Features

Server mode
-----------
To Use the exporter in server mode you can simply run as
- Collects S.M.A.R.T. metrics from storage devices.
- Supports Prometheus integration via HTTP or text-based node collector files.
- Includes support for various storage interfaces, including NVMe attributes and diagnostics.

`pysmart_exporter -l 0.0.0.0:9099`
---

And configure your prometheus to access it.
## Installation

File mode
---------
If you whish to generate just a metric sample, you can run this:
The `pySMART-exporter` can be installed via PyPI:

`pysmart_exporter -f out.txt -1`
```bash
python -m pip install pySMART-exporter
```

Ensure that `smartctl` from the `smartmontools` package is installed, as it is a prerequisite. For most Linux distributions, use your package manager:

```bash
sudo apt-get install smartmontools

# or

sudo yum install smartmontools
```

---

## Usage

The exporter supports two modes: **server mode** (HTTP) and **file mode** (node exporter textfile). It should run as a privileged user to access disk information.

### Server Mode

It may generate a file with a similar content as:
To run the exporter in server mode, execute the following command:

```bash
pysmart_exporter -l 0.0.0.0:9099
```

Then configure Prometheus to scrape metrics from the endpoint.
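
The `-l` value is a `host:port` pair. As a rough illustration of how such a value can be split before an HTTP server is started, here is a minimal sketch. `parse_listen` is a hypothetical helper, not part of the exporter; it splits on the last colon so that a host containing colons is not cut early, and the fallback to `0.0.0.0` for an empty host is an assumption.

```python
from typing import Tuple


def parse_listen(value: str) -> Tuple[str, int]:
    """Split 'host:port' into (host, port). Hypothetical helper for illustration."""
    # rpartition splits on the LAST colon, so only the trailing part is the port.
    host, sep, port = value.rpartition(':')
    if not sep or not port.isdigit():
        raise ValueError(f'expected host:port, got {value!r}')
    # Assumption: an empty host (":9099") means "listen on all interfaces".
    return host or '0.0.0.0', int(port)


if __name__ == '__main__':
    print(parse_listen('0.0.0.0:9099'))
```

Validating the value up front gives a clear error message instead of a failure deep inside the server start-up.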

### File Mode

To generate a one-time metric file for use with a Prometheus node exporter:

```bash
pysmart_exporter -f /path/to/output/file.txt -1
```

To continuously generate metric files at a set interval (e.g., 60 seconds):

```bash
pysmart_exporter -f /path/to/output/file.txt -i 60
```
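
When a metric file is rewritten at an interval, a reader can observe a half-written file unless the writer replaces it atomically. Whether the exporter does this is not stated here; the following is a generic sketch of the pattern, with `write_textfile_atomically` as a hypothetical helper name. It writes to a temporary file in the same directory and then renames it over the target.

```python
import os
import tempfile


def write_textfile_atomically(path: str, content: str) -> None:
    """Write content to path via a temporary file and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory (same filesystem)
    # for os.replace to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as handle:
            handle.write(content)
        # mkstemp creates the file as 0600; relax it so another user can read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The `prometheus_client` library also provides `write_to_textfile(path, registry)`, which follows the same write-then-rename approach. Note that the node exporter textfile collector only reads files whose names end in `.prom`.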

---

## Example Metrics Output

Below is a sample of the metrics exposed by `pySMART-exporter`:

```prometheus
# HELP pysmart_info PySMART metric info
@@ -47,17 +91,47 @@ pysmart_temperature{device="nvme0",interface="nvme"} 44.0
pysmart_size{device="nvme0",interface="nvme"} 2.56e+011
# HELP pysmart_test_capabilities PySMART metric test_capabilities
# TYPE pysmart_test_capabilities gauge
pysmart_test_capabilities{device="nvme0",interface="nvme",pysmart_test_capabilities="conveyance"} 0.0
pysmart_test_capabilities{device="nvme0",interface="nvme",pysmart_test_capabilities="long"} 0.0
pysmart_test_capabilities{device="nvme0",interface="nvme",pysmart_test_capabilities="offline"} 0.0
pysmart_test_capabilities{device="nvme0",interface="nvme",pysmart_test_capabilities="selective"} 0.0
pysmart_test_capabilities{device="nvme0",interface="nvme",pysmart_test_capabilities="short"} 0.0
pysmart_test_capabilities{device="nvme0",interface="nvme",pysmart_test_capabilities="short"} 1.0
```
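
To make the layout of a sample line concrete, here is a small sketch that splits one line of the output above into metric name, labels, and value. It is a deliberately simplified parser using only the standard library: `parse_sample` is a hypothetical name, escaped characters in label values are not unescaped, and trailing timestamps are not handled.

```python
import re

# name{label="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (name, labels, value) for one non-comment sample line."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f'not a sample line: {line!r}')
    labels = dict(LABEL_RE.findall(match.group('labels') or ''))
    return match.group('name'), labels, float(match.group('value'))


if __name__ == '__main__':
    print(parse_sample('pysmart_temperature{device="nvme0",interface="nvme"} 44.0'))
```

For anything beyond a quick check, a full parser such as the one shipped with `prometheus_client` is the safer choice.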

You can also set an interval with `-i` instead of `-1` to keep flushing data every n seconds
---

## CLI Options

| Option | Description |
|-----------------------|--------------------------------------------------------------------------------------------------|
| `-f`, `--textfile-name` | Path to the file where metrics will be stored for node collection. |
| `-l`, `--listen` | Host and port to listen on in HTTP server mode (e.g., `0.0.0.0:9417`). |
| `-i`, `--interval` | Interval (in seconds) between metric updates. Default: `60`. |
| `-1`, `--oneshot` | Run only once and exit (useful for cron jobs). |
| `-q`, `--quiet` | Suppress error messages and warnings. |
| `--include` | Comma-separated list of devices to include (e.g., `nvme0,/dev/sda`). |
| `--exclude` | Comma-separated list of devices to exclude. |
| `--metric-prefix` | Custom prefix for metrics. Default: `pysmart`. |
| `--metrics` | Comma-separated list of specific metrics to export (e.g., `temperature,size,assessment_passed`).|
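
The `--include` and `--exclude` options both take comma-separated lists. The sketch below mirrors the filtering that this commit adds to `collect()` in `collector.py`: a device is matched by either its name or its device reference, an empty include list means "all devices", and exclusion is applied after inclusion. The helper names `split_csv` and `device_selected` are hypothetical and chosen for illustration only.

```python
def split_csv(value):
    """Turn 'a, b,c' into ['a', 'b', 'c']; None or '' becomes []."""
    return [item.strip() for item in value.split(',')] if value else []


def device_selected(name, dev_reference, include, exclude):
    """Decide whether a device passes the include/exclude filters."""
    identifiers = {name, dev_reference}
    # A non-empty include list acts as an allow-list.
    if include and not identifiers & set(include):
        return False
    # The exclude list always removes matching devices.
    if exclude and identifiers & set(exclude):
        return False
    return True


if __name__ == '__main__':
    include = split_csv('nvme0, /dev/sda')
    print(device_selected('sdb', '/dev/sdb', include, []))
```

Because both lists are checked, a device named in both `--include` and `--exclude` ends up excluded.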

---

## Metrics

| **Metric Name** | **Type** | **Description** | **Labels** |
|---|---|---|---|
|`pysmart_info`|`info`|General information about the disk, including model, firmware, size, and other static attributes.|`device`, `interface`, `model`, `serial`, `firmware`, `rotation`, `size_raw`, `size`, `ssd`, `smart_capable`, `smart_enabled`, `vendor`, `sector_size`, and more.|
|`pysmart_assessment_passed`|`gauge`|Assessment of the disk's health. `1` for PASS, `0` otherwise.|`device`, `interface`|
|`pysmart_temperature`|`gauge`|Current temperature of the disk in Celsius.|`device`, `interface`|
|`pysmart_size`|`gauge`|Disk size in bytes.|`device`, `interface`|
|`pysmart_attribute_value`|`gauge`|SMART attribute values such as error counts, read/write metrics, etc.|`device`, `name` (attribute name), `num`, `type`, `flags`, `updated`, `whenfailed`, and more depending on attribute.|
|`pysmart_attribute_thresh`|`gauge`|Threshold values for SMART attributes.|Similar to `pysmart_attribute_value`|
|`pysmart_attribute_worst`|`gauge`|The worst recorded value for a given SMART attribute.|Similar to `pysmart_attribute_value`|
|`pysmart_attribute_raw`|`gauge`|The raw value for a SMART attribute.|Similar to `pysmart_attribute_value`|
|`pysmart_diagnostics_*`|`gauge`|Disk diagnostic statistics, including errors and other health-related data.|`device`, `interface`|
|`pysmart_test_capabilities`|`state`|Types of self-tests supported by the disk (e.g., short, long, offline).|`device`, `interface`|
|`pysmart_test`|`gauge`|Details about completed or pending disk self-tests.|`device`, `type` (test type), `status`, `hours`, `num`, and other self-test details.|

---

## Installation

Installation
============
``pySMART-exporter`` is available on PyPI and installable via ``pip``::

python -m pip install pySMART-exporter
@@ -68,3 +142,17 @@ The only external (non-python) dependency is the ``smartctl`` component of the s
or
yum install smartmontools


## License

This program is distributed under the terms of the license specified in the [LICENSE](./LICENSE) file.

---

## References

- [pySMART Library](https://github.com/truenas/py-SMART)

- [Prometheus Documentation](https://prometheus.io/docs/)

- [SMART Monitoring Tools](https://www.smartmontools.org/)
2 changes: 1 addition & 1 deletion pysmart_exporter/__init__.py
@@ -1,4 +1,4 @@
# Copyright (C) 2021 Rafael Leira
# Copyright (C) 2021 Rafael Leira, Naudit HPCN S.L.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
13 changes: 12 additions & 1 deletion pysmart_exporter/__main__.py
@@ -1,4 +1,4 @@
# Copyright (C) 2021 Rafael Leira
# Copyright (C) 2021 Rafael Leira, Naudit HPCN S.L.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
@@ -12,6 +12,9 @@
from pysmart_exporter.collector import PySMARTCollector
import time
import sys
import subprocess
import os
import logging


def main():
@@ -25,6 +28,14 @@ def main():
    registry.register(collector)
    args = collector.args

    if os.geteuid() != 0:
        logging.error('Due to the privileges needed from Smartmontools, it should run as root.')
        try:
            subprocess.check_call(['sudo', sys.executable] + sys.argv)
        except subprocess.CalledProcessError as e:
            logging.error(f'Failed to execute with sudo: {e}')
            sys.exit(e.returncode)

    if args['listen']:
        (ip, port) = args['listen'].split(':')
        prometheus_client.start_http_server(port=int(port), addr=ip, registry=registry)
75 changes: 60 additions & 15 deletions pysmart_exporter/collector.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
#
# Copyright (C) 2021 Rafael Leira
# Copyright (C) 2021 Rafael Leira, Naudit HPCN S.L.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
@@ -76,6 +76,31 @@ def _parse_args(self, args, prog=None):
            help='Silence any error messages and warnings',
        )
        parser.add_argument('-v', '--version', action='version', version='%(prog)s ' + __version__)
        parser.add_argument(
            '--include',
            dest='include',
            type=str,
            help='Comma-separated list of devices to include, e.g., nvme0,/dev/sda,/dev/sdb',
        )
        parser.add_argument(
            '--exclude',
            dest='exclude',
            type=str,
            help='Comma-separated list of devices to exclude, e.g., nvme0,/dev/sdc,/dev/sdd',
        )
        parser.add_argument(
            '--metric-prefix',
            dest='metric_prefix',
            default='pysmart',
            help='Custom prefix for exported metrics. Defaults to pysmart.',
        )
        parser.add_argument(
            '--metrics',
            dest='metrics',
            type=str,
            help='Comma-separated list of metrics to include (Info metrics will always be included), e.g., temperature,size,assessment_passed',
        )

        arguments = parser.parse_args(args)
        if arguments.quiet:
            logging.getLogger().setLevel(100)
Expand Down Expand Up @@ -123,11 +148,11 @@ def add_metric(
description = 'PySMART metric ' + name

if type == 'info':
gauges[name] = InfoMetricFamily('pysmart', description, labels=labels.keys())
gauges[name] = InfoMetricFamily(self.args['metric_prefix'], description, labels=labels.keys())
elif type == 'state':
gauges[name] = StateSetMetricFamily('pysmart_' + name, description, labels=labels.keys())
gauges[name] = StateSetMetricFamily(self.args['metric_prefix'] + '_' + name, description, labels=labels.keys())
else:
gauges[name] = GaugeMetricFamily('pysmart_' + name, description, labels=labels.keys())
gauges[name] = GaugeMetricFamily(self.args['metric_prefix'] + '_' + name, description, labels=labels.keys())

for k in labels.keys():
if labels[k] is None:
Expand Down Expand Up @@ -158,6 +183,8 @@ def update_pysmart_stats(self, disk: Device, gauges):

# Check for raid

metrics_included = [metric.strip() for metric in self.args['metrics'].split(',')] if self.args['metrics'] else []

# Info
# All label values should be strings, even if they are None.
# Force them all through the str() call
@@ -181,7 +208,7 @@ def update_pysmart_stats(self, disk: Device, gauges):
        self.add_metric(gauges, disk, 'info', 1, labels=info_labels, type='info')

        # Assessment / Disk state
        if disk.assessment is not None:
        if disk.assessment is not None and ('assessment_passed' in metrics_included or not metrics_included):
            self.add_metric(
                gauges,
                disk,
@@ -191,17 +218,19 @@ def update_pysmart_stats(self, disk: Device, gauges):
            )

        # Temperature
        if disk.temperature is not None:
        if disk.temperature is not None and ('temperature' in metrics_included or not metrics_included):
            self.add_metric(gauges, disk, 'temperature', disk.temperature, labels=common_labels)

        # Size
        if disk.size is not None:
        if disk.size is not None and ('size' in metrics_included or not metrics_included):
            self.add_metric(gauges, disk, 'size', disk.size, labels=common_labels)

        if isinstance(disk.if_attributes, NvmeAttributes):
            #### New Nvme Attributes ####
            for attr_name, attribute in disk.if_attributes.__dict__.items():
                # Ensure the attribute is not None and valid before proceeding
                if metrics_included and attr_name not in metrics_included:
                    continue
                if isinstance(attribute, (int, float)):
                    attribute_labels = {
                        'name': attr_name, # Attribute name
@@ -220,6 +249,8 @@ def update_pysmart_stats(self, disk: Device, gauges):
            #### Old Attributes ####
            for attribute in disk.attributes:
                if attribute is not None:
                    if metrics_included and attribute.name not in metrics_included:
                        continue
                    attribute_labels = {
                        'num': str(attribute.num),
                        'name': attribute.name,
@@ -262,6 +293,8 @@ def update_pysmart_stats(self, disk: Device, gauges):

        #### New Attributes ####
        for diag in vars(disk.diagnostics):
            if metrics_included and diag not in metrics_included:
                continue
            diag_labels = {**common_labels}

            # Set to -1 if undefined/None
@@ -271,15 +304,20 @@ def update_pysmart_stats(self, disk: Device, gauges):

        #### Tests ####
        # Supported test types
        self.add_metric(
            gauges,
            disk,
            'test_capabilities',
            disk.test_capabilities,
            labels=common_labels,
            type='state',
        )
        if not metrics_included or (metrics_included and 'test_capabilities' in metrics_included):
            self.add_metric(
                gauges,
                disk,
                'test_capabilities',
                disk.test_capabilities,
                labels=common_labels,
                type='state',
            )

        for test in disk.tests:
            if metrics_included and test.type not in metrics_included:
                continue

            test_labels = {
                'num': str(test.num),
                'hours': str(test.hours),
@@ -310,8 +348,15 @@ def collect(self):
        uses this method to respond to http queries or save them to disk.
        """
        gauges = {}
        include_devices = [device.strip() for device in self.args['include'].split(',')] if self.args['include'] else []
        exclude_devices = [device.strip() for device in self.args['exclude'].split(',')] if self.args['exclude'] else []


        for disk in DeviceList():
            if include_devices and not (disk.name in include_devices or disk.dev_reference in include_devices):
                continue
            if exclude_devices and (disk.name in exclude_devices or disk.dev_reference in exclude_devices):
                continue
            try:
                self.update_pysmart_stats(disk, gauges)
            except Exception as e: