forked from ewilhelm/panoptimon
Commit
Showing 5 changed files with 252 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,7 @@ | ||
# 0.5.4 | ||
|
||
* collectors/zfs - zpool health & space collector | ||
|
||
# 0.5.3 | ||
|
||
* collectors/fma - solaris faults collector | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# Description | ||
|
||
Reports on zfs pool size and health. | ||
|
||
# Output | ||
|
||
The metrics `used`, `free`, `state`, and `faults` will always be | ||
present. If there are faults, the `errors` hash may contain additional | ||
metrics. The error metrics `read`, `write`, and `cksum` are as reported | ||
for the zfs pool (see `zpool status`). | ||
|
||
Note that `faults` may be non-zero while the pool is still online | ||
(`state=10`) if there are errors on a device, including errors that | ||
caused data loss. The device-specific errors are reported under | ||
`_info`. Unrecoverable (pool) errors will appear as `read`, `write`, or | ||
`cksum` at the upper level. The value of `faults` is a count of all | ||
vdevs which contain errors (including the pool itself), so a flaky | ||
(redundant) disk in a mirror would appear as `faults=2`, but with 2/2 | ||
disks corrupted: faults = 2 (disks) + 1 (mirror) + 1 (pool) = 4. | ||
|
||
```yaml | ||
zpool: | ||
  monkey: | ||
    used: 39.2 # GB | ||
    free: 17.483 # GB | ||
    state: 10 # 10=ONLINE, 5=DEGRADED, 0=FAULTED/UNAVAIL | ||
    faults: 4 | ||
    errors: | ||
      read: 8 | ||
      _info: | ||
        c1t1d0: { state: "ONLINE", read: 2 } | ||
        c2t5d0: { state: "ONLINE", read: 6 } | ||
        _files: ["<0x0>", "a.txt"] | ||
``` |
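The fault arithmetic described in the README can be sketched as below. This is a minimal illustration and not the collector's code: `count_faults` and the row layout are hypothetical, and the sample rows are modeled on a mirror whose two disks, the mirror itself, and the pool all report errors.

```ruby
# Hypothetical helper: count every vdev row (the pool row included) that
# has a non-zero error counter or a state other than ONLINE.
def count_faults(rows)
  rows.count do |row|
    row[:state] != 'ONLINE' ||
      [row[:read], row[:write], row[:cksum]].any? { |n| n > 0 }
  end
end

# Illustrative rows: one hash per line of the `zpool status` config table.
rows = [
  { name: 'monkey',   state: 'ONLINE', read: 0,  write: 30_000, cksum: 0 },
  { name: 'mirror-0', state: 'ONLINE', read: 0,  write: 30_000, cksum: 0 },
  { name: '/tmp/p1',  state: 'ONLINE', read: 23, write: 59_300, cksum: 0 },
  { name: '/tmp/p2',  state: 'ONLINE', read: 47, write: 62_300, cksum: 0 },
  { name: 'mirror-1', state: 'ONLINE', read: 0,  write: 0,      cksum: 0 },
  { name: '/tmp/p3',  state: 'ONLINE', read: 0,  write: 0,      cksum: 0 },
  { name: '/tmp/p4',  state: 'ONLINE', read: 0,  write: 0,      cksum: 0 },
]

puts count_faults(rows) # 2 (disks) + 1 (mirror) + 1 (pool) = 4
```

A row counts as a fault on either condition, so an UNAVAIL device with all-zero counters is still counted.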
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,131 @@ | ||
#!/bin/bash | ||
|
||
if [ "x$KNOB" == "xOK" ] ; then | ||
cat << END_OF_OUTPUT | ||
pool: monkey | ||
state: ONLINE | ||
scan: none requested | ||
config: | ||
NAME STATE READ WRITE CKSUM | ||
monkey ONLINE 0 0 0 | ||
mirror-0 ONLINE 0 0 0 | ||
/tmp/p1 ONLINE 0 0 0 | ||
/tmp/p2 ONLINE 0 0 0 | ||
mirror-1 ONLINE 0 0 0 | ||
/tmp/p3 ONLINE 0 0 0 | ||
/tmp/p4 ONLINE 0 0 0 | ||
errors: No known data errors | ||
pool: tank | ||
state: ONLINE | ||
scan: none requested | ||
config: | ||
NAME STATE READ WRITE CKSUM | ||
tank ONLINE 0 0 0 | ||
/dev/sdg ONLINE 0 0 0 | ||
errors: No known data errors | ||
END_OF_OUTPUT | ||
|
||
elif [ "x$KNOB" == "xERRORS" ] ; then | ||
cat << END_OF_OUTPUT | ||
pool: monkey | ||
state: ONLINE | ||
status: One or more devices has experienced an unrecoverable error. An | ||
attempt was made to correct the error. Applications are unaffected. | ||
action: Determine if the device needs to be replaced, and clear the errors | ||
using 'zpool clear' or replace the device with 'zpool replace'. | ||
see: http://zfsonlinux.org/msg/ZFS-8000-9P | ||
scan: scrub repaired 2K in 0h0m with 0 errors on Sat Jan 11 11:47:26 2014 | ||
config: | ||
NAME STATE READ WRITE CKSUM | ||
monkey ONLINE 0 0 0 | ||
mirror-0 ONLINE 0 0 0 | ||
/tmp/p1 ONLINE 0 0 0 | ||
/tmp/p2 ONLINE 9 80.6K 0 | ||
mirror-1 ONLINE 0 0 0 | ||
/tmp/p3 ONLINE 0 0 0 | ||
/tmp/p4 ONLINE 0 0 0 | ||
errors: No known data errors | ||
END_OF_OUTPUT | ||
|
||
elif [ "x$KNOB" == "xLOSSY" ] ; then | ||
cat << END_OF_OUTPUT | ||
pool: tank | ||
state: UNAVAIL | ||
status: One or more devices are faulted in response to IO failures. | ||
config: | ||
NAME STATE READ WRITE CKSUM | ||
tank UNAVAIL 0 0 0 insufficient replicas | ||
c1t0d0 ONLINE 0 0 0 | ||
c1t1d0 UNAVAIL 4 1 0 cannot open | ||
errors: Permanent errors have been detected in the following files: | ||
/tank/data/aaa | ||
/tank/data/bbb | ||
/tank/data/ccc | ||
END_OF_OUTPUT | ||
|
||
elif [ "x$KNOB" == "xDEGRADED" ] ; then | ||
cat << END_OF_OUTPUT | ||
pool: monkey | ||
state: DEGRADED | ||
status: One or more devices could not be opened. Sufficient replicas exist for | ||
the pool to continue functioning in a degraded state. | ||
action: Attach the missing device and online it using 'zpool online'. | ||
see: http://zfsonlinux.org/msg/ZFS-8000-2Q | ||
scan: scrub repaired 0 in 0h0m with 0 errors on Sat Jan 11 16:45:29 2014 | ||
config: | ||
NAME STATE READ WRITE CKSUM | ||
monkey DEGRADED 0 0 0 | ||
mirror-0 DEGRADED 0 0 0 | ||
/tmp/p1a UNAVAIL 0 0 0 cannot open | ||
/tmp/p2 ONLINE 0 0 0 | ||
mirror-1 ONLINE 0 0 0 | ||
/tmp/p3 ONLINE 0 0 0 | ||
/tmp/p4 ONLINE 0 0 0 | ||
errors: No known data errors | ||
END_OF_OUTPUT | ||
|
||
elif [ "x$KNOB" == "xCORRUPT" ] ; then | ||
cat << END_OF_OUTPUT | ||
pool: monkey | ||
state: ONLINE | ||
status: One or more devices has experienced an error resulting in data | ||
corruption. Applications may be affected. | ||
action: Restore the file in question if possible. Otherwise restore the | ||
entire pool from backup. | ||
see: http://zfsonlinux.org/msg/ZFS-8000-8A | ||
scan: resilvered 0 in 0h0m with 0 errors on Sat Jan 11 11:52:56 2014 | ||
config: | ||
NAME STATE READ WRITE CKSUM | ||
monkey ONLINE 0 30.0K 0 | ||
mirror-0 ONLINE 0 30.0K 0 | ||
/tmp/p1 ONLINE 23 59.3K 0 | ||
/tmp/p2 ONLINE 47 62.3K 0 | ||
mirror-1 ONLINE 0 0 0 | ||
/tmp/p3 ONLINE 0 0 0 | ||
/tmp/p4 ONLINE 0 0 0 | ||
errors: Permanent errors have been detected in the following files: | ||
monkey:<0x0> | ||
/monkey/ | ||
END_OF_OUTPUT | ||
|
||
|
||
else | ||
echo "oh no" >&2 | ||
exit 1 | ||
fi |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
#!/usr/bin/env ruby | ||
# Copyright (C) 2014 Cisco, Inc. | ||
|
||
require 'json' | ||
class Array; def to_h; Hash[*self.flatten]; end; end | ||
|
||
opt = ARGV[0] ? JSON::parse(ARGV[0], {symbolize_names: true}) : {} | ||
|
||
zpool_cmd = [opt[:zpool_cmd] || 'zpool'].flatten | ||
zfs_cmd = [opt[:zfs_cmd] || 'zfs'].flatten | ||
|
||
######################################################################## | ||
states = { ONLINE: 10, DEGRADED: 5, FAULTED: 0, UNAVAIL: 0 } | ||
convert_10 = ->(){ | ||
unit = { 'G' => 1E9, 'M' => 1E6, 'K' => 1E3 } | ||
->(x) { | ||
x =~ /([\d.]+)([GMK])/ ? | ||
($1.to_f * unit[$2]).to_i | ||
: x.to_i | ||
} | ||
}[] | ||
convert_GB = ->(){ | ||
exp = 0 | ||
unit = %w(B K M G T P E Z).map {|p| exp += 1; [p, 1024**(exp-1)]}.to_h | ||
->(x) { | ||
(x.sub(/([BKMGTPEZ])/, '').to_f * | ||
(unit[$1] or raise "unknown unit #{$1}") / | ||
unit['G']).round(6) | ||
} | ||
}[] | ||
######################################################################## | ||
|
||
status = IO.popen(zpool_cmd + ['status', '-v']) {|fh| fh.readline(nil)}. | ||
split(/\s+pool:\s+(\S+)\n/).drop(1).to_h | ||
raise "zpool error - #{$?.exitstatus}" unless $?.success? | ||
|
||
metrics = status.keys.map {|pool| | ||
info = status[pool].split(/^\s*(\w+):\s+/).drop(1).to_h | ||
info.each_value {|v| v.chomp!} | ||
m = {} # inner metrics | ||
m[:state] = states[info['state'].to_sym] || | ||
begin; warn "unknown state #{info['state']}"; -1; end | ||
|
||
info['config'] =~ /NAME\s+STATE\s+READ\s+WRITE/ or | ||
raise "not expecting format of config: #{info['config']}" | ||
errors = {} | ||
info['config'].split(/\n/).drop(1).each {|line| | ||
(dev, state, r, w, c) = line.split(/\s+/).drop(1) | ||
(r,w,c) = [r,w,c].map {|x| convert_10[x]} | ||
if [r,w,c].find {|x| x > 0} || state != 'ONLINE' | ||
e = errors[dev] = {state: state, read: r, write: w, cksum: c} | ||
[:read, :write, :cksum].each {|k| e.delete(k) if e[k] == 0} | ||
end | ||
} | ||
m[:faults] = errors.keys.count | ||
if errors.keys.count > 0 | ||
e = m[:errors] = {_info: errors} | ||
if pe = errors.delete(pool) | ||
pe.delete(:state) # string / redundant | ||
e.merge!(pe) | ||
end | ||
end | ||
|
||
if info['errors'] =~ /files:\n\n\s*(\S+.*)/m | ||
# m[:errors] is only set above when a vdev row shows a fault | ||
((m[:errors] ||= {})[:_info] ||= {})[:_files] = $1.chomp.split(/\s+/) | ||
end | ||
|
||
[pool, m] | ||
}.to_h | ||
|
||
|
||
IO.popen(zfs_cmd + ['list', '-H', '-o', 'name,used,avail'] + | ||
metrics.keys) {|fh| | ||
fh.readlines.each {|line| | ||
(pool, used, free) = line.split(/\s+/) | ||
metrics[pool][:used] = convert_GB[used] | ||
metrics[pool][:free] = convert_GB[free] | ||
} | ||
} | ||
raise "zfs error - #{$?.exitstatus}" unless $?.success? | ||
|
||
puts JSON::generate(metrics) |
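For reference, a standalone sketch of the suffix-to-GB conversion that `convert_GB` performs. The names `UNITS` and `to_gib` are hypothetical and not part of the collector, and the sketch assumes 1024-based suffixes and treats a bare number as bytes. Note that `**` binds tighter than `-` in Ruby, so an exponent such as `exp-1` needs parentheses.

```ruby
# Hypothetical standalone version of the size conversion; UNITS and
# to_gib are illustrative names, not part of the collector.
UNITS = %w(B K M G T P E Z).each_with_index
          .map { |suffix, i| [suffix, 1024**i] }.to_h.freeze

# Convert a human-readable size such as "39.2G" or "512M" to GiB.
# Assumption: a bare number with no suffix is a byte count.
def to_gib(text)
  match = text.match(/\A([\d.]+)([BKMGTPEZ])?\z/) or
    raise ArgumentError, "unparseable size: #{text.inspect}"
  bytes = match[1].to_f * UNITS.fetch(match[2] || 'B')
  (bytes / UNITS['G']).round(6)
end

puts to_gib('39.2G') # 39.2
puts to_gib('512M')  # 0.5
puts to_gib('0')     # 0.0
```

Building the table with `each_with_index` avoids the mutable counter, so `B` maps to 1 and each later suffix to the next power of 1024.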
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
# Copyright (C) 2012-2014 Cisco, Inc. | ||
|
||
module Panoptimon | ||
- VERSION = "0.5.3"
+ VERSION = "0.5.4"
end |