v0.5.2 - service-smf
Eric Wilhelm committed Jan 5, 2014
2 parents dc288f5 + df75e3c commit e93334f
Showing 4 changed files with 108 additions and 25 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,7 @@
# 0.5.2

* collectors/service - solaris (smf) implementation

# 0.5.1

* collectors/disk - solaris portability
71 changes: 49 additions & 22 deletions collectors/service/README.md
@@ -1,35 +1,60 @@
upstart: `initctl list`
# Description

```none
qemu-kvm start/running
rc stop/waiting
rsyslog start/running, process 1237
network-interface (lo) start/running
```
Reports on running / flapping services under various service management
daemons.

daemontools: `svstat /service/\*` (aside: `ps -eo %a | grep '[s]vscan ' | cut -d' ' -f2 | sort -u`)
# Config

```none
The `services` hash must contain entries for whichever service
management daemons you wish to monitor.

/service/chef-client: up (pid 14971) 1197 seconds
/service/hubot: up (pid 6583) 8895 seconds
/service/resmon: up (pid 633) 4131042 seconds
/service/syslog-ng: up (pid 632) 4131042 seconds
/service/mail-in: down 3 seconds, normally up
```
## Options

* `interval` — report interval, in seconds
* `flaptime` — flap threshold, in seconds
* `since` — flap horizon, in seconds
* `faults` — (experimental) report fault statistics [true|false|"only"]
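
As a rough illustration of the `faults` option, the sketch below mirrors the logic added to the collector's main loop in this commit: a service counts as faulted unless it is up and has no flaps. The helper name `summarize_faults` and the sample metrics are hypothetical and are not part of the collector.

```ruby
# Hypothetical helper mirroring the main-loop logic in this commit.
# metrics: { daemon => { service => { up: seconds, flaps: n } } }
# mode:    the value of the `faults` option (true or "only")
def summarize_faults(metrics, mode)
  # A service is healthy only when it is up and has not flapped.
  faulted = metrics.flat_map do |daemon, services|
    services.reject { |_, i| i[:up] > 0 && i[:flaps] <= 0 }
            .map { |service, _| "#{daemon}|#{service}" }
  end
  # With "only", the per-service metrics are dropped from the report.
  out = mode == 'only' ? {} : metrics.dup
  out[:'-faults'] = faulted.count
  out[:_info] ||= { faults: faulted } unless faulted.empty?
  out
end

# Sample data (made up for illustration).
sample = {
  daemontools: {
    :'syslog-ng' => { up: 4131042, flaps: 0 },
    :'mail-in'   => { up: -3,      flaps: 0 },
    hubot:          { up: 8895,    flaps: 2 },
  },
}
p summarize_faults(sample, 'only')
```

With `true` instead of `"only"`, the same summary keys would be added alongside the regular per-service metrics.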

## Available Daemons

### daemontools

Supported arguments:

* `-monitor` — an array of globs for service identifiers
* `-options` — a hash of options
* `svstat` — path to svstat command (or as array with arguments)

systemd: `systemctl list-units --full --type=service --all`, `systemctl show NAME NAME NAME ...`
All other keys are taken as the names of services to report on. By
default, these are found under `/service/$name`, but an optional `path`
entry in the argument's hash can be used to alias service names.

## Config
`"shortname": {"path" : "/service/name-too-long-for-daily-use"}`

You must provide either an explicit list of services or some globs in
`-monitor`; otherwise nothing is monitored.

### smf

Supported arguments:

* `-monitor` — an array of globs for service identifiers
* `-options` — a hash of options
* `svcs` — path to svcs command (or as array with arguments)

The globs given in `-monitor` are passed to `svcs` and must match the
service FMRI.
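
A minimal sketch of how these arguments could turn into the `svcs` command line, modeled on the handler code in this commit (which reads `svcs` from the `-options` hash and appends `-x` plus the globs). The helper name `smf_command` and the FMRI pattern are illustrative assumptions.

```ruby
# Hypothetical helper: returns the argv the smf handler would run.
def smf_command(conf)
  conf = conf.dup
  mon = conf.delete(:'-monitor') || []
  mon.is_a?(Array) or raise "'-monitor' argument must be an array"
  opts = conf.delete(:'-options') || {}
  # `svcs` may be a single path or an array with leading arguments.
  [opts[:svcs] || 'svcs'].flatten + ['-x'] + mon
end

# The FMRI glob below is only an example pattern.
p smf_command({ :'-monitor' => ['svc:/network/ssh*'] })
```

Passing the command as an array (rather than a single string) keeps the globs away from the shell, so they reach `svcs` unexpanded.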

## Example

```json
{
interval: 60,
flaptime: 30,
since: 900,
services: {
init: { foo : { status_cmd : "..."} },
systemd: {
init: { foo : { status_cmd : "..."} }, # TODO
systemd: { # TODO
sshd : {...}
},
daemontools: {
@@ -42,10 +67,12 @@ systemd: `systemctl list-units --full --type=service --all`, `systemctl show NAM
}
```

## Output
# Output

* up: seconds the service has been up (negative if it has been shut down)
* flaps: number of flaps (runs under 'flaptime') within the 'since' horizon

services|daemontools|syslog-ng|up => $seconds
services|daemontools|syslog-ng|flaps => $n
```none
services|daemontools|syslog-ng|up => $seconds
services|daemontools|syslog-ng|flaps => $n
```
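
To make the `flaps` definition concrete, here is a small sketch of counting runs shorter than `flaptime` that ended within the `since` horizon. The collector's own `AFlap` class is not shown in this diff, so the data layout (`[ended_at, duration]` pairs) and the helper name `count_flaps` are assumptions for illustration only.

```ruby
# Hypothetical: each run is [ended_at (epoch seconds), duration (seconds)].
def count_flaps(runs, now:, flaptime:, since:)
  horizon = now - since
  # A flap is a run shorter than flaptime that ended inside the horizon.
  runs.count { |ended_at, duration| duration < flaptime && ended_at >= horizon }
end

runs = [[1000, 10], [1500, 45], [100, 5]]
n = count_flaps(runs, now: 1900, flaptime: 30, since: 900)
puts "services|daemontools|syslog-ng|flaps => #{n}"
```

Only the first run counts here: the second lasted longer than `flaptime`, and the third ended before the horizon.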
56 changes: 54 additions & 2 deletions collectors/service/service
@@ -1,5 +1,5 @@
#!/usr/bin/env ruby
# Copyright (C) 2012 Sourcefire, Inc.
# Copyright (C) 2012-2014 Cisco, Inc.

require 'json'
require 'pathname'
@@ -13,10 +13,11 @@ class MatchData; def to_h
end; end
class Array; def to_h; Hash[self]; end; end

# each of these returns a lambda
handler = {
daemontools: ->(srv){
o = srv.delete(:'-options') || {}
mon = srv.delete(:'-monitor')
mon = srv.delete(:'-monitor') || []
cmd = [o[:svstat] || 'svstat'].flatten
fn = mon.map {|n| Pathname.glob(n.to_s)}.flatten.
map {|p| [p.to_s, p.basename.to_s.to_sym]}.to_h.
@@ -40,8 +41,46 @@ handler = {
}.to_h
}
},
smf: ->(srv) {
mon = srv.delete(:'-monitor') || []
mon.is_a?(Array) or raise "'-monitor' argument must be an array"
o = srv.delete(:'-options') || {}
cmd = [o[:svcs] || 'svcs'].flatten

require 'date'

return ->() {
now = Time.now.to_i
stat = IO.popen(cmd + ['-x'] + mon) {|fh| fh.readlines("\n\n") }.
map {|svc|
info = svc.match(%r{
\A(?<fmri>\S+)\s.*?\n
\s+State:\s+(?<state>\S+)\s+
since\s+(?<date>.*?)\n
}x) or raise "cannot parse #{svc}"
info = info.to_h
duration = now - DateTime.parse(info[:date]).to_time.to_i
# don't report negative duration
duration = 0 if duration < 0
# fault time needs to be non-zero to turn into negative metric
duration = 1 if duration == 0 and info[:state] != 'online'
[info[:fmri], {
state: (info[:state] == 'online' ? 'up' : info[:state]),
duration: duration,
}]
}.to_h
raise "svcs error - #{$?.exitstatus}" unless $?.success?

# TODO report disabled, but explicitly listed services?
stat.keys.each {|fmri|
stat.delete(fmri) if stat[fmri][:state] == 'disabled'
}
return stat
}
},
}

# initialize the required handlers
services = Hash[opt[:services].map {|k,v|
how = handler[k] or raise "unknown service type #{k}"; [k, how[v]]}]

@@ -69,6 +108,8 @@ class AFlap
end
end

########################################################################
# mainloop
hist = AFlap.new(opt)
while true
metrics = services.map {|k,v|
@@ -77,6 +118,17 @@ while true
}.to_h
[k, data]
}.to_h

# TODO move these sorts of things to a summarizing plugin
if opt[:faults]
faulted = metrics.map {|daemon,h| h.map {|service,i|
i[:up] > 0 && not(i[:flaps] > 0) ? [] : ["#{daemon}|#{service}"]
}}.flatten
metrics = {} if opt[:faults] == 'only'
metrics[:"-faults"] = faulted.count
metrics[:_info] ||= {faults: faulted} if faulted.count > 0
end

puts JSON::generate(metrics)
sleep(opt[:interval])
if opt[:limit]
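
The parsing step of the `smf` handler above can be tried in isolation. The sketch below applies the same regular expression and duration rules to a single `svcs -x` stanza. The helper name `parse_stanza` and the sample text are illustrative assumptions, and the exact date format printed by `svcs` may differ between Solaris releases.

```ruby
require 'date'

# Same pattern as the handler: FMRI on the first line, then State/since.
STANZA = %r{
  \A(?<fmri>\S+)\s.*?\n
  \s+State:\s+(?<state>\S+)\s+
  since\s+(?<date>.*?)\n
}x

# Hypothetical helper: parse one stanza into [fmri, {state:, duration:}].
def parse_stanza(text, now)
  m = text.match(STANZA) or raise "cannot parse #{text}"
  duration = now - DateTime.parse(m[:date]).to_time.to_i
  duration = 0 if duration < 0                           # no negative durations
  duration = 1 if duration == 0 && m[:state] != 'online' # keep faults non-zero
  [m[:fmri], { state: (m[:state] == 'online' ? 'up' : m[:state]),
               duration: duration }]
end

# Illustrative stanza; a date without a zone is parsed as UTC.
sample = <<~EOT
  svc:/application/print/server:default (LP print server)
   State: disabled since Mon Feb 13 17:56:21 2006
  Reason: Disabled by an administrator.
EOT
p parse_stanza(sample, Time.utc(2006, 2, 13, 17, 57, 21).to_i)
```

In the handler itself, stanzas come from reading the `svcs -x` output with a blank-line record separator, and entries whose state is `disabled` are dropped afterwards.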
2 changes: 1 addition & 1 deletion lib/panoptimon/version.rb
@@ -1,5 +1,5 @@
# Copyright (C) 2012-2014 Cisco, Inc.

module Panoptimon
VERSION = "0.5.1"
VERSION = "0.5.2"
end
