
Incompatibility with collectd Plugin:ConnTrack #56

Open
vbatoufflet opened this issue Jan 26, 2018 · 11 comments

@vbatoufflet

vbatoufflet commented Jan 26, 2018

Hi,

When exporting data from a collectd instance that has the ConnTrack plugin enabled, I get this error:

# curl -s -D /dev/stderr localhost:9103/metrics
HTTP/1.1 500 Internal Server Error
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Fri, 26 Jan 2018 15:03:48 GMT
Content-Length: 292

An error has occurred during metrics collection:

collected metric collectd_conntrack label:<name:"conntrack" value:"max" > label:<name:"instance" value:"gw1.example.net" > gauge:<value:256000 >  has label dimensions inconsistent with previously collected metrics in the same metric family

It seems that there is a mismatch between the metrics received over time, which causes the exporter to stop functioning properly.

Any idea of something obvious I missed here?

Kind regards,
Vincent

@fmoessbauer

I have exactly the same issue when collecting data from LEDE routers. Unfortunately I did not have the time to investigate this further, but at first glance it seems that the problem is located in client_golang and not in the collectd_exporter itself.

There is also a similar issue reported on telegraf: influxdata/telegraf#2822

In the meantime, the only workaround is to use the previous release. If using Docker, pull prom/collectd-exporter:0.3.1, which works.

@matthiasr

The reason is a change in the library, but this has been discussed in various places, and the conclusion of those discussions was that it is correct not to accept inconsistent label dimensions. What is happening is that, for some reason, metrics like

metric_name{label1="value1"}
metric_name{label1="value1",label2="value2"}

are being generated. This is illegal, but it was not enforced in the past. The correct way is to pad all metrics so that they have the same set of label names, using empty values where there is nothing to put in a particular label. The above should be

metric_name{label1="value1"}
metric_name{label1="value1",label2="value2"}

How best to do this depends somewhat on the source of the metrics, so the client library cannot do it for you automatically (and therefore doesn't try).
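
For illustration, here is a minimal, hypothetical sketch of that padding using client_golang's const-metric API (this is not the exporter's actual code; the metric and label names are placeholders borrowed from the error above). Every metric in the family is built against the same Desc, and any label the sample did not provide is filled with an empty string:

package main

import (
    "fmt"

    "github.com/prometheus/client_golang/prometheus"
)

// padLabels returns a value for every expected label name, using "" for
// labels the incoming sample did not provide, so that all metrics in the
// family end up with the same label dimensions.
func padLabels(expected []string, got map[string]string) []string {
    values := make([]string, len(expected))
    for i, name := range expected {
        values[i] = got[name] // "" if the label is missing
    }
    return values
}

func main() {
    labelNames := []string{"conntrack", "instance"} // hypothetical label set

    desc := prometheus.NewDesc(
        "collectd_conntrack",
        "Conntrack values reported by collectd (example only).",
        labelNames,
        nil,
    )

    // One sample carries both labels, the other only "instance"; after
    // padding, both become valid members of the same metric family.
    samples := []map[string]string{
        {"conntrack": "max", "instance": "gw1.example.net"},
        {"instance": "gw1.example.net"},
    }

    for _, s := range samples {
        m := prometheus.MustNewConstMetric(
            desc, prometheus.GaugeValue, 256000,
            padLabels(labelNames, s)...,
        )
        fmt.Println(m.Desc())
    }
}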

@matthiasr

@beorn7 is there an FAQ entry or something for this?

@beorn7
Member

beorn7 commented Jan 29, 2018

What client_golang implements here is in the spec: https://prometheus.io/docs/instrumenting/exposition_formats/#protocol-buffer-format-details

One might argue that only advanced use cases even run into this problem (i.e. if you hand-craft your metrics, as in this case, where you have to mirror them from other metrics).

I have an ingenious(?) plan to provide an elegant way to deal with this use case, which will fix both prometheus/client_golang#355 and prometheus/client_golang#47. It's just a few SAT days away.

On a different note, what @matthiasr meant by "the above should be" should actually read

metric_name{label1="value1",label2=""}
metric_name{label1="value1",label2="value2"}

@vbatoufflet
Author

Thanks for your replies.

Indeed, it seems that collectd's conntrack plugin sends the conntrack metric sometimes with its max value and sometimes with the current one.

I'll try filtering this metric for the moment to prevent failures on the instance hosting this exporter.

Is it possible to discard the faulty metrics so that the whole scrape does not fail when Prometheus requests data from the exporter?

@beorn7
Member

beorn7 commented Jan 29, 2018

You can try your luck with ContinueOnError, see https://godoc.org/github.com/prometheus/client_golang/prometheus/promhttp#HandlerErrorHandling. But I'm not sure off the top of my head whether that does what you want.
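
In case it helps, here is a minimal sketch of wiring that option up via promhttp (assuming the exporter's collectors are registered on reg; the registry variable and port are placeholders). With ContinueOnError the handler serves whatever metrics it could gather, logs the errors, and only returns a 500 if nothing could be collected at all:

package main

import (
    "log"
    "net/http"
    "os"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    reg := prometheus.NewRegistry()
    // ... register the exporter's collectors on reg here ...

    handler := promhttp.HandlerFor(reg, promhttp.HandlerOpts{
        // Log gathering errors instead of failing the scrape; a 500 is
        // only returned if no metrics at all could be collected.
        ErrorLog:      log.New(os.Stderr, "", log.LstdFlags),
        ErrorHandling: promhttp.ContinueOnError,
    })

    http.Handle("/metrics", handler)
    log.Fatal(http.ListenAndServe(":9103", nil))
}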

@vbatoufflet
Author

Following your input, I was able to make the exporter keep working despite this kind of metrics issue.

I opened a PR (see #57) for that, hoping it matches the requirements here.

@fvigotti

fvigotti commented Sep 6, 2018

This problem hit me hard today; it took me a while to understand why the health check was restarting a pod that had previously been working.
The problem is that the exporter works fine until it receives wrongly formatted data, and from then on the HTTP endpoint returns a 500 status code with errors.
I can understand the rationale for failing hard instead of just being resilient and signalling the issue to the app maintainer, at least in the generic case. But this is an application that collects data; it should be able to handle some wrong input without breaking.

A better approach, in my opinion, would be to log the wrongly formatted (and discarded) metrics.

Let me explain:
This application receives thousands of metrics from tens of servers (in my case at least), and if someone somewhere sends something wrong, the whole application returns error 500 without any metrics at all, even though thousands of them are fine. As a side effect, this triggered the pod's livenessProbe (which I have since replaced with a plain TCP socket check). That is very odd behaviour for an application which is alive but incompatible with Kubernetes' default httpGet liveness probe, which treats a 500 as a failure of the whole application.

(If your intention is not to restart the whole application when such an error occurs, an error which doesn't even depend on this application but on external input data, then this application is incompatible with an httpGet livenessProbe, which is counter-intuitive behaviour for an HTTP endpoint.)

The wrong metrics could at least be output as comments, instead of returning zero data to the scraper.

In my opinion, your approach is like a database crashing when it receives a wrong input query.

My suggestion is to output an ERROR in the application logs, which could perhaps be downgraded to a WARN with a command-line flag (something like --wrong-input-warn).

thanks,
Francesco

@fmoessbauer

fmoessbauer commented Sep 10, 2018

@fvigotti: I totally agree with your statement. Ill-formed input from a single client should definitely not result in a total crash. Otherwise this exporter is simply not suited for large-scale deployments.

Conceptually speaking, the incoming data has to be validated at the ingress API, and only valid metrics should be passed on to later processing steps. The database comparison illustrates this perfectly.
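
To make that concrete, here is a rough, purely hypothetical sketch of such an ingress check (the sample type, names, and filter are invented for the example and are not part of collectd_exporter): remember the label names first seen for each metric family and drop, with a log line, any later sample whose dimensions differ.

package main

import (
    "log"
    "reflect"
    "sort"
)

// sample is a simplified stand-in for an incoming collectd value list.
type sample struct {
    Name   string
    Labels map[string]string
}

// dimensionFilter remembers the label names first seen per metric family and
// rejects later samples whose label dimensions differ, logging them instead
// of failing the whole scrape.
type dimensionFilter struct {
    seen map[string][]string
}

func newDimensionFilter() *dimensionFilter {
    return &dimensionFilter{seen: make(map[string][]string)}
}

func (f *dimensionFilter) accept(s sample) bool {
    names := make([]string, 0, len(s.Labels))
    for n := range s.Labels {
        names = append(names, n)
    }
    sort.Strings(names)

    if prev, ok := f.seen[s.Name]; ok {
        if !reflect.DeepEqual(prev, names) {
            log.Printf("WARN: dropping %s: labels %v differ from %v", s.Name, names, prev)
            return false
        }
        return true
    }
    f.seen[s.Name] = names
    return true
}

func main() {
    f := newDimensionFilter()
    f.accept(sample{Name: "conntrack", Labels: map[string]string{"instance": "gw1"}})
    ok := f.accept(sample{Name: "conntrack", Labels: map[string]string{"instance": "gw1", "type": "max"}})
    log.Println("second sample accepted:", ok) // false: inconsistent dimensions
}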

@hasso
Contributor

hasso commented Sep 12, 2018

While I agree that there shouldn't be a way to DoS the collectd exporter like this, and that this needs to be fixed in both collectd and collectd_exporter, here is a workaround to add to your collectd.conf for those in trouble right now.

# Ensure conntrack values always carry a type instance, so the exporter
# sees consistent label dimensions for the collectd_conntrack family.
LoadPlugin match_regex
LoadPlugin target_set

<Chain "PreCache">
  <Rule "CONNTRACK-FIX-TYPE">
    <Match "regex">
      # Match conntrack values that have an empty type instance ...
      Plugin                    "^conntrack$"
      TypeInstance              "^$"
    </Match>
    <Target "set">
      # ... and set their type instance to "current".
      TypeInstance              "current"
    </Target>
  </Rule>

  <Target "write">
  </Target>
</Chain>

@beorn7
Member

beorn7 commented Sep 13, 2018

Note prometheus/client_golang#417. In short: after much back and forth, it was decided that inconsistent label dimensions are actually fine.
(The library will still only expose valid metric data; it's just that "missing labels" are now considered valid.)
