
Incompatibility with collectd Plugin:ConnTrack #56

Open
vbatoufflet opened this issue Jan 26, 2018 · 11 comments

@vbatoufflet

vbatoufflet commented Jan 26, 2018

Hi,

When exporting data from a collectd instance that has the ConnTrack plugin enabled, I get this error:

# curl -s -D /dev/stderr localhost:9103/metrics
HTTP/1.1 500 Internal Server Error
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Fri, 26 Jan 2018 15:03:48 GMT
Content-Length: 292

An error has occurred during metrics collection:

collected metric collectd_conntrack label:<name:"conntrack" value:"max" > label:<name:"instance" value:"gw1.example.net" > gauge:<value:256000 >  has label dimensions inconsistent with previously collected metrics in the same metric family

It seems that there is a mismatch between the metrics received over time, which causes the exporter to stop functioning properly.

Any idea of something obvious I missed here?

Kind regards,
Vincent

@fmoessbauer

I have exactly the same issue when collecting data from LEDE routers. Unfortunately I did not have the time to investigate this further, but at first glance it seems that the problem is located in client_golang and not in the collectd_exporter itself.

There is also a similar issue reported on telegraf: influxdata/telegraf#2822

In the meantime, the only workaround is to use the previous release. If using Docker, pull prom/collectd-exporter:0.3.1, which works.

@matthiasr

The reason is a change in the library, but this has been discussed in various places, and the conclusion of those discussions was that it is correct not to accept inconsistent label dimensions. What is happening is that, for some reason, metrics like

metric_name{label1="value1"}
metric_name{label1="value1",label2="value2"}

are being generated. This is illegal, but it was not enforced in the past. The correct way is to pad all metrics so that they have the same set of label names, using empty values where there is nothing to put in a particular label. The above should be

metric_name{label1="value1"}
metric_name{label1="value1",label2="value2"}

How best to do this depends somewhat on the source of the metrics, so the client library cannot do it for you automatically (and therefore doesn't try).
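
For illustration, here is a minimal, hypothetical sketch of that padding using client_golang's const-metric API (this is not the exporter's actual code; the metric and label names are placeholders borrowed from the error above). Every metric in the family is built against the same Desc, and any label the sample did not provide is filled with an empty string:

package main

import (
    "fmt"

    "github.com/prometheus/client_golang/prometheus"
)

// padLabels returns a value for every expected label name, using "" for
// labels the incoming sample did not provide, so that all metrics in the
// family end up with the same label dimensions.
func padLabels(expected []string, got map[string]string) []string {
    values := make([]string, len(expected))
    for i, name := range expected {
        values[i] = got[name] // "" if the label is missing
    }
    return values
}

func main() {
    labelNames := []string{"conntrack", "instance"} // hypothetical label set

    desc := prometheus.NewDesc(
        "collectd_conntrack",
        "Conntrack values reported by collectd (example only).",
        labelNames,
        nil,
    )

    // One sample carries both labels, the other only "instance"; after
    // padding, both become valid members of the same metric family.
    samples := []map[string]string{
        {"conntrack": "max", "instance": "gw1.example.net"},
        {"instance": "gw1.example.net"},
    }

    for _, s := range samples {
        m := prometheus.MustNewConstMetric(
            desc, prometheus.GaugeValue, 256000,
            padLabels(labelNames, s)...,
        )
        fmt.Println(m.Desc())
    }
}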

@matthiasr

@beorn7 is there an FAQ entry or something for this?

@beorn7
Member

beorn7 commented Jan 29, 2018

What client_golang implements here is in the spec: https://prometheus.io/docs/instrumenting/exposition_formats/#protocol-buffer-format-details

One might argue that only advanced use cases even run into this problem (i.e. if you hand-craft your metrics, as in this case, where you have to mirror them from other metrics).

I have an ingenious(?) plan to provide an elegant way to deal with this use case, which will fix both prometheus/client_golang#355 and prometheus/client_golang#47. It's just a few SAT days away.

On a different note, what @matthiasr meant by "the above should be" should actually read

metric_name{label1="value1",label2=""}
metric_name{label1="value1",label2="value2"}

@vbatoufflet
Author

Thanks for your replies.

Indeed, it seems that collectd's conntrack plugin sends the conntrack metric sometimes with its max value and sometimes with the current one.

I'll try filtering this metric for the moment to prevent failures on the instance hosting this exporter.

Is it possible to discard the faulty metrics so that the whole scrape does not fail when Prometheus requests data from the exporter?

@beorn7
Member

beorn7 commented Jan 29, 2018

You can try your luck with ContinueOnError, see https://godoc.org/github.com/prometheus/client_golang/prometheus/promhttp#HandlerErrorHandling. But I'm not sure off the top of my head whether that does what you want.
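
In case it helps, here is a minimal sketch of wiring that option up via promhttp (assuming the exporter's collectors are registered on reg; the registry variable and port are placeholders). With ContinueOnError the handler serves whatever metrics it could gather, logs the errors, and only returns a 500 if nothing could be collected at all:

package main

import (
    "log"
    "net/http"
    "os"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    reg := prometheus.NewRegistry()
    // ... register the exporter's collectors on reg here ...

    handler := promhttp.HandlerFor(reg, promhttp.HandlerOpts{
        // Log gathering errors instead of failing the scrape; a 500 is
        // only returned if no metrics at all could be collected.
        ErrorLog:      log.New(os.Stderr, "", log.LstdFlags),
        ErrorHandling: promhttp.ContinueOnError,
    })

    http.Handle("/metrics", handler)
    log.Fatal(http.ListenAndServe(":9103", nil))
}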

@vbatoufflet
Author

Following your input, I was able to make the exporter keep working despite this kind of metrics issue.

I opened a PR (see #57) for that, hoping it matches the requirements here.

@fvigotti

fvigotti commented Sep 6, 2018

This problem hit me hard today; it took me a while to understand why the health check was restarting a pod that had previously been working.
The problem is that the exporter works fine until it receives wrongly formatted data, and from then on the HTTP endpoint returns a 500 status code with errors.
I can understand the rationale for failing hard instead of just being resilient and signalling the issue to the app maintainer, at least in the generic case. But this is an application that collects data; it should be able to handle some wrong input without breaking.

A better approach, in my opinion, would be to log the wrongly formatted (and discarded) metrics.

Let me explain:
This application receives thousands of metrics from tens of servers (in my case at least), and if someone somewhere sends something wrong, the whole application returns error 500 without any metrics at all, even though thousands of them are fine. As a side effect, this triggered the pod's livenessProbe (which I have since replaced with a plain TCP socket check). That is very odd behaviour for an application which is alive but incompatible with Kubernetes' default httpGet liveness probe, which treats a 500 as a failure of the whole application.

(If your intention is not to restart the whole application when such an error occurs, an error which doesn't even depend on this application but on external input data, then this application is incompatible with an httpGet livenessProbe, which is counter-intuitive behaviour for an HTTP endpoint.)

The wrong metrics could at least be output as comments, instead of returning zero data to the scraper.

In my opinion, your approach is like a database crashing when it receives a wrong input query.

My suggestion is to output an ERROR in the application logs, which could perhaps be downgraded to a WARN with a command-line flag (something like --wrong-input-warn).

thanks,
Francesco

@fmoessbauer

fmoessbauer commented Sep 10, 2018

@fvigotti: I totally agree with your statement. Ill-formed input from a single client should definitely not result in a total crash. Otherwise this exporter is simply not suited for large-scale deployments.

Conceptually speaking, the incoming data has to be validated at the ingress API, and only valid metrics should be passed on to later processing steps. The database comparison illustrates this perfectly.
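
To make that concrete, here is a rough, purely hypothetical sketch of such an ingress check (the sample type, names, and filter are invented for the example and are not part of collectd_exporter): remember the label names first seen for each metric family and drop, with a log line, any later sample whose dimensions differ.

package main

import (
    "log"
    "reflect"
    "sort"
)

// sample is a simplified stand-in for an incoming collectd value list.
type sample struct {
    Name   string
    Labels map[string]string
}

// dimensionFilter remembers the label names first seen per metric family and
// rejects later samples whose label dimensions differ, logging them instead
// of failing the whole scrape.
type dimensionFilter struct {
    seen map[string][]string
}

func newDimensionFilter() *dimensionFilter {
    return &dimensionFilter{seen: make(map[string][]string)}
}

func (f *dimensionFilter) accept(s sample) bool {
    names := make([]string, 0, len(s.Labels))
    for n := range s.Labels {
        names = append(names, n)
    }
    sort.Strings(names)

    if prev, ok := f.seen[s.Name]; ok {
        if !reflect.DeepEqual(prev, names) {
            log.Printf("WARN: dropping %s: labels %v differ from %v", s.Name, names, prev)
            return false
        }
        return true
    }
    f.seen[s.Name] = names
    return true
}

func main() {
    f := newDimensionFilter()
    f.accept(sample{Name: "conntrack", Labels: map[string]string{"instance": "gw1"}})
    ok := f.accept(sample{Name: "conntrack", Labels: map[string]string{"instance": "gw1", "type": "max"}})
    log.Println("second sample accepted:", ok) // false: inconsistent dimensions
}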

@hasso
Contributor

hasso commented Sep 12, 2018

While I agree that there shouldn't be a way to DoS the collectd exporter like this, and that this needs to be fixed in both collectd and collectd_exporter, here is a workaround to add to your collectd.conf for those in trouble right now.

# Ensure conntrack values always carry a type instance, so the exporter
# sees consistent label dimensions for the collectd_conntrack family.
LoadPlugin match_regex
LoadPlugin target_set

<Chain "PreCache">
  <Rule "CONNTRACK-FIX-TYPE">
    <Match "regex">
      # Match conntrack values that have an empty type instance ...
      Plugin                    "^conntrack$"
      TypeInstance              "^$"
    </Match>
    <Target "set">
      # ... and set their type instance to "current".
      TypeInstance              "current"
    </Target>
  </Rule>

  <Target "write">
  </Target>
</Chain>

@beorn7
Member

beorn7 commented Sep 13, 2018

Note prometheus/client_golang#417. In short: after much back and forth, it was decided that inconsistent label dimensions are actually fine.
(The library will still only expose valid metric data; it's just that "missing labels" are now considered valid.)
