Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

udp_queues_linux.go: Expose UDP drops via gauge analogous to queue sizes #2993

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

cleeland
Copy link
Contributor

Since a common reason for monitoring UDP queue sizes can be in teasing out performance and QoS issues, it would be convenient to have drops available in the same frames of data. Since the underlying source of the data provides that information, this PR exposes drops modeled as a gauge just as queue sizes are.

@discordianfish @SuperQ

@cleeland cleeland force-pushed the expose-udp-drops branch 2 times, most recently from 084bd27 to b297b22 Compare April 19, 2024 16:45
@cleeland
Copy link
Contributor Author

@discordianfish @SuperQ I'm assuming that the failing circleci tests are because I need to update the fixtures (via make update_fixtures)? I'm having problems doing that running natively on macos. Is updating fixtures something that only reliably works on linux? If so, I'll make that happen in docker or a VM.

@cleeland
Copy link
Contributor Author

I'm not sure why test_docker is failing, and I can't pull the base image to run the test manually. It complains that I am unauthorized. From what I can tell, quay.io does not have a no-cost tier.

Copy link
Member

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, no, this seems incorrect. Drops appears to be a counter, so this needs to be a new metric, not a label on an existing metric.

@cleeland
Copy link
Contributor Author

Wait, no, this seems incorrect. Drops appears to be a counter, so this needs to be a new metric, not a label on an existing metric.

Drat; you're right. Let me see if I can fix this quickly.

@nicolas-laduguie
Copy link

Hello,
Is there any progress on this work ?
We would also need this new metric being exposed by the node exporter :)

@cleeland cleeland force-pushed the expose-udp-drops branch 2 times, most recently from ecf14b8 to 7d1e8c6 Compare January 6, 2025 15:55
cleeland and others added 4 commits January 6, 2025 09:58
Update e2e test output to include the drops.

Signed-off-by: Chris Cleeland <[email protected]>
Don't use a pointer without checking it first.

Signed-off-by: Chris Cleeland <[email protected]>
I think it makes sense for "drops" to sit alongside the queue length
gauges in prom stats because they are all neighbors in procfs (source
of these stats).

Moreover, in reading the commit log message for the original creating
work for udp_queues, I think there may have been some misreading or
confusion between the word "state" and the common short-form of
"stats" to mean "statistics".  The original author "chose the name
'udp_queue' instead of 'udpstat' as UDP has no state"; I believe that
'udpstat' might actually be the more appropriate name.

Signed-off-by: Chris Cleeland <[email protected]>
Signed-off-by: Chris Cleeland <[email protected]>
@cleeland
Copy link
Contributor Author

cleeland commented Jan 6, 2025

What I thought would be a couple of easy commits/pushes is turning into a bunch of work (maybe because I'm running on macos and not on linux?) In any case, I have to set this aside for awhile.

@nicolas-laduguie
Copy link

What I thought would be a couple of easy commits/pushes is turning into a bunch of work (maybe because I'm running on macos and not on linux?) In any case, I have to set this aside for awhile.

@cleeland can you summarize what you tried and what was the issue?
Would be helpful if someone can help you or if I open myself a pr to get that new metric in.

@cleeland
Copy link
Contributor Author

cleeland commented Jan 6, 2025

What I thought would be a couple of easy commits/pushes is turning into a bunch of work (maybe because I'm running on macos and not on linux?) In any case, I have to set this aside for awhile.

@cleeland can you summarize what you tried and what was the issue? Would be helpful if someone can help you or if I open myself a pr to get that new metric in.

The biggest impediment is that I can't run the makefiles on ventura 13.7.1. I thought I might still have a dockerfile or lima setup from when I initially did this PR, but I've not found it. I think once I get a chance to spin up lima with proper dev environment I'll be able to make progress.

@cleeland
Copy link
Contributor Author

cleeland commented Jan 7, 2025

@nicolas-laduguie I think I finally understand the issue I'm having: changing drops from a gauge to a counter requires that drops no longer be a label under udp_queues, and instead it has to be its own metric, e.g.,
node_udp_queues_drops_total
with labels of either v4 or v6.

To do this I think I'll have to change the creation in NewUDPqueuesCollector() to create a separate metric? This is an area I don't quite understand, so I'll likely get it wrong a bunch of times before I get it right. Happy to accept a little coaching.

I was able to use lima to set up two linux machines (amd64 and arm64) to use as runtime environments.

This separates the gauges related to udp_queues and the counter
related to the total number of drops, but keeps them under the same
namespace.

Signed-off-by: Chris Cleeland <[email protected]>
@cleeland
Copy link
Contributor Author

cleeland commented Jan 7, 2025

@nicolas-laduguie I got something that achieves my goal, though I'm not sure if it's necessarily the best choice. Feel free to critique and/or modify. Most importantly, things that are gauge-ish are gauges, and things that are counter-ish are counters, so @SuperQ 's original issue is addressed.

@@ -158,7 +158,7 @@ thermal | Exposes thermal statistics like `pmset -g therm`. | Darwin
thermal\_zone | Exposes thermal zone & cooling device statistics from `/sys/class/thermal`. | Linux
time | Exposes the current system time. | _any_
timex | Exposes selected adjtimex(2) system call stats. | Linux
udp_queues | Exposes UDP total lengths of the rx_queue and tx_queue from `/proc/net/udp` and `/proc/net/udp6`. | Linux
udp | Exposes UDP statistics from `/proc/net/udp` and `/proc/net/udp6`. | Linux
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess it makes sense to rename this but:

  1. We should rename the file as well
  2. Technically its a breaking change, e.g people might enable/disable the collector explicitly so when upgrading it will refused to start

@SuperQ wdyt? maybe we should give registerCollector a list of aliases and have it print a deprecation warning if any of them are used?

Or to just get this merged, maybe just not rename it for now..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I figured this wasn't an ideal solution, but I recognize that I'm too ignorant to know what sorts of API-ish guarantees are in place. I opted for "make sure that the metric names stay the same", although since I changed the namespace from "udp_queues" to "udp" it's possible I didn't even achieve that.

FWIW, I chose this after looking at some of the other metrics, and settled on something similar to the pattern used by cpu.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi guys, I contributed this PR based on your work @cleeland .
As @discordianfish said, the existing exporter name udp_queues might be already used by people disabling that collector.
So it's worth to not change it otherwise it would be a breaking change and would need to wait a major release from node exporter.
If it's fine for you, let's work in my PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolas-laduguie thanks! I left my thoughts over on the new PR. Thanks for continuing to help shepherd this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants