Usage usermod #4342

Draft: wants to merge 5 commits into main


netmindz (Collaborator) commented Dec 3, 2024

Track anonymised usage data for WLED.

The aim is to be able to answer the following questions:

  1. How many devices are running WLED?
  2. Which ESP chips are in use?
  3. How many LEDs are people using per controller?
  4. What version of WLED are users running?
  5. How stable is that version? (Track the uptime to spot trends of lower uptime caused by a higher rate of crashes.)

This will need to be an opt-in feature. That is easier to do as part of the onboarding for fresh installs than for upgrades, so we will need to put out lots of messages asking users to enable it, so that we can better support them and prioritise feature development.

The backend server is open source, so we can provide complete transparency about what data we capture and how we use it: https://github.com/netmindz/WLED_usage

netmindz (Collaborator, Author) commented

Note to self: we should also send flash size and perhaps partition info, given the current challenges regarding bin size (especially with V4) and the issues with users installing WLED via Tasmota or with non-standard partitioning.

DedeHai marked this pull request as ready for review on December 11, 2024 09:46
willmmiles (Collaborator) commented

> Note to self: we should also send flash size and perhaps partition info, given the current challenges regarding bin size (especially with V4) and the issues with users installing WLED via Tasmota or with non-standard partitioning.

Following on with this: for telemetry builds, we might want to think about explicit crash reporting. I've got some code for ESP8266 to write crash dumps to the flash, where it could be uploaded or posted later; and I'm given to believe that ESP32s will automatically do so if there's a crash dump partition left for them.

willmmiles (Collaborator) commented

From a technical design review standpoint:

+1 for the hash over the MAC address as device ID. Unique but not reversible.
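A minimal sketch of deriving a device ID from the MAC address, for illustration only. It assumes 64-bit FNV-1a as the hash (the PR's actual choice isn't stated in this thread), and `deviceIdFromMac` is a hypothetical name. FNV is not a cryptographic hash. Also, because a MAC carries little entropy (the vendor prefix is public), any unsalted hash of it can in principle be brute-forced. "Not reversible" therefore depends on what goes into the hash as well as on the hash itself.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: derive a stable, fixed-length device ID from a 6-byte MAC.
// 64-bit FNV-1a is used purely for illustration; it is not cryptographic.
std::string deviceIdFromMac(const uint8_t mac[6]) {
  uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
  for (int i = 0; i < 6; ++i) {
    h ^= mac[i];                       // mix in one byte of the MAC
    h *= 0x100000001b3ULL;             // FNV-1a 64-bit prime
  }
  char out[17];                        // 16 hex digits + terminating NUL
  std::snprintf(out, sizeof(out), "%016llx", static_cast<unsigned long long>(h));
  return std::string(out);
}
```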

I'm not thrilled about a statically allocated packet 100s of bytes in length -- that's a lot of RAM waste on some of our more constrained platforms (8266, S2), especially since the contents themselves are already statically available elsewhere. I'd prefer to do packet assembly on the stack when needed rather than permanently waste RAM.
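Here is a sketch of assembling the report on the stack only at the moment it is sent, so that no RAM stays reserved between reports. The JSON layout, the field names and the `SendFn` transport hook are all hypothetical. The fields simply mirror the questions in the opening post. String values are not escaped, so this sketch assumes they contain no quotes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical transport hook; on the device this might wrap a UDP or TCP send.
using SendFn = bool (*)(const char* data, size_t len);

// Formats the report into a caller-supplied buffer. Returns the payload length,
// or 0 if the buffer is too small (the report is then dropped, not truncated).
size_t buildUsagePacket(char* buf, size_t cap, const char* deviceId,
                        const char* chip, uint16_t ledCount,
                        const char* version, uint32_t uptimeSec) {
  int n = std::snprintf(buf, cap,
      "{\"id\":\"%s\",\"chip\":\"%s\",\"leds\":%u,\"ver\":\"%s\",\"up\":%lu}",
      deviceId, chip, static_cast<unsigned>(ledCount), version,
      static_cast<unsigned long>(uptimeSec));
  if (n < 0 || static_cast<size_t>(n) >= cap) return 0;
  return static_cast<size_t>(n);
}

// The buffer exists only for the duration of this call, so no RAM is
// permanently reserved for the packet.
bool sendUsageReport(SendFn send, const char* deviceId, const char* chip,
                     uint16_t ledCount, const char* version, uint32_t uptimeSec) {
  char buf[160];
  size_t len = buildUsagePacket(buf, sizeof(buf), deviceId, chip, ledCount,
                                version, uptimeSec);
  return len > 0 && send(buf, len);
}
```

The 160-byte buffer size is an arbitrary choice for the sketch. Sizing it is a trade-off between stack depth at the call site and how much the payload may grow, for example if flash size or partition info is added later.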

I also think it would be better to track posting success, rather than periodically spray packets on to the internet -- I think we should aim to minimize the static runtime cost. I'd be inclined to suggest using a TCP connection instead of UDP; construct and send only once, out of the connected() callback. The whole transaction can be rigged up as a self-destructing callback chain using AsyncTCP. If it goes, it goes; if it doesn't, it doesn't; but after that there's no ongoing cost beyond a used-up pointer and an empty loop() callback. (For a few extra bytes of code memory, it could even be made a proper HTTP POST, so we could use off-the-shelf CRUD db code; our client code need not care about the reply.)
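The one-shot idea can be sketched without committing to a particular library. `AsyncLink` below is a hypothetical interface standing in for an async client such as AsyncTCP's; it does not reproduce AsyncTCP's real API. The payload builder runs only from the connected callback. Teardown is deferred to `loop()`, so the link is never destroyed from inside its own callback.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an async client; connect() reports its outcome later.
struct AsyncLink {
  virtual ~AsyncLink() = default;
  virtual void connect(std::function<void(bool connected)> onResult) = 0;
  virtual void write(const std::string& data) = 0;
  virtual void close() = 0;
};

class OneShotReporter {
 public:
  explicit OneShotReporter(std::unique_ptr<AsyncLink> link) : link_(std::move(link)) {}

  // Starts the single attempt. The payload is only built once connected.
  void start(std::function<std::string()> buildPayload) {
    if (!link_ || started_) return;  // at most one attempt
    started_ = true;
    build_ = std::move(buildPayload);
    link_->connect([this](bool connected) {
      if (connected) {
        link_->write(build_());
        sent_ = true;
      }
      link_->close();
      done_ = true;                  // success or failure, never retried
    });
  }

  // Called from loop(): release everything once the attempt has finished.
  void loop() {
    if (done_ && link_) {
      link_.reset();
      build_ = nullptr;
    }
  }

  bool sent() const { return sent_; }
  bool finished() const { return done_ && !link_; }

 private:
  std::unique_ptr<AsyncLink> link_;
  std::function<std::string()> build_;
  bool started_ = false;
  bool sent_ = false;
  bool done_ = false;
};
```

After the attempt, all that remains is an empty `unique_ptr`, an empty `std::function` and three flags. This is close to the "used-up pointer and an empty loop()" cost that willmmiles describes.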

On the service side: where does the database live? Who has the keys (to the data, to the service management), and how are they passed along to the team/made available publicly? Do we need to think about DoS attacks or flood prevention? Particularly with a periodic send approach, scalability quickly becomes a concern - do we need to think about handling 100k live devices? (Should we be so lucky?)

netmindz marked this pull request as draft on December 12, 2024 10:25

netmindz (Collaborator, Author) commented

> From a technical design review standpoint:
>
> +1 for the hash over the MAC address as device ID. Unique but not reversible.

Thank you

> I'm not thrilled about a statically allocated packet 100s of bytes in length

That is just my inexperience with C; it sounds like an easy thing to fix.

> I also think it would be better to track posting success, rather than periodically spray packets on to the internet -- I think we should aim to minimize the static runtime cost. I'd be inclined to suggest using a TCP connection instead of UDP; construct and send only once, out of the connected() callback.

To see build stability we need more than a one-time call, as we need the uptime; the exact frequency is TBD.

> On the service side: where does the database live? Who has the keys (to the data, to the service management), and how are they passed along to the team/made available publicly?

From a cost perspective, the easiest option is to point this at a VM on my own dedicated server that the whole team is given access to. Happy to discuss other options.

> Do we need to think about DoS attacks or flood prevention? Particularly with a periodic send approach, scalability quickly becomes a concern - do we need to think about handling 100k live devices? (Should we be so lucky?)

Even at 100k devices sending, say, one message an hour, that is still very little bandwidth, and we can experiment with different storage models for the data. We can lean heavily on the fact that we can accept failure: if we miss an update from a device, so what? Nobody is going to care if a specific update gets lost, and there is no expectation that this will give us 100% visibility. This is another good reason to use UDP rather than TCP. We avoid all the extra overhead of establishing connections, the threading issues around handling those connections, NIO, etc.; we just see a stream of packets.
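Here is a back-of-envelope check of the 100k-device figure. It assumes a 200-byte payload sent once an hour. Both numbers are illustrative assumptions, and IP/UDP headers add roughly 28 bytes per packet on top.

$$
\frac{10^5}{3600\ \text{s}} \approx 28\ \text{packets/s}, \qquad
\frac{10^5 \times 200\ \text{B}}{3600\ \text{s}} \approx 5.6\ \text{kB/s} \approx 44\ \text{kbit/s}, \qquad
10^5 \times 200\ \text{B} \times 24 = 4.8 \times 10^8\ \text{B/day} \approx 480\ \text{MB/day}
$$

Even if the payload assumption is off by several times, a few dozen small packets per second is a modest load for a single VM. Storage growth, at hundreds of MB per day of raw reports, is the figure more likely to drive the choice of storage model and retention.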

willmmiles (Collaborator) commented Dec 12, 2024

> > I also think it would be better to track posting success, rather than periodically spray packets on to the internet -- I think we should aim to minimize the static runtime cost. I'd be inclined to suggest using a TCP connection instead of UDP; construct and send only once, out of the connected() callback.
>
> To see build stability we need more than a one-time call, as we need the uptime; the exact frequency is TBD.

Do we care about uptime in general, or uptime of crashes? If we only care about crashes, then we only need to report once at boot with the uptime from before the last crash. We can store and read back the uptime from before the crash locally on the device in memory that is only cleared on power-cycling. ("RTC memory" is one such space, though I found on ESP8266 you could use pretty much any statically allocated variable if you ask the linker nicely to leave it alone; haven't tried ESP32 yet.)

All the other concerns are contingent on single update vs continuous update implementation.

> From a cost perspective, the easiest option is to point this at a VM on my own dedicated server that the whole team is given access to. Happy to discuss other options.

No objections to the physical arrangement, though IANAL and I can't speak for any potential legal ramifications. Mostly I wanted to pin down how the team is given access. Who has the authority to add another developer to the access list? How is that to be managed? ("Ask politely on Discord" is a reasonable answer, but I do think it should be documented somewhere.)

netmindz changed the base branch from 0_15 to main on December 16, 2024 13:24

netmindz (Collaborator, Author) commented

> > > I also think it would be better to track posting success, rather than periodically spray packets on to the internet -- I think we should aim to minimize the static runtime cost. I'd be inclined to suggest using a TCP connection instead of UDP; construct and send only once, out of the connected() callback.
> >
> > To see build stability we need more than a one-time call, as we need the uptime; the exact frequency is TBD.

> Do we care about uptime in general, or uptime of crashes?

I was thinking mostly about detecting crashes. If you have a technique to actually get the exact runtime before the last crash, that is definitely better than periodic updates and estimating how long the device ran before we see the uptime drop to a lower figure. It would remove the need to track each device's current uptime on the server side.

> All the other concerns are contingent on single update vs continuous update implementation.

If we can get build stability info without periodic updates, that greatly reduces the need for them, and certainly how often they need to be sent.

I would still like to know how many of these devices are actually in use. For example, if users haven't updated, is that because they only use WLED during the holiday season? If so, we might discount devices not seen in the last 60 days.

Originally I'd even been wondering what the minimum retention period for this data could be, i.e. store the per-device data for only 7 days and keep only the aggregated stats for longer.
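Here is a sketch of the server-side bookkeeping this implies. It is written in C++ only to match the other examples; the actual backend lives in the WLED_usage repository and may look nothing like this. `UsageTracker`, `DeviceRecord` and the 60-second slack are hypothetical. A restart is inferred when the reported uptime is well below what the previous report plus the elapsed time predicts. This rule also catches restarts where the new uptime has already grown past the old one.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Tolerance for report latency and clock jitter (hypothetical value).
constexpr int64_t kSlackSec = 60;

// Hypothetical per-device row (not the WLED_usage schema).
struct DeviceRecord {
  int64_t lastSeen = 0;     // server time of the latest report, in seconds
  uint32_t lastUptime = 0;  // uptime in seconds carried by that report
  uint32_t reboots = 0;     // restarts inferred between reports
};

class UsageTracker {
 public:
  void report(const std::string& id, int64_t now, uint32_t uptime) {
    bool isNew = devices_.find(id) == devices_.end();
    DeviceRecord& r = devices_[id];
    if (!isNew) {
      // Without a restart, uptime should have grown by roughly the elapsed time.
      int64_t expected = static_cast<int64_t>(r.lastUptime) + (now - r.lastSeen);
      if (static_cast<int64_t>(uptime) + kSlackSec < expected) ++r.reboots;
    }
    r.lastSeen = now;
    r.lastUptime = uptime;
  }

  // Devices heard from at or after `cutoff` (e.g. now - 60 days).
  size_t activeSince(int64_t cutoff) const {
    size_t n = 0;
    for (const auto& entry : devices_) {
      if (entry.second.lastSeen >= cutoff) ++n;
    }
    return n;
  }

  // Drops per-device rows not seen since `cutoff` (e.g. now - 7 days).
  void pruneBefore(int64_t cutoff) {
    for (auto it = devices_.begin(); it != devices_.end();) {
      if (it->second.lastSeen < cutoff) it = devices_.erase(it);
      else ++it;
    }
  }

  uint32_t reboots(const std::string& id) const {
    auto it = devices_.find(id);
    return it == devices_.end() ? 0 : it->second.reboots;
  }

 private:
  std::unordered_map<std::string, DeviceRecord> devices_;
};
```

Counting devices active in the last 60 days needs per-device last-seen data kept for at least 60 days, or an aggregate computed before pruning. A 7-day per-device retention and a 60-day activity window therefore pull against each other.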

> > From a cost perspective, the easiest option is to point this at a VM on my own dedicated server that the whole team is given access to. Happy to discuss other options.

> No objections to the physical arrangement, though IANAL and I can't speak for any potential legal ramifications. Mostly I wanted to pin down how the team is given access. Who has the authority to add another developer to the access list? How is that to be managed? ("Ask politely on Discord" is a reasonable answer, but I do think it should be documented somewhere.)

My preference would be for the reporting dashboard to be publicly accessible, provided this didn't present any privacy issues. Developer access to the VM was more about visibility: it lets the team confirm that the open-source server code is actually what is deployed on that server, rather than relying purely on my word.

willmmiles (Collaborator) commented

> > Do we care about uptime in general, or uptime of crashes?
>
> I was thinking mostly about detecting crashes. If you have a technique to actually get the exact runtime before the last crash, that is definitely better than periodic updates and estimating how long the device ran before we see the uptime drop to a lower figure. It would remove the need to track each device's current uptime on the server side.

For uptime-before-crash, the mechanism is to declare a variable with __attribute__((section(".noinit"))); this instructs the system to leave the old RAM contents alone on startup, so it preserves the value from the previous boot. We set it to millis() in loop(), and read it back in setup().
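Here is a minimal sketch of that mechanism. `RetainedUptime`, the marker value and both function names are hypothetical. The magic-number check is an extra assumption of this sketch. After a cold power-on, `.noinit` memory holds whatever the RAM powered up with, so a marker is needed to tell a genuine previous uptime from garbage. A random match remains possible, just unlikely. The `ARDUINO` guard lets the same logic build on a host, where the variable is simply zero-initialised.

```cpp
#include <cstdint>

// Value that must survive a (non-power-on) reset.
struct RetainedUptime {
  uint32_t magic;     // set by us; arbitrary garbage after a cold power-on
  uint32_t uptimeMs;  // last millis() value recorded in loop()
};

constexpr uint32_t kRetainedMagic = 0x57A7C0DEu;  // hypothetical marker value

#if defined(ARDUINO)
// On the device: excluded from startup zero-init via the .noinit section.
// Whether .noinit is honoured is toolchain/platform specific; verify per target.
static RetainedUptime g_retained __attribute__((section(".noinit")));
#else
// Host build for testing: ordinary static storage, zero-initialised.
static RetainedUptime g_retained;
#endif

// Call from loop(): remember how long we have been running.
void recordUptime(uint32_t nowMs) {
  g_retained.uptimeMs = nowMs;
  g_retained.magic = kRetainedMagic;
}

// Call once from setup(): fetch the previous boot's uptime, then clear it so a
// later reset cannot report the same value twice.
bool takePreviousUptime(uint32_t& outMs) {
  bool valid = (g_retained.magic == kRetainedMagic);
  if (valid) outMs = g_retained.uptimeMs;
  g_retained.magic = 0;
  g_retained.uptimeMs = 0;
  return valid;
}
```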

For more detailed crash info, on ESP32 we can call on esp_reset_reason() for a basic "was this a crash"; and we should arrange that all of our standard flash layouts include a core dump partition so we can supply a way for users to collect and upload crash details. For ESP8266 I've developed a library for writing crash dumps and cores to the flash filesystem that serves the same purpose.

> > All the other concerns are contingent on single update vs continuous update implementation.
>
> If we can get build stability info without periodic updates, that greatly reduces the need for them, and certainly how often they need to be sent.
>
> I would still like to know how many of these devices are actually in use. For example, if users haven't updated, is that because they only use WLED during the holiday season? If so, we might discount devices not seen in the last 60 days.

Yes. I think it's reasonable to support sending a "first connected to the internet" message for usage survey purposes even if the reset reason was "power on". That is, I think there should be two toggles:

  • Report crashes
  • Usage survey

...which basically cash out as conditionals on the reset reason.
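Here is a sketch of how the two toggles could reduce to a decision on the reset reason. The `ResetKind` categories, `TelemetryPrefs` and `reportForBoot` are hypothetical. On ESP32 the category would come from `esp_reset_reason()`. The mapping in the comment is an assumption about which reasons count as crashes; brownouts are deliberately kept separate here.

```cpp
// Hypothetical, platform-neutral reset categories. On ESP32 these could be
// mapped from esp_reset_reason(): ESP_RST_PANIC, ESP_RST_INT_WDT,
// ESP_RST_TASK_WDT, ESP_RST_WDT -> Crash; ESP_RST_POWERON -> PowerOn;
// ESP_RST_SW -> Software; ESP_RST_BROWNOUT -> Brownout.
enum class ResetKind { PowerOn, Software, Crash, Brownout, Other };

struct TelemetryPrefs {
  bool reportCrashes = false;  // "Report crashes" toggle (opt-in)
  bool usageSurvey = false;    // "Usage survey" toggle (opt-in)
};

enum class Report { None, Crash, Usage };

// Decides what, if anything, to send once the device first reaches the internet.
Report reportForBoot(ResetKind reason, const TelemetryPrefs& prefs) {
  if (reason == ResetKind::Crash && prefs.reportCrashes) return Report::Crash;
  if (prefs.usageSurvey) return Report::Usage;  // includes plain power-on boots
  return Report::None;
}
```

With both toggles off, nothing is ever sent, which keeps the feature opt-in as the opening post requires.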


2 participants