Replies: 3 comments
Hi @jdlegan, thanks for sharing your feedback, much appreciated. Mitigation (2) does exist today: there's an automatic ingestion cut-off when low disk space is detected. However, the threshold is quite small, and it sounds like it doesn't kick in early enough to leave a workable amount of free space with the kinds of data volumes you're dealing with. The setting is in Settings > System > Minimum free storage space, and the (old?) default is a hundred MB or so, I think. This will definitely be too little for loaded systems; on TB+ sized disks I'd probably aim for 100 GB or so here. Let me know if this helps.
@nblumhardt is there any movement on this issue? In my evaluation I've run into this exact same issue, and it's really one of the only things that's kept me from working to get Seq into our production environment. As @jdlegan suggested, some kind of disk-based retention policy seems (at least from the outside) like quite a simple solution, and if it's an optional policy, users can choose whether it's right for them.
@adam8797 We currently have two features to address this issue:
A retention policy based on free storage space could create stability problems. When IO is busy, many systems will fail to calculate free space.
I have used Seq in a production setting for going on 9 years now, and there has always been one problem that crops up now and again. It mostly happens in non-production environments like UAT or sandbox where, due to the intentional verbosity of the logs outside of production, the data disks can fill rapidly. "Rapidly" in this case is measured in days or weeks, since the disks in question are, in some cases, 4 TB in size. A few issues and a game of cat and mouse have caused us headaches over the years, and they occurred again this morning, so I want to throw this out to the team to see whether we are missing something or could be doing something better.
Issue: In Seq environments where log verbosity and message volume are high, the data disk can become full.
Current Solution: Create a retention policy for a (seemingly) reasonable period of useful time (e.g. 14 days). Closely monitor the disk free space against the data retention period and, once it is confirmed that the desired 14 days will safely fit on the disk, allow the retention policy to continue to trim it. That holds until the volume of messages increases or the verbosity of certain message categories needs to change due to an investigation, at which point the number of days that fit on the disk shrinks. If someone catches it (e.g. via an alert from Azure when disk free space reaches a certain threshold), the process of finding a new safe retention period begins; if no one does, the disk becomes full and new events are not logged. To further complicate matters, when this happens, background deletion does not seem to work, and neither does manual retention, until you do one of several things:
You can try to stop and start the instance; sometimes this will allow you to kick off a manual delete, though it often does not work.
You can disconnect the disk from the VM, increase its size (e.g. 3.5 TB -> 3.75 TB), reattach it, and start the instance again. This will allow manual deletion to start and allow new events to come in.
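The manual capacity check described above (does the desired retention period still fit on the disk?) can be sketched as a small calculation. This is only an illustration: the function name and the figures are hypothetical, and `daily_growth_bytes` is assumed to be the measured on-disk growth per day, which may differ from the raw size of the ingested events.

```python
TB = 1000**4
GB = 1000**3


def max_retention_days(disk_bytes: int, reserve_bytes: int, daily_growth_bytes: int) -> int:
    """Return the whole number of days of events that fit on the disk
    while keeping reserve_bytes free.

    Hypothetical helper, not part of Seq. daily_growth_bytes is assumed
    to be the observed on-disk growth per day.
    """
    if daily_growth_bytes <= 0:
        raise ValueError("daily_growth_bytes must be positive")
    usable = max(disk_bytes - reserve_bytes, 0)
    return usable // daily_growth_bytes


if __name__ == "__main__":
    # Hypothetical figures: a 4 TB disk with 100 GB kept free.
    for daily in (250 * GB, 400 * GB):
        days = max_retention_days(4 * TB, 100 * GB, daily)
        print(f"{daily // GB} GB/day -> at most {days} days of retention")
```

With these made-up numbers, a 14-day policy fits at 250 GB/day but no longer fits once growth rises to 400 GB/day, which is the situation that currently has to be caught by hand.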
What we struggle with, and I suspect others do as well, unless there is functionality we are not using correctly, is that there is no:
Retention policy based on percentage or size of free space (this seems like low hanging fruit, though I can see why it would not work in all situations).
An automatic cut-off of ingestion before the disk physically runs out of space, not relying on the OS to fail to write but rather being aware of the available space and acting responsibly, so that disk resizing is not necessary to successfully clean up events and free up space.
A way to receive an alert when ingestion ceases or when disk free space is reaching a certain threshold, so that action can be taken before log ingestion ceases and logs are lost once the buffer is full.
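As a stopgap for the missing alert, a free-space check can run outside Seq, for example from a scheduled task on the VM. The sketch below is not a Seq feature: it uses only Python's standard `shutil.disk_usage`, and the data path, the thresholds, and what happens on a warning are all assumptions to adapt.

```python
import shutil

GB = 1000**3


def classify_free_space(free_bytes: int, warn_bytes: int, critical_bytes: int) -> str:
    """Map a free-space reading to 'ok', 'warning' or 'critical'."""
    if free_bytes < critical_bytes:
        return "critical"
    if free_bytes < warn_bytes:
        return "warning"
    return "ok"


def free_space_status(path: str, warn_bytes: int, critical_bytes: int) -> str:
    """Read free space for the volume containing path and classify it."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify_free_space(usage.free, warn_bytes, critical_bytes)


if __name__ == "__main__":
    # Hypothetical values: replace "." with the Seq data volume, and pick
    # thresholds that leave time to react before ingestion stops.
    status = free_space_status(".", warn_bytes=200 * GB, critical_bytes=100 * GB)
    print(f"disk free space status: {status}")
```

The warning threshold is deliberately set above the critical one so that there is time to shorten the retention period before ingestion is cut off.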
Is there some guidance that can be provided here, and if the limitations are accurate, are there any plans to address them in a future update?