Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some logging fixes #229

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Some logging fixes #229

wants to merge 4 commits into from

Conversation

mboogerd
Copy link

These changes have fixed the following two issues for me:

  1. Minecraft 1.7 recreates the latest.log file on server start, which makes tail fail. Tail appears to be still reading, but does not receive any of the new lines written to the new file with the same name. tail -F fixes that.
  2. Logging format has changed since the release of 1.7. Pre-1.7 would prefix log-entries with date and time. Now it is prefixed with a time-only-stamp between square brackets. This has broken the methods server_log_dots_for_lines and server_log_get_line, as well as log_line_get_time on which the former two depend. I have added an additional regex to identify the new format and deal with it appropriately.

making tail'ing the log just hang without new output.
before extracting the time. Note that the new logging format only has a
timestamp without date component, which may cause trouble!
They now return immediately on encountering the pattern (param $3)
regardless of whether log-format is pre-1.7 or 1.7, rather than waiting
until the timeout occurs.
@zachlatta
Copy link
Contributor

Before this is merged, can you add test cases, as specified on line 307?

@mboogerd
Copy link
Author

Just noticed that I forgot to respond to zachlatta. I have not built the requested testcases, nor do they seem absolutely necessary. The versions that do not include a date-component in their timestamping have a daily logroll. This is somewhat subject to race conditions, but the chance of this leading to an actual error are very slim. I have this running in production for a small month without issues, other than early bukkit 1.7 incompatibility. It seems they recently changed their logging timestamping to match the to the one vanilla uses. I will try testing this a little more with bukkit tomorrow (just discovered this works now) and update this pull request accordingly

@mboogerd
Copy link
Author

I have been playing around with the latest bukkit for a while and most of the critical functionality seems to work fine. Starting/stopping, saving, backing up is not an issue. Starting also produces dots again and returns as soon as the server is started, rather than timing out, both for bukkit and vanilla (which is important for those using xinetd to start servers on demand). I don't expect difficulties with any of the other commands. I haven't changed msm logroll functionality, so that still has undesirable results i.c.w. the 1.7 logrolling. The one thing that is an annoyance is the 'connected' command, which I have fixed for (pre) 1.7, but which fails in bukkit 1.7. The reason for that is that bukkit doesn't prefix all loglines with a timestamp:

Bukkit 1.7
[11:39:24] [Server thread/INFO]: There are 1/20 players online:
beyonix^[[m
Vanilla 1.7
[11:14:03] [Server thread/INFO]: There are 1/20 players online:
[11:14:03] [Server thread/INFO]: beyonix

The best way imo to fix that is to add a regex to the log4j PatternLayout configuration in bukkit to replace newlines with newline+timestamp. Don't know whether the bukkit guys will accept such a suggestion, but I will look into it if I find some time. For now I believe this is an acceptable improvement over the old msm for 1.7 compatibility.

@zachlatta
Copy link
Contributor

Alright. This is definitely an improvement, but we've let tests slip recently. I'd really appreciate it if you could test this code for regressions despite it feeling not necessary.

@jives
Copy link
Contributor

jives commented Dec 27, 2013

Would it be possible to use msm's versioning to handle the regex changes needed for 1.7 (minecraft/bukkit)?
I feel that would be a better approach than extending the main msm functions with cases for specific versions.
I already wanted to start implementing and testing such an approach, but thought it may be easier to discuss it with you guys first and maybe include it in this pull request already :)

If needed, the version files could be extended with functions to set server (environment) variables, maybe including the log locations to handle the new directory/logfile structure in 1.7.

Alright. This is definitely an improvement, but we've let tests slip recently. I'd really appreciate it if you could test this code for regressions despite it feeling not necessary.

The race conditions mentioned definitely exist and may lead to wrong output depending on the msm command. If the log rolls over just before we start reading from it, it'll look like the log entries are a day too old. I think we could somewhat mitigate this issue by reading the date via mtime of the logfile we're about to open and use that to compare timestamps, but I think there will always be an unhandled case which can lead to undesired behaviour for this one single command that was called when the file rolled over. I also think the only way to solve this issue is for the minecraft/bukkit guys to put back the date into the log file timestamp.

@mboogerd
Copy link
Author

Would it be possible to use msm's versioning to handle the regex changes needed for 1.7

It's something I would have liked to do since it fits/extends the current approach to attack these versioning problems, but I felt priority should be given to working away the differences between bukkit and vanilla logging (bukkit's multiline logentries only having their first line timestamped). If it turns out we can eliminate the version - and release (bukkit/vanilla/spigot) differences by just looking at different static configurations, I feel that's certainly the way to go.

With regard to that problem:
I'm not sure whether PatternLayout's regex-replacement will suffice since you have no access to the conversion patterns in the replacement regex, only in the search regex. If anyone more experienced with regexes wants to give that a shot: bukkit's logging configuration is in the root of the jar (log4j2.xml) and the documentation for the configuration format can be found here: http://logging.apache.org/log4j/2.x/manual/layouts.html#PatternLayout.

An alternative approach may be to store the latest found timestamp, use that for lines that don't have one and update it if you find a new one? That seems to patch it up, except for the case that the first line tail reads is a bottom part of a multiline log-entry, thus possibly missing the timestamp. Getting the log format exactly the same is certainly preferable.

I think we could somewhat mitigate this issue by reading the date via mtime of the logfile we're about to open and use that to compare timestamps, but I think there will always be an unhandled case...

Exactly, I haven't been able to push myself to code something that provides no guarantees (such as mtime) and makes log-content-analysis code (deriving timestamp from logline) dependent on the actual logfile -> ugly. I agree that adding a date-component to the timestamp is the only reliable fix, but I don't think we should be expecting that. The only way we're going to fix this, is to put in the ugly work I guess. Here's my proposition for a way of dealing with the mess:

If the log-entry has no date component, assume a daily logroll, which ensures that log-entries in a file can only belong to one possible day. However, establishing what day that is, is subject to race conditions. Given that we do the date check after receiving a log-entry, date+time would either be in the past or in the future. The latter possibility we can check and eliminate by subtracting a day. The former possibility seems correct, given that logroll is executed.

I believe that should settle the date of logentries for a running server (correct me if not). For a server in a stopped state, in particular one that we want to start, there is the additional possibility of reading in lines from even earlier, since the logroll may not have taken place. We may instead verify whether the logroll has taken place by waiting until the logs modification date is of today, before we start tailing it. Since increasing the delay until tailing increases the (already existing) risk of missing new log entries when the server is logging a lot, I think that really should be an explicit strategy only for the commands that require it (just server start I guess?), rather than a general one.

Concluding
This has become a rather large post, but since the log analysis is such a big part of msm, getting the logic right for old, current and hopefully future versions is an important issue. I think my proposition might cover all cases where (new) race conditions lead to actual problems, but I'm not certain. It would be great if some people more experienced with msm and/or bash (I have very little of both) could verify its completeness/validity. Perhaps we can also identify a set of clearly separable tasks, because with my bash inexperience, doing this alone is going to take a while.

@AnthonyAMC
Copy link

I just have to ask. In which case is MSM in a worse state? Would MSM be in a worse condition if this PR was merged then it is now, or does this PR improve (albeit not perfectly) the state of MSM?

@kjkazinski
Copy link
Contributor

I'm new to using git. How do I pull and merge this into the cloned repo?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants