common: fix unclosed logrus logging pipes #4728
base: main
Conversation
Thanks for figuring this out. There is one (probably a typo or copy&paste) 🐛 in the PR...
internal/common/echo_logrus.go

func (l *EchoLogrusLogger) ctxLogger() *logrus.Entry {
	return l.ctxLogger()
}
Should have been something like return l.logger.WithContext(l.Ctx)?
Good catch, yeah, I was making some last-minute improvements. I was planning on storing the logger in the instance directly with the context attached, but logrus has such a weird API: WithContext does not return a logger but an entry. I will remove this helper method; it does not make the code any clearer.
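For context, a minimal sketch of the logrus behaviour being described here (plain logrus, no project code assumed):

```go
package main

import (
	"context"

	"github.com/sirupsen/logrus"
)

func main() {
	logger := logrus.New()

	// WithContext returns a *logrus.Entry, not a *logrus.Logger, so the
	// result cannot simply be stored back into a *logrus.Logger field.
	entry := logger.WithContext(context.Background())
	entry.Info("logged with a context attached")
}
```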
A bit of a meta-question: AIUI this was an important issue, but it looks like there is no test for the fix. Is there a way to test for this?
Good point, I could add a test that just logs some data, though it would not verify that we close the PIPE in all places. I could do that if you ask me to; however, logrus is going away. This sprint I have a ticket to finish the logrus → slog refactoring, and all logging code will be extracted into the common https://github.com/osbuild/logging repository. In fact, I have already started the work.
Looks good to me!
I have one question, otherwise the PR LGTM.
-	return l.Logger.WithContext(l.Ctx).Writer().Write(p)
+	return l.writer.Write(p)
What is the rationale for this method not using the context?
Good observation. The Writer interface is only used for the global logger that WeldrAPI uses. I am going to change all logging in composer to use slog, which will properly handle context, and this file will actually be deleted completely. Context was not in use in this case (it was not set for the global logger).
That is fine; if this code goes away very soon, obviously we don't need to do busywork here. Mentioning why no test is provided/needed in the PR description (and ideally in the commit message(s)) would be good IMHO. I mean, I think our default assumption is that code changes have tests (especially if they fix visible regressions), so knowing why they are omitted is very useful information.
Expect the first logging refactoring this week. I will follow the CRC case and do it in two waves: first I will use a logrus proxy and remove this dependency completely from the project, then I will refactor all log calls to make use of the cleaner slog API.
@mvo5 could you give me a green light here so I can start on the refactoring right away and avoid conflicts? Thanks.

Only rebased; GH was down.
It looks like both CloudAPI and WeldrAPI leak memory: the process can grow to several gigabytes pretty quickly just by running a simple script. This was reported a few days ago: #4726

I was able to quickly reproduce this on Fedora 42. Extracting a core dump with gcore PID and running dlv core /usr/libexec/osbuild-composer/osbuild-composer core.PID revealed tens of thousands of blocked goroutines, all with the same stack.
There are logrus logger methods, Writer and WriterLevel, which create a new logging entry, create a PIPE, and spawn a goroutine that reads from that PIPE. The caller is expected to close the PIPE writer, which was not happening in either main.go file.
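For illustration, a minimal sketch of the usage pattern logrus expects from callers of Writer/WriterLevel (not the actual main.go code); the missing Close is what leaks the reader goroutine:

```go
package main

import (
	"log"

	"github.com/sirupsen/logrus"
)

func main() {
	logger := logrus.New()

	// WriterLevel creates an io.Pipe and spawns a goroutine that reads
	// from it and emits log entries; the returned *io.PipeWriter must be
	// closed by the caller, otherwise that goroutine never exits.
	w := logger.WriterLevel(logrus.InfoLevel)
	defer w.Close()

	// Typical use: forward the standard library logger through logrus.
	log.SetOutput(w)
	log.Println("forwarded through logrus")
}
```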
But that was not the culprit of the problem: a new goroutine was being created on every single HTTP request. I tested both the go-systemd and httprouter libraries, but they scaled well and showed no problems there. Then I realized that both WeldrAPI and CloudAPI use a common logging proxy, EchoLogrusLogger, which had a Write method implemented as follows.
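Reconstructed from the removed line in the diff quoted earlier, the old implementation was roughly this (a sketch rather than the verbatim code):

```go
// Sketch of the old implementation, based on the removed diff line above.
func (l *EchoLogrusLogger) Write(p []byte) (n int, err error) {
	return l.Logger.WithContext(l.Ctx).Writer().Write(p)
}
```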
The Writer() call creates a new PIPE and a goroutine, but the writer is never closed, and this method is used by both WeldrAPI (through the standard log package) and CloudAPI (through the Echo logger) at least once per request. The obvious fix, and the one we could use if we chose to backport this fix, would be:
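A sketch of that backportable variant, assuming it simply closes the pipe writer after each call (receiver and field names taken from the diff above):

```go
// Minimal fix (sketch): still one pipe and one reader goroutine per call,
// but closing the writer lets that goroutine exit instead of leaking.
func (l *EchoLogrusLogger) Write(p []byte) (n int, err error) {
	w := l.Logger.WithContext(l.Ctx).Writer()
	defer w.Close()
	return w.Write(p)
}
```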
But this PR aims for a better solution that avoids spawning a new goroutine on each Write call in the first place: the writer is created only once per logger, that is, once for WeldrAPI (since it uses a singleton) and once per request for CloudAPI, which is much better than once per log event. A minimal sketch of that approach follows. Cheers!
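Apart from the writer field and the removed line shown in the diff above, the names in this sketch are assumptions rather than the actual composer code:

```go
package common

import (
	"context"
	"io"

	"github.com/sirupsen/logrus"
)

// EchoLogrusLogger is a sketch of the logging proxy described above; only
// the writer field is taken from the diff, the rest is assumed.
type EchoLogrusLogger struct {
	Logger *logrus.Logger
	Ctx    context.Context
	writer io.Writer
}

// NewEchoLogrusLogger creates the pipe writer once, so there is a single
// reader goroutine per logger instead of one per Write call.
func NewEchoLogrusLogger(logger *logrus.Logger, ctx context.Context) *EchoLogrusLogger {
	return &EchoLogrusLogger{
		Logger: logger,
		Ctx:    ctx,
		writer: logger.WithContext(ctx).Writer(),
	}
}

// Write reuses the single pipe writer created in the constructor.
func (l *EchoLogrusLogger) Write(p []byte) (n int, err error) {
	return l.writer.Write(p)
}
```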