common: fix unclosed logrus logging pipes #4728

lzap · 2025-05-22T12:00:32Z

It looks like both CloudAPI and WeldrAPI consume memory, process can go up to several gigabytes pretty quickly just by running a simple script. This was reported few days ago: #4726

while sleep 0.001; do
    curl --unix-socket /run/weldr/api.socket -XGET http://localhost/api/
    curl --unix-socket /run/cloudapi/api.socket -XGET http://localhost/api/
done

I was able to quickly reproduce on Fedora 42, extracting core with gcore PID and running dlv core /usr/libexec/osbuild-composer/osbuild-composer core.PID revealed that there are tens of thousands of blocked goroutines all with the following stack:

(dlv) goroutine 70207
Switched from 67162 to 70207 (thread 885812)

(dlv) stack
0  0x00005607664f026e in runtime.gopark
   at /usr/lib/golang/src/runtime/proc.go:436
1  0x00005607664cf2b7 in runtime.selectgo
   at /usr/lib/golang/src/runtime/select.go:351
2  0x000056076653ec45 in io.(*pipe).read
   at /usr/lib/golang/src/io/pipe.go:57
3  0x000056076653f35a in io.(*PipeReader).Read
   at /usr/lib/golang/src/io/pipe.go:134
4  0x0000560766712c1e in bufio.(*Scanner).Scan
   at /usr/lib/golang/src/bufio/scan.go:219
5  0x00005607669576fd in github.com/sirupsen/logrus.(*Entry).writerScanner
   at /usr/src/debug/osbuild-composer-142-1.20250522git99f5fec.fc42.x86_64/vendor/github.com/sirupsen/logrus/writer.go:86
6  0x00005607669575a8 in github.com/sirupsen/logrus.(*Entry).WriterLevel.gowrap1
   at /usr/src/debug/osbuild-composer-142-1.20250522git99f5fec.fc42.x86_64/vendor/github.com/sirupsen/logrus/writer.go:57
7  0x00005607664f80a1 in runtime.goexit
   at /usr/lib/golang/src/runtime/asm_amd64.s:1700

There is a logrus logger method called Write and WriteLevel which create a new logging entry, create a PIPE and spawn a goroutine that is reading from that PIPE. The caller is expected to close the PIPE writer which was not happening in both main.go files.

But that was not the culprit of the problem, a new goroutine was created every single HTTP request. I tested both go-systemd library as well as httprouter library but they were scaling well and no problems there. Until I realized that both WeldrAPI and CloudAPI use a common logging proxy EchoLogrusLogger that had a Write method implemented as follows:

func (l *EchoLogrusLogger) Write(p []byte) (n int, err error) {
	return l.Logger.WithContext(l.Ctx).Writer().Write(p)
}

The Writer() call creates a new PIPE and a goroutine but is never closed, this method is used from both WeldrAPI (through the standard log package) as well as CloudAPI (through Echo logger) at least once per request. The obvious fix and the one we can use if we chose to backport this fix is:

func (l *EchoLogrusLogger) Write(p []byte) (n int, err error) {
    w := l.Logger.WithContext(l.Ctx).Writer()
    defer w.Close()
    return w.Write(p)
}

But this PR aims for better solution that prevents from spawning a new goroutine each Write call in the first place. Writer is created only once per logger, that is once for the WeldrAPI since it uses a singleton and once per request for CloudAPI, much better than every single event. Cheers!

thozza

Thanks for figuring this out. There is one (probably a typo or copy&paste) 🐛 in the PR...

thozza · 2025-05-22T12:20:38Z

internal/common/echo_logrus.go

+}
+
+func (l *EchoLogrusLogger) ctxLogger() *logrus.Entry {
+	return l.ctxLogger()


Should have been something like return l.logger.WithContext(l.Ctx)?

Good catch yeah I was doing a last minute improvements and I was planning on storing the logger in the instance directly with context, but logrus has such a weird API because WithContext does not return a logger but an event. I will remove this helper method, it does not make the code any more clear.

cmd/osbuild-composer/main.go

mvo5

A bit of a meta-question - AIUI this was an important issue but it looks like there is no test for the fix, is there a way to test for this?

lzap · 2025-05-22T13:33:37Z

A bit of a meta-question - AIUI this was an important issue but it looks like there is no test for the fix, is there a way to test for this?

Good point, I could add a test that just logs some data. The test would not test the fact we close the PIPE in all places tho. And I could do that if you ask me to, however, logrus is going away, this sprint I have a ticket to finish with logrus>slog refactoring. All logging code will be extracted into the common logging library which has extensive tests through and through.

https://github.com/osbuild/logging

In fact I already started the work.

lzap · 2025-05-22T13:55:08Z

For the record, I introduced the method in v111 (Spring 2024): 5ce8f65 and few weeks (v111) it was made effective by setting the default logger to this implementation: f3c0dae

bcl

Looks good to me!

thozza

I have one question, otherwise the PR LGTM.

thozza · 2025-05-26T07:00:21Z

internal/common/echo_logrus.go

-	return l.Logger.WithContext(l.Ctx).Writer().Write(p)
+	return l.writer.Write(p)


What is the retionale for this method to not use the context?

Good observation, Writer interface is only used for the global logger that is used in WeldrAPI. I am going to change all logging in composer to use slog that will properly handle context, this file will actually be deleted completely. Context was not in use in this case (was not set for the global logger).

mvo5 · 2025-05-26T09:24:21Z

A bit of a meta-question - AIUI this was an important issue but it looks like there is no test for the fix, is there a way to test for this?

Good point, I could add a test that just logs some data. The test would not test the fact we close the PIPE in all places tho. And I could do that if you ask me to, however, logrus is going away, this sprint I have a ticket to finish with logrus>slog refactoring. All logging code will be extracted into the common logging library which has extensive tests through and through.
[..]

That is fine, if this code goes away very soon obviously we don't need to do busywork here. Mentioning why no tests is provided/needed in the PR description (and ideally commit message(s)) would be good IMHO. I mean, I think our default assumption is that code changes have tests (especially if they fix visible regressions) so knowing why they are omitted is very useful information.

lzap · 2025-05-26T09:49:45Z

Expect the first logging refactoring this week, I will follow the CRC case and do it in two waves - first I will use a logrus proxy and remove this dependency completely from the project, next I will refactor all log calls to make use of the cleaner slog API.

lzap · 2025-05-26T12:27:39Z

@mvo5 could you give me green light here so I can start on the refactoring right away to avoid conflicts? thanks.

only rebased GH was down

lzap requested review from thozza and a team as code owners May 22, 2025 12:00

lzap requested review from mvo5 and supakeen and removed request for a team May 22, 2025 12:00

thozza reviewed May 22, 2025

View reviewed changes

mvo5 reviewed May 22, 2025

View reviewed changes

cmd/osbuild-composer/main.go Show resolved Hide resolved

mvo5 reviewed May 22, 2025

View reviewed changes

lzap force-pushed the logrusfix branch from b89bb5c to 9422ec2 Compare May 22, 2025 13:28

bcl approved these changes May 22, 2025

View reviewed changes

thozza reviewed May 26, 2025

View reviewed changes

thozza approved these changes May 26, 2025

View reviewed changes

lzap requested a review from mvo5 May 26, 2025 12:26

lzap removed the request for review from supakeen May 26, 2025 12:28

common: fix unclosed logrus logging pipes

b81df05

lzap force-pushed the logrusfix branch from 9422ec2 to b81df05 Compare May 26, 2025 12:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

common: fix unclosed logrus logging pipes #4728

common: fix unclosed logrus logging pipes #4728

Uh oh!

lzap commented May 22, 2025

Uh oh!

thozza left a comment

Uh oh!

thozza May 22, 2025

Uh oh!

lzap May 22, 2025

Uh oh!

Uh oh!

mvo5 left a comment

Uh oh!

lzap commented May 22, 2025

Uh oh!

lzap commented May 22, 2025

Uh oh!

bcl left a comment

Uh oh!

thozza left a comment

Uh oh!

thozza May 26, 2025

Uh oh!

lzap May 26, 2025

Uh oh!

mvo5 commented May 26, 2025

Uh oh!

lzap commented May 26, 2025

Uh oh!

lzap commented May 26, 2025 •

edited

Loading

Uh oh!

Uh oh!

		return l.Logger.WithContext(l.Ctx).Writer().Write(p)
		return l.writer.Write(p)

common: fix unclosed logrus logging pipes #4728

Are you sure you want to change the base?

common: fix unclosed logrus logging pipes #4728

Uh oh!

Conversation

lzap commented May 22, 2025

Uh oh!

thozza left a comment

Choose a reason for hiding this comment

Uh oh!

thozza May 22, 2025

Choose a reason for hiding this comment

Uh oh!

lzap May 22, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mvo5 left a comment

Choose a reason for hiding this comment

Uh oh!

lzap commented May 22, 2025

Uh oh!

lzap commented May 22, 2025

Uh oh!

bcl left a comment

Choose a reason for hiding this comment

Uh oh!

thozza left a comment

Choose a reason for hiding this comment

Uh oh!

thozza May 26, 2025

Choose a reason for hiding this comment

Uh oh!

lzap May 26, 2025

Choose a reason for hiding this comment

Uh oh!

mvo5 commented May 26, 2025

Uh oh!

lzap commented May 26, 2025

Uh oh!

lzap commented May 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

lzap commented May 26, 2025 •

edited

Loading