Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

overlay/05core: Set systemd LogColor=false #1079

Open
wants to merge 2 commits into
base: testing-devel
Choose a base branch
from

Conversation

cgwalters
Copy link
Member

We deploy on servers. The ANSI colors codes in the console
output often end up in e.g. log files where they both
make the logs hard to read and they make log files
invalid UTF-8.

@cgwalters
Copy link
Member Author

xref openshift/release#19827

@bgilbert
Copy link
Contributor

The other approach is to have the log parsers understand ANSI color. GitHub Actions has gone that route, for example.

@cgwalters
Copy link
Member Author

The other approach is to have the log parsers understand ANSI color. GitHub Actions has gone that route, for example.

That'd be nice, but I think there are two major differences:

  • A lot of things consume this console text, not all of them under our control. The above link where the AWS official CLI crashes with a Python exception in this case is just one example. I've seen (and I'm sure we all have) sooo many examples of console text copy/pasted from who knows what (e.g. some sort of IPMI system, a libvirt serial console, etc.) into e.g. a GH issue or Bugzilla, etc. In contrast, GH actions for example can fix their single UI and the problem is solved - there's only one output processor.
  • I simply do not care about the colors at all and just want to maximize debuggability 😄

@cgwalters
Copy link
Member Author

For reference, Amazon Linux boots with systemd.log_colors=no on the kernel command line (presumably because they didn't know this is (IMO) more elegantly controlled via the initramfs). Also, bottlerocket-os/bottlerocket#836

@dustymabe
Copy link
Member

dustymabe commented Jun 29, 2021

Is the problem slurped console logs or is it the process running journalctl. If it's the latter then SYSTEMD_LOG_COLOR could be used by the aggregator instead?

I'd prefer not to differ from other Fedora editions here if we don't really need to.

@dustymabe
Copy link
Member

SYSTEMD_LOG_COLOR

which I guess is different from SYSTEMD_COLORS and only applies to console logs?

@cgwalters
Copy link
Member Author

I'd prefer not to differ from Fedora here if we don't really need to.

This is a good point. But note: I think this is a server-versus-desktop thing, not a CoreOS versus "baseline Fedora" thing - i.e. Fedora Server or Cloud would likely want this too. I think in some perhaps distant future I could imagine somehow sharing code like this across editions.

(OK well honestly, desktop systems should have silent or GUI boot with plymouth - IMO the systemd default intentionally mimics the original "Red Hat Linux" which tried the wrong approach of making boot pretty when we should have worked on making boot fast. So there's no really good argument for systemd log coloring IMO anymore, but we can start here and then propagate it elsewhere)

@cgwalters
Copy link
Member Author

(Let's avoid implicitly saying we're not Fedora by contrasting with "Fedora". Instead, e.g. "other Fedora editions" is a good phrase)

@cgwalters
Copy link
Member Author

Is the problem slurped console logs

This. journalctl already knows how to turn off colors if it's being output to a non-tty. But there's no real mechanism for systemd to know that e.g. AWS is writing the serial console to a log file. We could in theory make this platform specific, but that gets messy and I think we get maximal benefit if we do it for metal, and if we do it there we might as well do it across the board.

@dustymabe
Copy link
Member

I'd prefer not to differ from Fedora here if we don't really need to.

This is a good point. But note: I think this is a server-versus-desktop thing, not a CoreOS versus "baseline Fedora" thing - i.e. Fedora Server or Cloud would likely want this too. I think in some perhaps distant future I could imagine somehow sharing code like this across editions.

I agree. We need a way to share code across editions within similar categories.

(OK well honestly, desktop systems should have silent or GUI boot with plymouth - IMO the systemd default intentionally mimics the original "Red Hat Linux" which tried the wrong approach of making boot pretty when we should have worked on making boot fast. So there's no really good argument for systemd log coloring IMO anymore, but we can start here and then propagate it elsewhere)

I'm cool with starting here and propagating elsewhere, but I don't often see the "propagate elsewhere" happen. Which means we become more snowflake by the day. There are two ways to combat that:

  1. require the change to be implemented elsewhere (shared space) before it lands in FCOS
  2. require a plan to exist to be implemented elsewhere before it lands in FCOS

Without something in place it just become "itch scratched" and we're off to the next thing.

(Let's avoid implicitly saying we're not Fedora by contrasting with "Fedora". Instead, e.g. "other Fedora editions" is a good phrase)

👍 - I updated the comment to use that phrasing.

@cgwalters
Copy link
Member Author

I'm cool with starting here and propagating elsewhere, but I don't often see the "propagate elsewhere" happen.

Agree. It's tricky since there's just no easy way to share code right now. But...hmmm...maybe we could try to "roll up" PRs like this periodically into something like a Fedora Change? It clearly merits release notes of some form.

@bgilbert
Copy link
Contributor

Okay, this makes sense to me. If we're going this route, it seems like we should also:

  • Apply the same setting in the initramfs
  • Disable ANSI escape sequences in the GRUB menu, which are another significant source of console garbage

@cgwalters
Copy link
Member Author

OK I thought this was working in the initramfs, but it turns out that the way I was testing via cosa virt-install ends up using QXL+Spice, which somehow suppresses console coloring (?). If I fix the VM to drop the Spice display and switch graphics to VGA, that's much more accurately emulating what would happen on physical hardware.

We deploy on servers.  The ANSI colors codes in the console
output often end up in e.g. log files where they both
make the logs hard to read *and* they make log files
invalid UTF-8.

This file needs to be created in both the real root and
the initramfs, so extend our `99journald-conf` to be
`99systemd-conf`.
@cgwalters
Copy link
Member Author

cgwalters commented Jun 29, 2021

[ OK ] now updated to do both the initramfs and real root, but indeed there is a problem which is our invocation of journalctl that will need separate fixing.

Screenshot from 2021-06-29 17-48-48

Also suppress colors here since `journalctl` defaults to coloring
if it thinks it can, and the way we're explicitly forwarding
the journal bypasses the logic for systemd's default console
coloring that we inject via default config.

(Long term, I think it would make sense probably to move
 this "failed in the initramfs" handling into systemd itself and
 not be in custom shell script we carry)
@cgwalters
Copy link
Member Author

Fixed that:
Screenshot from 2021-06-29 18-11-25

@cgwalters
Copy link
Member Author

Disable ANSI escape sequences in the GRUB menu, which are another significant source of console garbage

Hmm, right. I will look at this. That said, this is an area where I'd argue we do want some platform specifics, because e.g. aws shouldn't have a GRUB menu at all.

@cgwalters
Copy link
Member Author

OK I verified that this works in AWS:

$ kola spawn -p aws --aws-region us-east-2 --aws-ami ami-02c5c5fa78c8de932

Then aws ec2 get-console-output --instance-id i-06a42bf768a98ff29 looked good; though the grub menu had already scrolled off.

@bgilbert
Copy link
Contributor

Hmm, right. I will look at this. That said, this is an area where I'd argue we do want some platform specifics, because e.g. aws shouldn't have a GRUB menu at all.

Right, that would tie in with coreos/fedora-coreos-tracker#110.

@dustymabe
Copy link
Member

dustymabe commented Jun 30, 2021

I'm still kind of torn on this. As a human who often (almost every weekday of my life) looks at serial console and VGA console (not just the logs but the actual console) for debugging FCOS the difference between the two screenshots is quite the regression (even though we're only getting the color because we're calling journalctl the wrong way). I do see the value, just wish there was a better way.

@cgwalters
Copy link
Member Author

cgwalters commented Jun 30, 2021

Agree, though I think there's two factors here:

  • It's just different, so will take a bit of time to get used to
  • The colors do a good job of highlighting the actual error (ignition config version) but we could fix that instead by changing dracut to omit all the "how to use the emergency shell" bits until you actually press Enter. Basically the closer we get to where the last line of output is the real error the better it is (regardless of colors).

@cgwalters
Copy link
Member Author

Though, one thing I should mention here is that I have a moderate red-green color blindness, which I am sure affects my opinion on the utility of colors - particularly the use of red here. As I understand it, the red appears much darker to me than it does to people with normal color vision. But it's not uncommon, and a standard in UI/UX design is to avoid requiring or over-using colors as a mechanism to distinguish between things.

@dustymabe
Copy link
Member

I'm cool with starting here and propagating elsewhere, but I don't often see the "propagate elsewhere" happen.

Agree. It's tricky since there's just no easy way to share code right now. But...hmmm...maybe we could try to "roll up" PRs like this periodically into something like a Fedora Change? It clearly merits release notes of some form.

maybe a "non-gui" fedora-release subpackage? then we'd roll up changes every fedora release as a change and discuss amongst the other editions (cloud, server, iot, etc) that would be affected. For example we'd queue up changes like this, have a discussion and then get them in the next fedora release (of course we can prematurely apply them assuming discussion goes well). We can also choose to adopt a change any way if it provides enough benefit that we warrant carrying the change for just us.

It would be really nice to have a plan for changes like this in the future. Otherwise over time this will be harder and harder to manage.

to be clear: I am not intending to block this change, just using it for a sidebar discussion

@cgwalters
Copy link
Member Author

Should we re-discuss this at the next IRC meeting?

@dustymabe
Copy link
Member

Should we re-discuss this at the next IRC meeting?

seems reasonable. Want to open a top level tracker issue and tag it with meeting?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants