Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEV: generate only en_us locales #855

Merged
merged 5 commits into from
Oct 15, 2024
Merged

DEV: generate only en_us locales #855

merged 5 commits into from
Oct 15, 2024

Conversation

featheredtoast
Copy link
Member

@featheredtoast featheredtoast commented Aug 31, 2024

Wondering if we really need all locales installed in the base image here.

Installing the pre-built locales-all package via apt takes 245MB
Generating just c and en_us locales takes 22MB

@xfalcox I know it's been a while since the move to debian, but do you remember if this is necessary? One guess I have is the original call to locale-gen was being called a few layers later, with no locale.gen configuration file -- I'm assuming locale-gen en_US used to work, but it doesn't appear to actually do anything in debian given debian's default /etc/locale.gen is empty

@featheredtoast featheredtoast force-pushed the generate-locales branch 2 times, most recently from 30fec38 to 337728b Compare August 31, 2024 19:17
@featheredtoast
Copy link
Member Author

Also other observation: passing locale-gen $LANG didn't seem to be doing a whole lot. All debian documentation I can find say locale-gen depends on updating /etc/locale.gen. Added a sed to help uncomment if $LANG is set to non-en locales for postgres templates.

install locales-all installs 245MB
Generating common and en_us locales installs 22MB

update locales for postgres templates

`locale-gen $LANG` doesn't seem to actually do anything without updates to
`/etc/locale.gen`. Update the scripts to uncomment $LANG before running `update-locale`.
@featheredtoast
Copy link
Member Author

@tgxworld let's start here with this for image size reduction. See the above comment for why I think we decided to originally pull locale-all.

ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
ADD locale.gen /etc/locale.gen
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should maintain a giant list of locales since we need to keep what is uncommented in the list in-sync with the LC_ALL, LANG and LANGUAGE ENVs. I think we should just do something like echo 'en_US.UTF-8 UTF-8' >> /etc/locale.gen which is what the official postgres docker image do.

# this file, you need to rerun locale-gen.


C.UTF-8 UTF-8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we need to generate this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably remove this too

Comment on lines 163 to 164
en_US ISO-8859-1
en_US.ISO-8859-15 ISO-8859-15
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to generate these too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just reached for all en_US languages - they're tiny in comparison to some of the larger, lesser-used languages but I'm fine with removing these too.

@tgxworld
Copy link
Contributor

@featheredtoast My own research into agrees with your observation as well. I don't actually think we need to install all the generated locales in the base image. LANG seems to be only required for postgres based on what I saw in the official Postgres base image.

@tgxworld tgxworld closed this Oct 14, 2024
@tgxworld tgxworld reopened this Oct 14, 2024
@featheredtoast
Copy link
Member Author

featheredtoast commented Oct 14, 2024

Sweet, I've updated the PR to only add en_US.UTF-8 UTF8 entry and rely on the default file's commented locales to get uncommented in the postgres templates

This prevents double entries for en_US if LANG remains en_US.UTF-8 if it's
uncommented again via postgres template

multiple entries probably aren't harmful, but I'd prefer the file to be clean.
Copy link
Contributor

@tgxworld tgxworld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try this out 👍 When you merge, do try out the new image on a couple of clusters to catch any potential problems before we do a wider roll out by bumping launcher :)

@featheredtoast featheredtoast merged commit a1d8d0b into main Oct 15, 2024
5 checks passed
@featheredtoast featheredtoast deleted the generate-locales branch October 15, 2024 06:19
@featheredtoast
Copy link
Member Author

Just so the compressed savings marker doesn't get lost, final image size reduction on amd64:
Before: 749.39 MB
After: 686.62 MB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants