-
Notifications
You must be signed in to change notification settings - Fork 13.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: set up git lfs for larger files #27923
Conversation
9e1fb3e
to
2607939
Compare
The only thing I'd ask for here is updates to the docs, particularly around testing/contrinbuting to the docs. If the process to check out and run the docs with images/video is different, we just need those steps documented. Also, we should soon be re-jiggering DB logos, and adding more release notes, so if the process for adding those files differs from what we've done traditionally, we'll want to document that and perhaps add CI safeguards. |
RELEASING/**/*.jpg filter=lfs diff=lfs merge=lfs -text | ||
RELEASING/**/*.png filter=lfs diff=lfs merge=lfs -text | ||
RELEASING/**/*.gif filter=lfs diff=lfs merge=lfs -text | ||
docs/static/img/**/*.jpg filter=lfs diff=lfs merge=lfs -text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, I was just going to ask this supports wildcards. Not sure we even need the file extensions here, as people might add svg, webp, mov, mp4, who knows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I could be less specific here and include the docs/static/img/*
. For RELEASING I could probably find a better glob too, but it feels like this works well enough. Goal is to get the bulk of the binaries out.
I terms of doing it for the whole repo, it can create confusion for people who don't have git-lfs installed, and it can become hard to define which process in CI need or don't need which part of the lfs. For now I was able to switch off lfs downloads for everything EXCEPT docs building and docker building (because of the geojson, we do want that when building dockers).
This is the part that worries me... I assume most people don't? |
I recently realized that even shallow clones of this repo are pretty large, with a fair amount of that coming from video and image files. Here I setup the large video in our README to be stored in git lfs as well as image files for the docs. This change should be 100% seemless for anyone who has git-lfs installed and shouldn't affect common workflows. Also, to reap the benefits in CI, I'm setting up the checkout actions to more systematically not bring those files as they aren't required for most CI workloads, with the exception of building docs. Given the load on CI and how much we checkout the repo there, this should have a significant impact on speed and bandwidth of CI. I estimated about 1/2 the size from this handful of files Note that this change only affects the repo moving forward, meaning only shallow clones will see a benefit. There's a possibility of doing this retroactively, but it requires modifying git history and I would not recommend that.
fd72592
to
4d65833
Compare
oh wow!
GPT-generated benchmark script #!/bin/bash
# Repository URL
REPO_URL="https://github.com/your-repo.git"
# Prevent Git LFS from fetching files during clone
export GIT_LFS_SKIP_SMUDGE=1
# Function to clone and measure repo size
clone_and_measure() {
local branch=$1
local dir_name="${branch}_clone"
echo "Cloning $branch branch..."
time git clone --depth 1 --branch $branch $REPO_URL $dir_name
echo "Calculating .git size for $branch branch..."
du -sh $dir_name/.git
echo "Calculating repository size excluding .git for $branch branch..."
# Using find to list all files and directories except .git, then calculating size with du
find $dir_name -type d -name .git -prune -o -print | du -ch | tail -n 1
}
# Clone and measure master branch
clone_and_measure "master"
# Clone and measure log_queries branch
clone_and_measure "log_queries"
|
Just noticed the geojson files are now moved to git-lfs, that means you need to have/use git-lfs to run Superset with maps now. This really ought to get covered in all the relevant points in documentation before we merge it (how to pull/run things, and how to contribute/commit things as well). |
3ff6ab5
to
b60a5f6
Compare
Ok this is a bit disruptive, decided to go with this approach instead -> #27998 |
SUMMARY
[this is DRAFT but is a bit of a request for comment]
I recently realized that even shallow clones of this repo are pretty large, with a fair amount of that coming from video and image files.
Here I setup the large video in our README to be stored in git lfs as well as image files for the docs.
This change should be 100% seamless for anyone who has git-lfs installed (and GitHub Actions have it on by default, meaning a normal clone/checkout operation will fetch the lfs assets) and shouldn't affect common workflows.
What I did to setup this PR:
Also, to reap the benefits in CI, I'm setting up all of the
actions/checkout
(except for the docs-building ones) to systematically not bring those files as they aren't required for most CI workloads, with the exception of building docs.Given the load on CI and how much we checkout the repo there, this should have a significant impact on speed and bandwidth of CI. I estimated about 1/2 the size from this handful of files
Note that this change only affects the repo moving forward, meaning only shallow clones will see a benefit. Also note that by default all
action/checkout
are shallow clones by default. Also note that there is a possibility of doing this retroactively, but it requires modifying git history (possibly affecting all shas/refs) and I would not recommend that at this time.Testing