Workers occupied by larger repos #2411

Open
4 tasks
mfocko opened this issue Apr 29, 2024 · 3 comments
Labels
area/general Related to whole service, not a specific part/integration. complexity/single-task Regular task, should be done within days. gain/high This brings a lot of value to (not strictly a lot of) users. impact/high This issue impacts multiple/lot of users. kind/bug Something isn't working.

Comments

@mfocko
Member

mfocko commented Apr 29, 2024

Description

While investigating the problems with our queue on Monday, we discovered that the last command executed in both of our long-running workers was the following (possibly with a different repository):

[2024-04-29 10:26:46,485: DEBUG/ForkPoolWorker-1] task.run_copr_build_handler[d62bca3b-d550-4666-a2c2-13443fe8f130] Popen(['git', 'clone', '-v', '--tags', '--', 'https://github.com/systemd/systemd.git', '/tmp/sandcastle'], cwd=/src, stdin=None, shell=False, universal_newlines=True)

Given the size of the systemd repository in this example and its presence in both workers, we suspect that cloning a large repository in the workers “choked” the queue.

Since this has been caught as part of the run_copr_build_handler, we do not need the full history; it will be cloned for the build (and potential user-specified actions) in the Copr build environment anyway.

⚠️ WARNING ⚠️

We still need the full history for sync-release runs and upstream-koji-build, though those clones could be postponed to their respective sandboxes in Sandcastle.

TODO

  • Clone repositories with --depth=1
    • NOT applicable to sync-release and upstream-koji-build
  • Clone full repositories just in Sandcastle
    • Involves an additional clone in Sandcastle (this could probably be included in the command handler in Packit)
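
The TODO above boils down to picking clone arguments by job type. A minimal sketch in Python, assuming hypothetical job-type strings and a hypothetical helper `build_clone_command` (neither is taken from the Packit code base); only the `git clone` flags themselves come from this issue:

```python
from typing import List

# Assumption: job types that still need the full history, per the warning above.
# The string identifiers are illustrative, not Packit's actual names.
FULL_HISTORY_JOBS = frozenset({"sync-release", "upstream-koji-build"})


def build_clone_command(url: str, target: str, job_type: str) -> List[str]:
    """Return the argv for `git clone`, shallow unless the job needs history."""
    cmd = ["git", "clone", "-v", "--tags"]
    if job_type not in FULL_HISTORY_JOBS:
        # Fetch only the tip commit instead of the whole history.
        cmd.append("--depth=1")
    cmd += ["--", url, target]
    return cmd
```

With a job type outside `FULL_HISTORY_JOBS`, the helper produces the same `--depth=1` invocation that is measured in the size comparison; for `sync-release` it falls back to the current full clone.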

Sizes of repository

Current command »218 MiB«

/tmp % git clone -v --tags -- https://github.com/systemd/systemd.git
Cloning into 'systemd'...
POST git-upload-pack (175 bytes)
POST git-upload-pack (gzip 19402 to 9771 bytes)
remote: Enumerating objects: 519634, done.
remote: Counting objects: 100% (963/963), done.
remote: Compressing objects: 100% (570/570), done.
remote: Total 519634 (delta 512), reused 648 (delta 366), pack-reused 518671
Receiving objects: 100% (519634/519634), 218.39 MiB | 2.89 MiB/s, done.
Resolving deltas: 100% (407856/407856), done.

Only cloning the latest commit »16 MiB«

/tmp % git clone -v --tags --depth=1 -- https://github.com/systemd/systemd.git
Cloning into 'systemd'...
POST git-upload-pack (175 bytes)
POST git-upload-pack (229 bytes)
remote: Enumerating objects: 6383, done.
remote: Counting objects: 100% (6383/6383), done.
remote: Compressing objects: 100% (5196/5196), done.
remote: Total 6383 (delta 1405), reused 2941 (delta 893), pack-reused 0
Receiving objects: 100% (6383/6383), 16.03 MiB | 11.69 MiB/s, done.
Resolving deltas: 100% (1405/1405), done.
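
To compare the two transcripts without reading the numbers by hand, the transferred size can be pulled out of git's "Receiving objects" line. This is a rough sketch: the helper name `received_mib` is made up for illustration, and the regular expression assumes the progress format shown in the output above.

```python
import re

# Matches the size git reports on its "Receiving objects" progress line.
_RECEIVED = re.compile(r"Receiving objects:.*\),\s*([\d.]+)\s*(KiB|MiB|GiB)\s*\|")
_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def received_mib(clone_output: str) -> float:
    """Sum the sizes from every "Receiving objects" line, in MiB."""
    return sum(
        float(size) * _TO_MIB[unit]
        for size, unit in _RECEIVED.findall(clone_output)
    )


full = received_mib(
    "Receiving objects: 100% (519634/519634), 218.39 MiB | 2.89 MiB/s, done."
)
shallow = received_mib(
    "Receiving objects: 100% (6383/6383), 16.03 MiB | 11.69 MiB/s, done."
)
print(f"full clone transfers {full / shallow:.1f}x more data than --depth=1")
```

For the two systemd clones above, the full clone transfers roughly 13.6 times as much data as the shallow one.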
@mfocko added the kind/bug, complexity/single-task, impact/high, area/general, and gain/high labels Apr 29, 2024
@nforro
Member

nforro commented Apr 29, 2024

> Since this has been caught as part of the run_copr_build_handler, we do not need the full history

We are cloning the repo there only to get the config, right? Then a shallow clone makes complete sense. I just don't think passing --tags is necessary (it would pull only the tags pointing to the cloned commit anyway).

There is also another option, a treeless clone:

$ git clone -v --tags --filter=tree:0 -- https://github.com/systemd/systemd.git
Cloning into 'systemd'...
POST git-upload-pack (175 bytes)
POST git-upload-pack (gzip 19369 to 9522 bytes)
remote: Enumerating objects: 77751, done.
remote: Counting objects: 100% (183/183), done.
remote: Compressing objects: 100% (183/183), done.
remote: Total 77751 (delta 1), reused 86 (delta 0), pack-reused 77568
Receiving objects: 100% (77751/77751), 23.53 MiB | 19.67 MiB/s, done.
Resolving deltas: 100% (492/492), done.
remote: Enumerating objects: 467, done.
remote: Counting objects: 100% (185/185), done.
remote: Compressing objects: 100% (167/167), done.
remote: Total 467 (delta 7), reused 70 (delta 4), pack-reused 282
Receiving objects: 100% (467/467), 197.63 KiB | 1.69 MiB/s, done.
Resolving deltas: 100% (8/8), done.
remote: Enumerating objects: 5915, done.
remote: Counting objects: 100% (4053/4053), done.
remote: Compressing objects: 100% (3445/3445), done.
remote: Total 5915 (delta 1099), reused 708 (delta 605), pack-reused 1862
Receiving objects: 100% (5915/5915), 15.88 MiB | 11.78 MiB/s, done.
Resolving deltas: 100% (1380/1380), done.
Updating files: 100% (6130/6130), done.

That's about 39.6 MiB in total (23.53 MiB + 197.63 KiB + 15.88 MiB), and it could in theory work in place of full clones.

@lachmanfrantisek
Member

A config option with a default to clone just the last commit might be a good approach.
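
One way such an option could look, as a minimal sketch: a hypothetical `CloneSettings` dataclass whose `depth` field defaults to 1 and can be set to `None` to request the full history. The class, field, and method names are assumptions for illustration and are not part of Packit's configuration schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CloneSettings:
    # Hypothetical option: number of commits to fetch; None means full history.
    depth: Optional[int] = 1

    def __post_init__(self) -> None:
        if self.depth is not None and self.depth < 1:
            raise ValueError("depth must be a positive integer or None")

    def git_args(self) -> List[str]:
        """Extra arguments to pass to `git clone`."""
        return [] if self.depth is None else [f"--depth={self.depth}"]
```

With this shape, `CloneSettings()` yields `["--depth=1"]`, while a job that needs history would pass `CloneSettings(depth=None)` and get no extra arguments.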

@lbarcziova lbarcziova self-assigned this May 29, 2024
@lbarcziova
Member

lbarcziova commented May 30, 2024

> We are cloning the repo there only to get the config, right?

That's actually not true; we get the config via the API earlier. The cloning happens any time LocalProject is initialised, so there is room for improvement there as well (probably related to #1955; EDIT: also packit/packit#1581).
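
Since the clone is triggered whenever LocalProject is initialised, one possible direction is to defer it until the working directory is actually needed. The sketch below illustrates that idea with a hypothetical `LazyCheckout` class and an injected `clone` callable; it is not the actual LocalProject implementation, and all names in it are assumptions.

```python
from pathlib import Path
from typing import Callable


class LazyCheckout:
    """Clone a repository only on first access to its working directory."""

    def __init__(self, url: str, target: Path, clone: Callable[[str, Path], None]):
        self.url = url
        self._target = target
        self._clone = clone  # e.g. a wrapper around `git clone`
        self._cloned = False

    @property
    def working_dir(self) -> Path:
        # The expensive clone happens here, not in __init__.
        if not self._cloned:
            self._clone(self.url, self._target)
            self._cloned = True
        return self._target
```

A handler that only needs metadata available via the API would never touch `working_dir`, so no clone would run for it; the first access pays the cost once and later accesses reuse the checkout.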

lbarcziova added a commit to lbarcziova/packit-service that referenced this issue Jun 24, 2024
With this change, for copr builds also try to not clone the repo.
Related to packit#2411
lbarcziova added a commit to lbarcziova/packit-service that referenced this issue Jun 26, 2024
With this change, for copr builds also try to not clone the repo.
Related to packit#2411
softwarefactory-project-zuul bot added a commit that referenced this issue Jun 26, 2024
Utilise LocalProjectBuilder for LP initialisation

With this change, for copr builds also try to not clone the repo.
Fixes #1955
Related to #2411
For now no release notes, I would like to see whether this will work on staging as intended.
RELEASE NOTES BEGIN
N/A
RELEASE NOTES END

Reviewed-by: František Lachman
Projects
Status: in-review