
[RFC 0185] Redistribute redistributable software #185

Open · Ekleog wants to merge 7 commits into master from redistribute-redistributable

Conversation

@Ekleog (Member) commented Dec 15, 2024

Co-authored-by: Matt McHenry <[email protected]>
@7c6f434c (Member):

I think a drawback of the plan as-is is that some of this software is actually pretty large, so space usage at Hydra/the cache goes up.

A missing related piece of work here is to look at the reverse dependencies of unfree packages and check whether they are a good idea to build. For normal packages we have some «it's large, and downloading takes longer than the build» packages marked as hydraPlatforms = []. For stuff like TPTP («minimally» unfree license, a build compiling a few executables, some gigabytes of passive data), the position on whether the storage-to-build-time tradeoff is worth it is currently «who cares, it's technically unfree».
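
For readers unfamiliar with the convention: hydraPlatforms is an existing nixpkgs meta attribute, and setting it to the empty list keeps a package out of Hydra's builds. A minimal sketch, assuming stdenv and lib in scope as in any nixpkgs expression (the package itself is a placeholder, not TPTP's actual expression):

```nix
# Minimal sketch of the hydraPlatforms = [ ] convention; pname/version/src are
# placeholders for illustration only.
stdenv.mkDerivation {
  pname = "some-large-unfree-package";  # hypothetical
  version = "1.0";
  src = ./.;                            # placeholder source
  meta = {
    license = lib.licenses.unfreeRedistributable;
    hydraPlatforms = [ ];               # Hydra will not build this package
  };
}
```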

@Ekleog force-pushed the redistribute-redistributable branch from 87cb264 to 27241c3 on December 19, 2024 at 11:42
@Ekleog (Member, Author) commented Dec 19, 2024

Good points! I've just added them to the RFC. Incidentally, TPTP seems to already be marked as hydraPlatforms = [], but it's still worth reviewing the changes.

@7c6f434c (Member):

I think that, given the skew of unfree software towards large binaries, significantly increased evaluation time and somewhat increased storage growth are separate points worth mentioning. (We probably need to ask the infrastructure team for feedback on all of this at some point.)

@ShamrockLee commented Dec 19, 2024

> significantly increasing the evaluation time

@7c6f434c Why would binary-based packages significantly increase the evaluation time? Nixpkgs requires packages to pass strict evaluation, which means that downloading would never occur during evaluation.

I haven't experimented with it, but I'd guess that packages requiring a long evaluation time typically fall into the categories below:

  • have a large number of requisites (direct and indirect dependencies)
  • have custom overriding applied to dependent package sets
  • read a lock file to produce a set of vendored packages
  • are made up of large auto-generated Nix expressions (usually produced by *2nix command-line tools)

None of the above is specific to unfree or binary-based packages.

Update: Some binary-based packages might be built against legacy versions of libraries, which would require custom overriding (see the sketch below) if such a version is uncommon in Nixpkgs. Still, this situation also occurs with large packages like TensorFlow, and small projects with few dependencies wouldn't take long to evaluate.
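
A minimal overlay sketch of the kind of overriding mentioned in the update above; the application name is hypothetical:

```nix
# Hypothetical overlay: pin a legacy library for a binary-based package whose
# prebuilt binaries link against an older ABI. All names are illustrative.
final: prev: {
  someBinaryApp = prev.someBinaryApp.override {
    openssl = prev.openssl_1_1;  # older library version, adds extra evaluation work
  };
}
```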

@7c6f434c (Member):

Evaluation time will increase because life is hard. Basically, even though glibc does not have non-free dependencies, its evaluations «as if it is allowed to have some non-free deps» and «as if it must be transitively free» have the same result, but are not the exact same evaluation. The same goes for every package in the closure of the large graphical ISOs (which are also evaluated in the non-free-redistributable-allowed jobset).

@Ekleog (Member, Author) commented Dec 19, 2024

I think I already listed the two points you're mentioning? The evaluation time issue has been listed in the RFC from the start, and I just added the build size issue as an unresolved question, as it's currently unclear whether it's negligible or not.

@Ekleog (Member, Author) commented Dec 19, 2024

Also, you seem to be hinting at a doubling of the eval time, but I don't think that would be the case. Hydra would evaluate a pkgset consisting essentially of unfreeAllowedNixpkgs // filterAttrs drvsThatNeedToBeFoss noUnfreeNixpkgs (a rough sketch follows below). So we'd only be adding the partial evaluation time of the closure of required-FOSS derivations, which should be far below the full eval time, considering relatively few derivations would require no-unfree.
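
A rough sketch of what that combined evaluation could look like; the predicate and all names here are illustrative, not from the RFC text:

```nix
# Rough sketch only: combine an unfree-allowed evaluation with a free-only one
# for the few jobs whose closures must stay FOSS.
{ nixpkgs, lib }:
let
  unfreeAllowedNixpkgs = import nixpkgs { config.allowUnfree = true; };
  noUnfreeNixpkgs = import nixpkgs { config.allowUnfree = false; };
  # Hypothetical predicate: jobs whose closure must remain free (e.g. the ISOs)
  drvsThatNeedToBeFoss = name: _drv: lib.elem name [ "iso_minimal" "iso_graphical" ];
in
unfreeAllowedNixpkgs // lib.filterAttrs drvsThatNeedToBeFoss noUnfreeNixpkgs
```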

@vcunat (Member) commented Dec 20, 2024

We could do Hydra's eval exactly as now but with unfree allowed, and do this check separately in a channel-blocking job (executed on a builder instead of centralized with the eval), as sketched below. We already have similar checks in the tarball job (pkgs/top-level/make-tarball.nix), even though we've reduced them recently.

CI might check this as well, but such regressions seem quite unlikely to me.
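
A hypothetical sketch of such a channel-blocking check, loosely in the style of the make-tarball.nix checks: run the free-only evaluation inside a builder, so a failure blocks the channel without loading the central evaluator. Everything here (names, arguments, the exact check) is illustrative:

```nix
# Illustrative only: a builder-side job that fails if the free-only evaluation
# of nixpkgs fails, loosely modeled on pkgs/top-level/make-tarball.nix.
{ runCommand, nix, nixpkgsSrc }:
runCommand "check-free-only-eval" { nativeBuildInputs = [ nix ]; } ''
  export NIX_STATE_DIR=$TMPDIR  # nix-env needs a writable state dir in the sandbox
  nix-env -f ${nixpkgsSrc} -qa --json \
    --arg config '{ allowUnfree = false; }' > /dev/null
  touch $out
''
```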

@7c6f434c (Member):

Oh right, evaluating all the ISOs is not negligible, but it can indeed be pushed to a build.

@djahandarie:

Thank you so much for working on this. Since MongoDB is the biggest offender, causing many people serious day-to-day trouble to build it, perhaps we could also consider a phased rollout plan where it is the first thing to be included 😄

@7c6f434c (Member):

And MongoDB indeed has a license which avoids most general concerns, in the sense that the source is available, and both arbitrary patches (as they are derivative, they are presumed same-license-as-MongoDB in Nixpkgs anyway) and running inside a network-isolated sandbox are permitted without restriction.

This is not true for all unfree-redistributable things…

@Ekleog (Member, Author) commented Dec 29, 2024

The way I understand (and mean) the current RFC text, all currently-unfree redistributable packages would stay out of Hydra until marked buildableOnHydra (illustrated below). We could then start by marking just SSPL as buildableOnHydra, but that will be a license-/package-specific discussion.

Are there any remaining concerns on the current RFC that I could address? :)
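
For concreteness, a fragment of what such an opt-in might look like in a package's meta; the attribute name follows this RFC draft, and the exact semantics are up to the final text:

```nix
# Illustrative fragment, assuming lib from nixpkgs is in scope.
{ lib }:
{
  meta = {
    license = lib.licenses.sspl;  # e.g. MongoDB's license
    buildableOnHydra = true;      # per-license/per-package opt-in proposed by this RFC
  };
}
```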

@7c6f434c (Member):

@NixOS/infra-build just so that all of you see it…

@Mic92 (Member) commented Dec 29, 2024

> @NixOS/infra-build just so that all of you see it…

No objections to this RFC

According to [this discussion](https://github.com/NixOS/nixpkgs/issues/83433), the current status quo dates back to the 20.03 release meeting.
More than four years have passed, and it is likely worth rekindling this discussion, especially now that we actually have a Steering Committee.

Recent exchanges have been happening in [this issue](https://github.com/NixOS/nixpkgs/issues/83884).
(Member) commented:

For context, we also started building all the redistributable+unfree packages in the nix-community sister project.

See all the unfree-redis* jobsets here: https://hydra.nix-community.org/project/nixpkgs
It's only ~400 packages. The builds are available at https://nix-community.cachix.org/

The jobset is defined in nixpkgs to make upstreaming easier:
https://github.com/NixOS/nixpkgs/blob/master/pkgs/top-level/release-unfree-redistributable.nix
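
As a rough idea of what such a filter does (the real logic lives in the file linked above; this sketch ignores list-valued meta.license and evaluation errors):

```nix
# Sketch: keep only packages whose license is unfree yet redistributable.
# The real filtering in release-unfree-redistributable.nix is more careful.
{ lib, pkgs }:
lib.filterAttrs
  (_name: pkg:
    !(pkg.meta.license.free or true)
    && (pkg.meta.license.redistributable or false))
  pkgs
```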

If this RFC passes, it will be even better, as users don't necessarily know about, or want to trust, a secondary cache.

@Ekleog (Member, Author) commented:

That's great to know, thank you! Though we may need to do a bit more to properly handle the "cannot be run on hydra" point that was raised above.

I can already see from the Hydra link you sent that eval takes <1 min, so it should be a negligible addition to Hydra's current eval times. Builds seem to take about half a day, and AFAIU there's a single machine running the jobs. If I read correctly, NixOS' Hydra currently has ~5 builders, and one trunk-combined build takes ~1 day. That means build times would increase by at most ~10% (half a machine-day on top of ~5 machine-days), and probably less, considering there is likely duplication between what the nix-community Hydra builds and what NixOS' Hydra is already building. I'm also not taking machine performance into account, which is probably better on NixOS' Hydra than on nix-community's.

I think this means eval/build times are things we can reasonably live with, and if we get any surprise we can always roll back.

There's just one thing I can't find in the links you sent, which I'd need to properly adjust the unresolved questions: do you know how large one build closure is on nix-community's Hydra? I don't know how to get it on NixOS' Hydra either, but it'd still help confirm there's no risk.

(Member) commented:

> I think this means eval/build times are things we can reasonably live with, and if we get any surprise we can always roll back.

Yes, especially since the unfree-redis jobset is put together by evaluating and filtering through all the nixpkgs derivations. So the combined eval time is most likely much smaller than the sum of both.

> There's just one thing I can't find in the links you sent, which I'd need to properly adjust the unresolved questions: do you know how large one build closure is on nix-community's Hydra?

The best I can think of is a script that takes all the successful store paths, pulls them from the cache, runs nix path-info -s on them, and then sums up the values.

@Ekleog (Member, Author) commented:

Thank you for your answer! I actually more or less found the answer in Hydra's UI. Here is my script:

```fish
# Fetch the channel page from Hydra's UI and extract the per-build URLs
curl https://hydra.nix-community.org/jobset/nixpkgs/cuda/channel/latest > hydra-jobs
grep '<td><a href="https://hydra.nix-community.org/build/' hydra-jobs | cut -d '"' -f 2 > job-urls
# Scrape each build page for its output size, printing progress as "done / total"
for u in $(cat job-urls); curl "$u" 2>/dev/null | grep -A 1 'Output size' | tail -n 1 | cut -d '>' -f 2 >> job-sizes; wc -l < job-sizes | head -c -1; echo -n " / "; wc -l < job-urls; end
# Sum all the scraped size values
awk '{sum += $1} END {print sum}' job-sizes
# NVidia kernel packages take ~1.3GiB each and there are 334-164 = 170
# Total: 215G, so 45G without NVidia kernel packages
```

I got the following results:

  • For unfree-redist-full, a total of 215G, including 200G for NVidia kernel packages and 15G for the rest of the software
  • For cuda, a total of 482G

Unfortunately, I cannot run the same test on NixOS' Hydra, considering that it has the channels API disabled.

I just updated the RFC with these numbers. It might make sense not to build all of CUDA on Hydra at first, considering the literally hundreds of duplicated above-1G derivations :)

(Member) commented:

So with the current Hydra workflows I'd estimate that, very roughly, as uploading 2 TB per month to S3 (we rebuild stuff). Except that we upload compressed NARs, so it would be less.

@Ekleog (Member, Author) commented:

Do I understand correctly that it'd be reasonable to do the following?

  1. Just push everything, and
  2. if compression is not good enough, roll back CUDA & the NVidia kernels; and
  3. even if we need to roll back, the added <1T would not be an issue to keep "forever"

(Member) commented:

I don't know. To me it doesn't even feel like a technical question. (Point 3 is WIP so far, I think; there's no removal from cache.nixos.org yet.)

@numinit commented Dec 30, 2024

Thank you for the work on this! I agree with the implementation. "Redistributable" in the license doesn't necessarily mean redistributable on Hydra, so having a secondary flag that indicates whether we are comfortable doing so seems prudent.

I would second a phased rollout plan where MongoDB (and maybe CUDA) come first, since they have elicited the most complaints.
