nix-store path calculation #217

Open
ghostbuster91 opened this issue Oct 21, 2023 · 9 comments
@ghostbuster91

ghostbuster91 commented Oct 21, 2023

Hi,

First, a little bit of context. I am trying to programmatically generate Nix derivations using Scala. It turned out that I need to calculate the nix-store path in order to put the drv file into the nix-store (which is a requirement for realizing it).

Because of that I started to implement a minimalistic version of hnix-store in Scala, so that I can calculate the nix-store output path. I tried reading The Purely Functional Software Deployment Model and the code in this repository. I don't know much Haskell, but it was enough for me to get started. Then I also found https://web.archive.org/web/20221001050043/https://comono.id/posts/2020-03-20-how-nix-instantiation-works/ which was an invaluable help.

I am at the point where I can calculate the nix-store path correctly for some real derivations like "/nix/store/dsn6vl7x1hbn1akgpxync19gpx2dzy8w-bootstrap-tools" or the more complex /nix/store/32lr8w57frc1ij5wzc3hb9ks8vzs2ms1-libffi-3.4.4.drv.

However, for some reason I cannot calculate the correct nix-store path for "/nix/store/h8z4rypl78kwais0yim76czxjnd55dsm-python3-minimal-3.10.12"

There must be something different about this package/its inputDrvs but I fail to spot anything.

I wonder if you know any better resources about the algorithm used to calculate nix-store paths.
I will list steps that I do, in the hope that maybe someone will be able to spot a mistake:
(fixed-output derivations are left out for brevity)

  1. replace each inputDrv with its descriptor hash and sort that list lexicographically
  2. mask outputs (all outputs are set to "", environment variables that refer to outputs are also set to "")
  3. calculate hash of such modified serialized derivation
  4. concatenate the hash with metadata and hash it again: sha256("output:out:sha256:${sha256(d)}:/nix/store:${d.env("name")}")
  5. truncate to 160 bits and convert to base32 nix-variant
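
Steps 4 and 5 can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the function names are made up for illustration, and it assumes the inner hash is passed in as a hex string. One detail worth checking against your own code: to the best of my knowledge, Nix does not cut the digest off at 160 bits but folds it down to 20 bytes with XOR, and its base32 encoder walks the bytes from the end.

```python
import hashlib

# Nix's base32 alphabet omits the letters e, o, t and u.
NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def compress_hash(digest: bytes, size: int = 20) -> bytes:
    """Fold a digest down to `size` bytes by XOR-ing byte i into slot i % size."""
    out = bytearray(size)
    for i, b in enumerate(digest):
        out[i % size] ^= b
    return bytes(out)


def nix_base32(data: bytes) -> str:
    """Encode bytes in the Nix base32 variant (5-bit groups, last group first)."""
    length = (len(data) * 8 - 1) // 5 + 1
    chars = []
    for n in range(length - 1, -1, -1):
        i, j = divmod(n * 5, 8)  # byte index and bit offset of this group
        c = data[i] >> j
        if i + 1 < len(data):
            c |= data[i + 1] << (8 - j)
        chars.append(NIX_BASE32[c & 0x1F])
    return "".join(chars)


def output_path(masked_drv_hash_hex: str, name: str,
                output: str = "out", store_dir: str = "/nix/store") -> str:
    """Steps 4 and 5: build the fingerprint, hash it, fold it, encode it."""
    # Outputs other than "out" get the output name appended, e.g. libffi-3.4.4-dev.
    out_name = name if output == "out" else f"{name}-{output}"
    fingerprint = f"output:{output}:sha256:{masked_drv_hash_hex}:{store_dir}:{out_name}"
    digest = hashlib.sha256(fingerprint.encode()).digest()
    return f"{store_dir}/{nix_base32(compress_hash(digest))}-{out_name}"
```

Here `masked_drv_hash_hex` stands for the result of step 3, i.e. the hex-encoded sha256 of the masked, serialized derivation.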

descriptor hashes are calculated as follows:

  1. if this is a fixed hash derivation then calculate sha256("fixed:out:${d.hashAlgo}:${d.hash}:${d.path.get}")
  2. otherwise replace each inputDrv with its descriptor hash and sort that list lexicographically
  3. calculate hash of such modified serialized derivation - sha256(derivation)
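
The recursion described in these three steps can be sketched like this. It is only an illustration under loud assumptions: `Drv` is a toy model with far fewer fields than a real .drv, `toy_serialize` is a stand-in for the real ATerm serializer, and the store paths in the usage note are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Drv:
    """Toy derivation model (hypothetical; a real .drv has more fields)."""
    name: str
    # drv path -> names of the outputs that are used from that derivation
    input_drvs: Dict[str, Tuple[str, ...]] = field(default_factory=dict)
    env: Dict[str, str] = field(default_factory=dict)
    # (hashAlgo, hash, output path); set only for fixed-output derivations
    fixed: Optional[Tuple[str, str, str]] = None


def toy_serialize(drv: Drv, inputs: Dict[str, Tuple[str, ...]]) -> str:
    """Stand-in for the ATerm serializer, only here to keep the sketch runnable."""
    return repr((drv.name, sorted(inputs.items()), sorted(drv.env.items())))


def descriptor_hash(path: str, store: Dict[str, Drv]) -> str:
    """Return the hex descriptor hash of the derivation stored under `path`."""
    drv = store[path]
    if drv.fixed is not None:
        # Step 1: fixed-output derivations hash a short descriptor string.
        algo, digest, out_path = drv.fixed
        text = f"fixed:out:{algo}:{digest}:{out_path}"
    else:
        # Step 2: replace every input path by its descriptor hash, keeping the
        # requested output names; the serializer sorts the resulting entries.
        replaced = {
            descriptor_hash(p, store): tuple(sorted(outs))
            for p, outs in drv.input_drvs.items()
        }
        # Step 3: hash the modified, serialized derivation.
        text = toy_serialize(drv, replaced)
    return hashlib.sha256(text.encode()).hexdigest()
```

With a store such as `{"/nix/store/src.drv": Drv("src", fixed=("sha256", "abc", "/nix/store/x-src")), "/nix/store/a.drv": Drv("a", input_drvs={"/nix/store/src.drv": ("out",)})}`, calling `descriptor_hash("/nix/store/a.drv", store)` first hashes the fixed-output input and then the derivation that depends on it.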

If anything I think that I might be handling multiple outputs incorrectly.
If there is a derivation A that defines several outputs:

   "outputs": {
      "dev": {
        "path": "/nix/store/8qg5ralh4c1m2pas6lbi572qykxxsxdn-libffi-3.4.4-dev"
      },
      "info": {
        "path": "/nix/store/v5j3cysbnah4m265wlm57gjmln18qq7a-libffi-3.4.4-info"
      },
      "man": {
        "path": "/nix/store/x92j3f8v85h216avky5rdi5xizx12j6h-libffi-3.4.4-man"
      },
      "out": {
        "path": "/nix/store/ksz7in14b8si5f107w3ay3ph79f67i68-libffi-3.4.4"
      }
    },

and then we depend on such derivation in B:

inputDrvs=[
    "/nix/store/32lr8w57frc1ij5wzc3hb9ks8vzs2ms1-libffi-3.4.4.drv" -> ("dev", "out"),
...
]

I don't change my logic for calculating B's descriptor in terms of A: I do it the same way as if there were only a single output, out, defined in A and used in B.
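
For comparison, here is a small sketch of how an inputDrvs list with several outputs could be serialized in the ATerm format, as far as I understand it: each entry is a pair of the path (or the descriptor hash that replaces it) and the sorted list of output names, and the entries themselves are sorted by their first element. The helper names are made up, and string escaping is deliberately left out.

```python
from typing import Dict, Iterable


def quote(s: str) -> str:
    # Assumption: the string contains no characters that need escaping.
    return '"' + s + '"'


def serialize_input_drvs(inputs: Dict[str, Iterable[str]]) -> str:
    """Serialize inputDrvs as [("key",["out1","out2"]),...] with sorted keys."""
    entries = []
    for key in sorted(inputs):
        outs = ",".join(quote(o) for o in sorted(set(inputs[key])))
        entries.append(f"({quote(key)},[{outs}])")
    return "[" + ",".join(entries) + "]"


print(serialize_input_drvs({
    "/nix/store/32lr8w57frc1ij5wzc3hb9ks8vzs2ms1-libffi-3.4.4.drv": ["out", "dev"],
}))
# prints [("/nix/store/32lr8w57frc1ij5wzc3hb9ks8vzs2ms1-libffi-3.4.4.drv",["dev","out"])]
```

When the path is replaced by a descriptor hash, only the first element of the pair changes, and the list of output names stays in the entry.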

Thanks in advance 🙇‍♂️

@sorki
Member

sorki commented Oct 21, 2023

I don't see anything obvious immediately but it's been a while since I read the paper. You might want to check
https://github.com/haskell-nix/hnix/blob/master/src/Nix/Effects/Derivation.hs as well

@sorki
Member

sorki commented Oct 30, 2023

Hi!
I was in a hurry, so sorry for the terse response.

Have you managed to figure it out? It's pretty cool that you can use our code as a reference even as a non-Haskeller!

Btw, what's your motivation for implementing this?

@ghostbuster91
Author

Hi, no worries :)

> Have you managed to figure it out?

Unfortunately, I wasn't able to make any progress. I bet that this is quite a subtle difference which makes it even harder to spot in the code.

> It's pretty cool that you can use our code as a reference even as a non-Haskeller!

Well, to some degree, modulo my Haskell skills 😆

I plan to run your code against the same package and check both the final and intermediate results, but not knowing Haskell doesn't help my motivation these days.

> Btw, what's your motivation for implementing this?

So, I want to create a build tool for Scala that piggybacks on Nix as much as possible.
One of my requirements is to use zinc, an incremental compiler for Scala. Based on the user input, I will create a Nix derivation on the fly that feeds the output of the incremental compiler's previous run into the current build. So basically I will carry the zinc state across multiple build invocations. To do this I need to be able to create a Nix derivation from the provided inputs, as I don't want to construct Nix expressions programmatically. Also, the build configuration won't be written in the Nix language but in something else.

(compilerOutput, zincState1) = compileScala(sources, NoState)
(compilerOutput, zincState2) = compileScala(sources, zincState1)

I hope it makes sense 😅

@Ericson2314
Member

CC @flokli

@Ericson2314
Member

@ghostbuster91 Two other things:

  1. It is not unreasonable to create a mode for nix derivation add where you don't need to pre-calculate the path.
  2. If you are willing to use content-addressing derivations (i.e. don't need Hydra) you can already skip this; those don't have precomputed output paths because the output paths are unknown until they are built.

@ghostbuster91
Author

Re. content-addressing - yeah, I heard about it and it seems that it should work. I am not sure yet how much of a problem the lack of Hydra will be. I will need to check this.

However, since I already got quite far, I wanted to finish implementing that approach. I didn't find many resources on the topic, so I figured I would write a blog post documenting how this process works under the hood. So this has kind of become a goal of its own :)

Re. nix derivation add - sorry I didn't get this, could you elaborate?

@Ericson2314
Member

Ericson2314 commented Oct 30, 2023

@ghostbuster91 nix derivation add is a new command that is basically nix derivation show in reverse. It uses JSON for convenience; it should probably compute store paths too for convenience.

@flokli

flokli commented Oct 30, 2023

As I got cc'ed - In case reading another implementation might help - during the development of Tvix we reverse-engineered the output path calculation and produced some general-purpose (rust) code in nix-compat that does the output path calculation - mostly the calculate_output_paths and derivation_or_fod_hash functions.

Consumers of this code are a bunch of testcases, as well as builtins.derivationStrict.

Maybe some of that code helps you to understand where things happen differently?

@ghostbuster91
Author

@Ericson2314 thanks, I didn't know about this. I will check it out 👍

@flokli

> In case reading another implementation might help

It definitely will. I grew up on imperative code, so reading Rust should be easier. Thanks for the links 🙇‍♂️

@sorki sorki added the question label Nov 28, 2023