Use the opam library (for read_only operations) #75

pitag-ha · 2022-06-27T11:58:27Z

Using the library instead of the binary has four advantages: it avoids parsing the printed output of the binary, it yields a cleaner code, it should speed up the time needed for the queries, and it makes us depend less on a binary whose future is uncertain. For the pen-ultimate statement about the speed-up, let's have a look at the CI and how long it takes for each step compared to before to see if that's actually true.

I'm already opening the PR to discuss if this is something we want to go forward with before doing the last things I'd do:

have a look at what exactly env does when querying dependency state and possibly adapt it to what we need
find out if the information stored in Opam.GlobalObts should still be used, also when making the queries via the library (it was used before and currently I'm not using it anymore)
have a look if what I've done so far can be sped-up a bit more. I think the two places taking up time at the moment is loading the sandbox switch state to find the installed files by a tool and loading a virtual switch state to load a map with all opam packages to find out the version of the tools we want to install.
take advantage of that now we're using the library to avoid passing around strings all the time and pass around more specific types instead (e.g. OpamPackage.Name.t).
generally, clean up a bit

If I remember correctly, the last time you tried using the library, you discarded doing that for the following reasons:

supposedly it's more complicated. I don't think I agree with that. I think the code stays similarly complex as it was before but becomes far cleaner now: see comments from before about not parsing the binary output anymore (which could be re-formatted at any time) and about using more specific types. Let me know if you still disagree though, please!
different versions of the opam library in the binary installed by the user and the one we're using can lead to opam failures. To avoid that happening, I only load the switch state with read_only lock. For installing, we're still using the binary...
it doesn't even speed-up things. I would be a bit surprised if that was true to be honest, but it's possible. Let's see what the time of the CI says.

Let me know what you think about this PR in general please to see if it's worth spending the time to improve the things mentioned above!

Using the library instead of the binary avoids parsing the output of the binary, yields a cleaner code and should spead up the time needed for the queries.

pitag-ha · 2022-06-27T12:12:48Z

About my sentence

For the pen-ultimate statement about the speed-up, let's have a look at the CI and how long it takes for each step compared to before to see if that's actually true.

Actually, I guess it's better to try that locally instead of relying on the CI server... If this point is relevant for you to decide, let me know and I try locally.

panglesd

Here is a first set of comments. In addition to the comment in the code:

I do not agree with changing all the Ocaml_version.t to OpamPackage.t, the former is what we need, and it avoids having to call for the version of a package several times...
Why not having the with_* functions wrapped into more standalone functions? It seems that the switch_state is never used more than once, so that won't really make it longer... (for now, so maybe we should keep it as it is) (EDIT: I do not think we should modify this anymore)

panglesd · 2022-06-27T18:10:21Z

src/lib/binary_package.ml

+  match Fpath.of_string path with
+  | Error (`Msg s) -> Error (`Msg s)


Why changing the type of path from Fpath.t to string And if you do so, why not using the let* syntax?

panglesd · 2022-06-27T18:16:56Z

src/lib/opam.ml

+          |> List.fold_left
+               (fun acc file ->
+                 match file with
+                 | filename, _ -> OpamFilename.to_string filename :: acc)


Let's Fpath.parse it!

Suggested change

| filename, _ -> OpamFilename.to_string filename :: acc)

| filename, _ -> filename |> OpamFilename.to_string |> Fpath.parse :: acc)

I think you meant Fpath.of_string, which does error checking. But I think we can get away with Fpath.v and expect OpamFilename to give us valid paths.

Suggested change

| filename, _ -> OpamFilename.to_string filename :: acc)

| filename, _ -> OpamFilename.to_string filename |> Fpath.v :: acc)

panglesd · 2022-06-28T07:50:01Z

src/lib/binary_package.ml

+  Opam.Queries.(
+    with_switch_state
+      ~dir_name:(Sandbox_switch.get_sandbox_root sandbox)
+      (files_installed_by_pkg query_name))
+  >>= fun paths ->


Why not wrap all of this into a list_files function again?

Yes! I'd prefer to keep the Opam module API close to what it was (of course not exactly the same).
This was useful for maintenance as it avoids inconsistent uses of the same API in different places and adds types, for example, it certainly helped to write this patch.

panglesd · 2022-06-28T08:12:55Z

src/lib/opam.ml

+  let latest_version ~metadata_universe ~pkg_universe ~ocaml package =
+    let package = OpamPackage.Name.of_string package in


If we want to stop carrying strings around, package should be of type OpamPackage.Name.t

Actually that's one of your points!

panglesd · 2022-06-28T08:20:02Z

src/lib/opam.ml

+    let compatible_ones =
+      OpamPackage.Set.filter
+        (fun pkg ->
+          OpamPackage.Name.equal (OpamPackage.name pkg) package
+          &&
+          let pkg_opam_file = OpamPackage.Map.find pkg metadata_universe in
+          let dependencies = OpamFile.OPAM.depends pkg_opam_file in
+          let env _ = None in
+          OpamFormula.verifies
+            (OpamPackageVar.filter_depends_formula ~env dependencies)
+            ocaml)
+        pkg_universe
    in


This seems not very efficient to me: it will have to see all package and check if the name is the right one.
Would there be a way to get all versions of package, and replace pkg_universe with this list, and then filter with the formula?

(but maybe it is not long, I should try it out, there are not so many opam package for a computer!)

panglesd · 2022-06-28T08:21:06Z

src/lib/opam.ml

+    try Some (OpamPackage.max_version compatible_ones package)
+    with Not_found -> None
+
+  let installed_versions pkg_names


pkg_names should be of type OpamPackage.Name.t list here

panglesd · 2022-06-28T08:21:37Z

src/lib/opam.ml

+        (name, version))
+      pkg_names
+
+  let files_installed_by_pkg pkg_name


pkg_name should be of type OpamPackage.Name.t here

panglesd · 2022-06-28T08:23:15Z

src/lib/opam.ml

+  let get_metadata_universe (switch_state : 'lock OpamStateTypes.switch_state) =
+    switch_state.opams
+
+let get_switch = function


format error ? :)

panglesd · 2022-06-28T13:30:37Z

From a few experiments, in terms of "inferring tools version", it speeds things up a lot when dealing with old versions of the compiler (which is not uncommon at all, as opam by default inits a system switch which tend to be old), but it also slows things down on a recent switch, with some tools already installed...
So it is a great improvement, but there must be a way to make it faster, if the binary can make it faster!?

panglesd · 2022-06-28T13:33:48Z

About my last review:

I do not agree with changing all the Ocaml_version.t to OpamPackage.t, the former is what we need, and it avoids having to call for the version of a package several times...

I don't have a strong opinion on that, the "I do not agree" does not reflect that...

Why not having the with_* functions wrapped into more standalone functions? It seems that the switch_state is never used more than once, so that won't really make it longer... (for now, so maybe we should keep it as it is)

Now I think that it is good as it is, it allows us to have more control on how long we want to keep the states loaded. So ignore this comment!

Julow · 2022-07-01T09:32:51Z

it should speed up the time needed for the queries, and it makes us depend less on a binary whose future is uncertain

I disagree with these two points:

It's possible to express most of the queries we need as a single fast query (citation needed :p). The current code don't make sense and shouldn't be a baseline for optimisations: running multiple queries is what we shouldn't do.
I don't believe the librarie would bring any speedup at all if we do less queries. Also, I believe the shareable work (loading the read-only states) is fast.
The binary is the intended API for Opam. opam-state isn't the most unstable library but I don't see a reason that it would be more stable than the CLI. If Opam ever change drastically, opam-state is the first thing to go: the state is the problematic part.
Depending on the CLI is better in general: more defined API, backward-compatibility possible (and taken into account in the case of Opam), usually stable, no compatibility problems with the system's version of Opam.

If the magical single fast query I was talking about exists, I would prefer to stick with the CLI, which is also simpler to call (except the part where we need an angstrom parser).

Otherwise, I agree that the code is better with more types and less command outputs to parse.

pitag-ha · 2022-07-01T17:40:44Z

Hey @Julow , after @panglesd's comments I've already started (and pretty much finished) the second part of this PR that I mentioned of making our code more type safe by making use of the opam library types. But now, after reading your comments, I'm doubting if I should finish that.

About your comments about performance: tbh, I'm still not sure if library or binary would be better in terms of performance. In fact, it's possible that it will be similar in the end.

If it's similar, I think the library would be clearly the better choice, simply because I prefer compile time errors over runtime errors. Concretely, with the library we can make the code far more type save (instead of having strings everywhere, we use concrete types) and because we don't need to parse the output of the binary anymore. Type safeness is one of the many cool features OCaml can leverage when we want it to!

And similarly about your second comment:

The binary is the intended API for Opam. opam-state isn't the most unstable library but I don't see a reason that it would be more stable than the CLI

The problem about the binary output possibly being unstable is that we'll most likely notice a change there at runtime. I mean, we make some opam binary query, parse its output into a string, pass that string around and at the end make another call to the opam binary with that string, hoping that that string represents a tool name.

However, if we use the library and there's a change in the library, we'll notice at compile time, adapt the code to the new version (like always when a dependency of ours cuts a breaking release) and that's it.

So, I'm pretty much saying the opposite of your statement:

Depending on the CLI is better in general

The reasons you give for that statement are:

more defined API, backward-compatibility possible (and taken into account in the case of Opam), usually stable, no compatibility problems with the system's version of Opam.

I'll need to understand what you mean with those points to answer to them.

about "more defined API": In which way is the binary API more defined than the library API?
about "backward-compatibility possible": well, the good thing about the library is that we don't even need to care that much about backwards compatibility, since we can just use the version of the library we want instead of relying on which binary version the user has installed. Or am I misunderstanding your point?
about "usually stable": I would say that depends on which part of the CLI we're talking about. I agree that the commands of the CLI tend to be stable, but I'm not sure if there's a guarantee that the output formatting will be stable. In fact, just changing some registered printer somewhere, can already affect the output formatting, which we rely on when parsing it. Which part are you talking about? The commands or the output formatting?
about "no compatibility problems with the system's version of Opam": we don't have them with the library either as long as we lock the state only for reading.

Do you maybe already want me to push the changes I've made to improve type safeness so that you can have a look at what that would look like (even though it isn't perfect yet)?

Julow · 2022-07-04T10:22:43Z

About performances, I think the "one magical query" might run in ~1s (the time to start an opam show query but 5 times faster than loading the entire solver state). This might be wrong, in which case we might have more opportunity to optimize using the library.

about "more defined API": In which way is the binary API more defined than the library API?

The binary API is defined and a lot of work in put into Opam for not breaking the API and versionning. The library don't have this concern but as you say, it doesn't really matter because breakings are detected at compile time.
On the other hand, the library put a lot of effort in being compatible with different versions of the state, when in read-only mode.
We have a draw on this argument: Both are good at compatibility.

Do you maybe already want me to push the changes I've made to improve type safeness so that you can have a look at what that would look like (even though it isn't perfect yet)?

Yes please!

Use the opam library for read_only operations

c167691

Using the library instead of the binary avoids parsing the output of the binary, yields a cleaner code and should spead up the time needed for the queries.

pitag-ha added the no changelog label Jun 27, 2022

panglesd reviewed Jun 28, 2022

View reviewed changes

panglesd mentioned this pull request Jun 29, 2022

Use a .install file when possible #73

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use the opam library (for read_only operations) #75

Use the opam library (for read_only operations) #75

pitag-ha commented Jun 27, 2022

pitag-ha commented Jun 27, 2022

panglesd left a comment •

edited

Loading

panglesd Jun 27, 2022

panglesd Jun 27, 2022

Julow Jul 1, 2022

panglesd Jun 28, 2022

Julow Jul 1, 2022

panglesd Jun 28, 2022

panglesd Jun 28, 2022

panglesd Jun 28, 2022

panglesd Jun 28, 2022

panglesd Jun 28, 2022

panglesd Jun 28, 2022

panglesd Jun 28, 2022

panglesd commented Jun 28, 2022

panglesd commented Jun 28, 2022 •

edited

Loading

Julow commented Jul 1, 2022

pitag-ha commented Jul 1, 2022

Julow commented Jul 4, 2022

		match Fpath.of_string path with
		\| Error (`Msg s) -> Error (`Msg s)

	\| filename, _ -> OpamFilename.to_string filename :: acc)
	\| filename, _ -> filename \|> OpamFilename.to_string \|> Fpath.parse :: acc)

		let latest_version ~metadata_universe ~pkg_universe ~ocaml package =
		let package = OpamPackage.Name.of_string package in

Use the opam library (for read_only operations) #75

Are you sure you want to change the base?

Use the opam library (for read_only operations) #75

Conversation

pitag-ha commented Jun 27, 2022

pitag-ha commented Jun 27, 2022

panglesd left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

panglesd commented Jun 28, 2022

panglesd commented Jun 28, 2022 • edited Loading

Julow commented Jul 1, 2022

pitag-ha commented Jul 1, 2022

Julow commented Jul 4, 2022

panglesd left a comment •

edited

Loading

panglesd commented Jun 28, 2022 •

edited

Loading