-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
driver: Record execution time of every commands #978
Conversation
The original reason for adding in the extra libraries for the benchmark was that core/core_kernel is considerably more complex than base, and therefore provided some more coverage of odoc's code. I'm a bit reluctant to remove it without a solid reason for doing so. |
There are a few other annoyances associated with the driver as-is, not things you're introducing in this commit, obviously:
|
I don't believe that the quality would increase if it runs for longer. It would have a huge input and output, I'm worried that we are only measuring the file system caches. To increase the quality of the result, we could perhaps do a warmup run ? This is not reasonable with a very long benchmark. I don't think coverage is important when everything is averaged, especially if the atypical code is so much bigger. Perhaps we would make a different benchmark for Core alone ? So that we measure only the atypical code. |
It's not to do with having longer runs, it's to do with the complexity of core vs base. The point of the benchmark is precisely to measure the atypical code, since it's pretty easy for regressions to creep in there unnoticed. Some early versions of odoc ran quite quickly on base and took many minutes to run on core_kernel - and I do actually run this benchmark test whenever I'm working on the performance sensitive bits odoc. |
I did not realize it was used, I don't want to remove it then. Reverted. |
To avoid the maintenance cost, I suggest to remove the logs of commands. |
We're using this driver for a few different use cases now - benchmarking / testing / creating the odoc website. I quite like having the concrete commands recorded for the purposes of the website, but it definitely gets in the way of day-to-day testing. We've got the env var turning on/off the extra dependencies - we could perhaps use the same mechanism to turn the logging of the commands on and off. E.g. |
I'm not a fan of making the output even more non deterministic. The list of commands is huge. Do you expect anyone to use it ? Ideally, we'd have a different instance for different purposes. One for current-bench, one for building core and one for the doc. |
Well, yes. I used it myself it a few weeks back when doing the voodoo prototype for source rendering. It's way easier to read the actual shell invocations of odoc than to simply read the function that assembles the command line.
The thing that makes this a struggle is that the main use of this code is to be very clear about how to use odoc, so I'm not keen on splitting it up into multiple files - it makes for more tricky reading. Adding some straightforward logic to log or not won't make the code any harder to read.
Surely this makes is more deterministic!? |
Mdx have a way to include code from different parts of one .ml file (with delimiting comments). The documentation would stay the same. Would you mind if I try ?
I slightly disagree. Adding any amount of code makes it harder to read.
Sure but that doesn't solve any problem. There will always be a huge diff, the exit status will always be an error. |
Ah, what I meant was that we would turn everything off for the common case of testing, and only turn the logging on for when we're building in order to publish the docs. That way there would be no logs in the driver in the source tree, and no changes to promote when you build it locally. |
I don't like that the publish case is the one with an error status but this sounds like a good compromise. |
Would it be possible (and not too complicated) to extend the current PR to also record the size of the produced files? I think it would be an interesting metric! |
I removed the list of executed command but kept the code for printing it. It can be printed back by un-commenting a line. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that current-bench
requires a Makefile
as it runs make bench
to run the benchmark.
dune
Outdated
(rule | ||
(alias bench) | ||
(action | ||
(diff driver-benchmarks.json doc/driver-benchmarks.json))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This creates an error when driver-benchmarks.json
is not already present at the root of the project (which is the case in the repo)... maybe generate an empty one if there is none?
When running
If I cd into Any idea where this could come from? I would say some dune rules, where something gets executed asynchronously due to wrongly computed dependency, but I cannot find such problem in your rules... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really cool! Can't wait to see the graphs ^^
Almost ready to merge, remains to:
- add the dependency in the dune file for the bench rule
- add
make bench
, and make it output the json on standard output! - Use
Unix.gettimeofday
or at least understand why there is such a difference...
What would you think about plotting the compile and link values in the same graph? That is, one graph containing both compile and link values, per {size; time; number of files}
. It reduces the number of graphs, and lets you compare compile and link more easily.
To plot multiple values in the same graph, you can use common prefix with a slash. For instance, in the example linked above the time taken for typing, parsing, etc., are shown in the same graph because of the ocaml/ prefix.
Also, what about plotting the total space/time used by compile and by link?
About the removal of the list of executed commands, I don't have a strong opinion. I find them sometimes useful when debugging, but it's very annoying that their output depends on the machine. It makes the file much bigger, and generate huge diffs.
I'm sure there is a better solution than a commenting/uncommenting a line (maybe output them in another file that is ignored by git?).
However, I think it is already an improvement, and I want the rest of the PR merged, so I'm fine with it!
$ '../src/odoc/bin/main.exe' 'html-generate' 'int_replace_polymorphic_compare.odocl' '-o' 'html' '--theme-uri' 'odoc' '--support-uri' 'odoc' | ||
$ '../src/odoc/bin/main.exe' 'html-generate' 'inlining_transforms.odocl' '-o' 'html' '--theme-uri' 'odoc' '--support-uri' 'odoc' | ||
$ '../src/odoc/bin/main.exe' 'html-generate' 'inlining_decision_intf.odocl' '-o' 'html' '--theme-uri' 'odoc' '--support-uri' 'odoc' | ||
$ '../src/odoc/bin/main.exe' 'html-g |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Errors in the computation of the size are silently ignored. Maybe output them on the standard output, so that the user knows something has happened?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm something weird happened with the locations of the comments. They seem good in the "files changed" panel but not the "discussion" one. This one is about line 791 of the new file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why I've added an extra metric for the number of file measured, that a human can use to asses that changes in the size metrics are meaningful.
I'm not sure we need to do much more here as this error is totally unexpected. This filtering code is mostly intended to filter-out commands that don't have a recorded output-file.
I think this is ready for an other review. I've implemented |
It seems that |
EDIT: This has to do with my switch being 4.04 (I was testing compat code). I don't know how we could improve the situation for old switches running Old OCaml issue: I still have some issues:
Edit: trying some blibli:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I still have the missing "driver-benchmarks.json" error. I think we should use
cat doc/driver-benchmark.json
instead of diffing a file with a non-existent file - Running
make bench
twice without modifications use the cache and so produces the same result. Can be a problem if one wants to make some mean, or rerun the benchmark with less program running in his machine, ... dune build @docgen
is not deterministic anymore, making an infinite loop ofdune build @docgen
,dune promote
. This diff will be noisy when working on the driver.
I opened Julow#3 to fix the comments above. |
Thanks for the help! The rule is much better like that. |
The list is not deterministic. It can easily be shown again if needed by un-commenting a line.
This reflects the dependency change but also changes in the updated dependencies. Apparently, the `sort` tool doesn't sort the same way on different platforms.
The result is written into a current-bench's JSON format and promoted by dune build @bench Co-authored-by: panglesd <[email protected]>
The output files of compile and link commands are recorded and their size is measured.
Which gives more reliable results. Co-authored-by: panglesd <[email protected]>
These commands can be reproduced with instrumentation enabled to debug performance problems. Co-authored-by: Paul-Elliot <[email protected]>
The results are stored in `_build/default/doc/landmarks` with unique names. Co-authored-by: Paul-Elliot <[email protected]>
This aborts some obviously broken metric collections. This situation happens when a dependency is missing or if only some blocks have failed.
Co-authored-by: Paul-Elliot <[email protected]>
Signed-off-by: Paul-Elliot <[email protected]>
This file has special meaning for current-bench. It does not define a package that can be installed or released.
I'm satisfied by the current state of this PR and I would merge if someone else agrees. |
I'm happy with the current state as well 🙂 @jonludlam do you want to have a look?
EDIT: it now works! |
Co-authored-by: panglesd <[email protected]>
Record the time taken by every commands and save the result in current-bench's JSON format.
The running time of every sub-commands are recorded (min, max, avg) and the total number of times they are called.
Most of the diffs are due to more recent dependencies, the code computing the metrics is at the bottom.
The benchmark results look like this: (but all on one line)
Problems:
Are the times correct ? Summing up all the numbers give a total running time of 0.5s while it takes 26s to run
@docgen
on my machine.I know Mdx is slow to run bytecode and process the output of top-level blocks, but that's extreme.
dune build @bench
fails on Dune 3.7.1 with this message, even though the documentation says that<file1>
doesn't have to exist: