-
Notifications
You must be signed in to change notification settings - Fork 680
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
introduce code coverage analysis in CI #10180
Conversation
7c9837e
to
63e5b2e
Compare
37cfed9
to
f90423b
Compare
f90423b
to
f35de0e
Compare
5b919ae
to
d1ad35f
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #10180 +/- ##
===========================================
- Coverage 85.96% 72.01% -13.96%
===========================================
Files 243 706 +463
Lines 46461 141278 +94817
Branches 0 141278 +141278
===========================================
+ Hits 39942 101744 +61802
- Misses 6519 34846 +28327
- Partials 0 4688 +4688
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
09e5552
to
73c71ec
Compare
73c71ec
to
e7c20cd
Compare
e464c0f
to
17ad121
Compare
17ad121
to
ae63426
Compare
This would be great to figure out a reason for and submit an issue about in rust-lang/rust. My intuition is that the binary should be sufficient for this (as all the coverage instrumentation is part of the LLVM IR that makes it to the final binary build), but I wouldn’t be surprised if there are some files like separated debuginfo-esque files that are required for the coverage.
That's the case for now though. It is conceivable that we might want to add additional such super-integration-tests in the future in which case it again makes sense to merge those jobs together at least to some degree – and having to figure out how to merge them well is an annoying and fairly arbitrary exercise (that eventually leads to a nasty behemoth that we had with the "style checks" CI job we had in buildkite) |
This is kinda bogus as it gives a report between 8000 commits ago and now... but I guess it does work. |
I sent it as a question on taiki-e/cargo-llvm-cov#320 ; to see whether I’m missing something :) in practice, I can say it at least seems like cargo-llvm-cov is searching for object files in target/${profile} at some point while generating the report, considering the error message in this run.
Agreed; I’m hoping we can get to a resolution that’d allow to build a single neard again before this happens :) TBH I really don’t understand why the attempt without the whole target folder didn’t work, I just observed during the tests (eg. this run, and interestingly the genesis changes run did generate profraw files without issues even without the whole target folder) that Overall my hope is that taiki-e will tell me on the issue I linked above that I’m wrong and it should only need the binary; and then I’ll go back to debugging and trying to figure out why all the jobs except genesis changes failed when running with only the binary.
Right; the reason is actually that codecov was already enabled 8000 commits ago, but hasn’t received any report since then, so as soon as this lands it should start giving a report between master and now again :) |
# See https://github.com/taiki-e/cargo-llvm-cov/issues/292 | ||
- run: find target -name '*.profraw' -delete | ||
|
||
# Run integration tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmph, splitting these nextest run lines goes in opposition to #9415. If its only serving to provide some sort of categorization within the tool, I would personally much rather keep the single command option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I think #9415 should be solved by using something like just test
, not by having everything stuffed into nextest, among other reasons because if we did it we could move clippy
out and have a cache for it, not needing to rebuild it from scratch each time :)
Also, this split doesn’t change the fact that cargo nextest run
still runs all the tests and works fine. I do think the categorization within the tool is useful, especially considering yesterday’s discussion that debugging integration tests is very bad dev UX, and thus we’d mostly be interested in the code coverage for unit-tests only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, so back in buildkite days we also had "cargo nextest run", but there were like 5 different ways to run integration tests and you had to pay really close attention to which specific invocation was failing – one of them failing would usually mean the others were passing.
This was exactly what prompted me to start working on CI in the first place. My honest evaluation is that we got into that state back then precisely because it was normal and natural to "just add another invocation of cargo run/test/nextest/etc
with my snowflake combination of flags".
Having a single cargo nextest
invocation I believe does a great job pushing against such configurations (and I hope it will eventually push enough to obviate the need for nightly vs. non-nightly configurations as well.) Splitting invocations into parts goes back towards normalizing having more of the snowflake configurations.
Now, to make this actionable at all – I wonder if nextest could be made to export test coverage on a per-test basis. All of the tests are distinct binary invocations, which should allow us to produce coverage files for each individual test (and therefore could then eventually be filtered by e.g. crate, module, etc. after the fact), not just two broad categories of unit vs integration tests.
Unfortunately cargo llvm-cov nextest
appears to just run cargo nextest
after setting up the environment, but perhaps target runner scripts could be used as a mechanism to isolate and produce coverage information for each individual test (there’s even a section on debug output in the documentation?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could move clippy out and have a cache for it
I don’t see why we couldn’t have a cache right now. We just need to also cache the target/style
directory (does rust-cache not do it…?)
FWIW my belief is that the current setup with target/style
is actually better for local development purposes as developers no longer risk accidentally invalidating their cached proper build when they run cargo clippy -p foo
or something of the sort (by the virtue of not needing to run cargo clippy
manually at all.) Though now that I write this, I realized there’s a bug in how style tests set the TARGET_DIR
– it should be different for each test that builds code (in particular themis
should have its own TARGET_DIR
so it does not get blocked waiting on the target directory lock…)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(does rust-cache not do it…?)
Considering the clippy test is always among the last ones to complete and is written as slow (so 120+ seconds), I’d guess it doesn’t do it.
Counter-proposal: what if we had a Justfile with:
- A rule for unit tests
- A rule for clippy that runs it with target_dir set to
target/style
- A rule for integration tests
- A
just test
rule that runs all of them
Then developers could run just test
, and CI could run the rules independently, and maybe even override the target folder depending on whether we can fix target folder caching some other way. We could also easily add the nightly configuration into the Justfile, so that there’s indeed a single command line to run. WDYT?
(I think changes to llvm-cov or cargo nextest might be a great idea as there are already many profraw files and we’d "just" need to figure out which are the right ones; but will not be reasonably doable before november 26, especially as I’m OOO the second half of next week and have a lot of other stuff to do by then)
(Also, fwiw, I personally don’t really care about having a single command line to run right now: it’s much faster for me to upload to CI and wait for the results, than to test locally, currently, but that’s part of the devenv issues we’re discussing on zulip right now, and you’re right that having a single command would in theory be good)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, relatedly, in order to be able to use the github actions for eg. lychee (that I’m introducing in #10217), we’ll need to run the tests outside of cargo nextest anyway, unless we want to check them twice in CI. In particular, we could start using the clippy action, which AFAIU has much nicer error reporting than our current "manually read the logs" system.
So I think this confirms the want for a tool like just
, so that we could have just test
run lychee
in a way similar to what is being run by CI :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, so the weekend gave me ample of time to consider the last message. I agree that having nice reporting within GitHub is really nice.
But I also see a major (I think) flaw in what I think you’re proposing we should do. To double check, are you suggesting that we maintain .github/workflows/ci.yaml
and a Justfile
and that they would each implement the same checks independently? e.g. where Justfile
would run cargo clippy
, the GHA would run the clippy action? I think doing so is precisely the mistake I would like to avoid. If we’re adding a Justfile, the CI could run exactly the just
commands that ought to be run as part of a local development flow (and ideally that command is singular or is split up into sub-jobs programatically) but that does not come with the nice error reporting benefits.
But unless we eschew those benefits, I don’t see how can we ensure, especially in the long term, that the checks run via the justfile remain synchronized with those run within GHA. And that means that ultimately we would still sometimes see instances where a successful just
pass does not guarantee the CI passing (or vice versa, for that matter.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My current goal is indeed to have the CI run just nextest-unit && coverage && just nextest-integration && coverage
, where just nextest
for local use would also run the two.
However, for clippy
, in order to get the nice reporting, I would indeed have a just clippy
for local development that would not be DRY with the clippy-action
for github, because by definition clippy-action
cannot know to use our Justfile
(or literally any other way to run all the tests we could have), any more than just
could automatically figure out the command clippy-action
runs.
So I’d say that by definition, I’d say that we can only have two of the following three:
- nice error reporting
- not implementing error reporting ourselves
- locally-run command is DRY with remotely-run command
I also personally think we should choose the first two, and whenever we come across an instance where just
and the CI workflow don’t give the same result we can consider that as a bug and treat it by probably just copy-pasting again from the updated upstream clippy-action
.
That being said, this PR is only about nextest
splitting, so I’d argue that the discussion about clippy lints could be postponed for the day when we actually consider using clippy-action
:) And it seems to me like you don’t disagree with the idea, if restricted to splitting the nextest step into two only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That being said, this PR is only about nextest splitting, so I’d argue that the discussion about clippy lints could be postponed for the day when we actually consider using clippy-action :) And it seems to me like you don’t disagree with the idea, if restricted to splitting the nextest step into two only?
I mean… whatever plan we come up and start following should scale to whatever future ambitions we may have, which is why I think discussing clippy and lychee is also important… In particular one other thing I would (eventually, not necessarily in this PR) love for the setup to have is automatic determination of which just
commands to run for the CI.
This could be something similar to the wasmtime
’s determine
job which generates a job matrix for the successive jobs. In our case we’d parse something along the lines of just --summary
and generate a job matrix consisting of all the individual subcommands to run across different builders.
One thing I already like about just
setup is that it also allows for maintaining the different flag combinations for different OSes (so we can move --ignore integration-tests
on macos down into the justfile.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular one other thing I would (eventually, not necessarily in this PR) love for the setup to have is automatic determination of which just commands to run for the CI.
While I’m philosophically of the same opinion, at the same time I’m wondering whether the time to implement and the effort to maintain such a determination are financially worth the savings in compute time 😅 but it’d be really nice to have that!
21c0733
to
3908c6f
Compare
3908c6f
to
0bf0eac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok lets land this for now, I think Justfile is a reasonable future solution to the downgrades in developer experience this PR might temporarily introduce if left unattended…
Awesome, thank you! I’ll do my best to submit a PR with the |
This reverts commit 956add9.
Note: I’m removing the preparatory "Build neard" job here. The reason is twofold: