introduce code coverage analysis in CI #10180
Merged
Commits (26)
All 26 commits are by Ekleog-NEAR:

ee7b9e5 introduce code coverage analysis in CI
f35de0e try workspace cleaning in case it’s the thing breaking it all
82ffa9f try removing llvm_profile_file from nextest job
5c21b41 remove export-prefix
4653fd5 try with proper github_env syntax
2a739fd use simpler way of setting the env
bd4812a add debug target listing
d1ad35f try with llvm profile file override again
3ff8d46 remove llvm_profile_file trick
b089cdf let cargo llvm-cov find the right cargo profile
b7571f9 try using what seems to work everywhere
41d72b8 debug what’s going wrong with pytests
dc548a1 actually delete all profraw files
e7c20cd disable integration-tests for macos at the right place
557baed try creating the target folder early?
d4c5a79 try not using pytest as workdir
41d555c and with creating target folder?
4ad3102 try uploading the whole target folder
a974887 compress target before uploading
6705812 do not upload whole target folder
ae63426 discard preparatory build neard job
edccf7c cleanup test lines
f1fcc7c review comments
a1c1976 upload with codecov token to avoid intermittent failure to autodetect…
0bf0eac Merge remote-tracking branch 'upstream/master' into coverage
c3d6ab5 Merge remote-tracking branch 'upstream/master' into coverage
Conversation
Hmph, splitting these `nextest run` lines goes in opposition to #9415. If it's only serving to provide some sort of categorization within the tool, I would personally much rather keep the single-command option.
---
TBH I think #9415 should be solved by using something like `just test`, not by having everything stuffed into nextest – among other reasons because, if we did that, we could move `clippy` out and have a cache for it, not needing to rebuild it from scratch each time :)

Also, this split doesn't change the fact that `cargo nextest run` still runs all the tests and works fine. I do think the categorization within the tool is useful, especially considering yesterday's discussion that debugging integration tests is very bad dev UX, and thus we'd mostly be interested in the code coverage for unit tests only.
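For concreteness, the split under discussion could look something like the two invocations below; the `--workspace`/`--exclude` package selection is illustrative, not necessarily the exact flags this PR uses:

```sh
# Hypothetical split: one nextest invocation per test category, so that
# coverage can be collected and reported separately for each.
cargo nextest run --workspace --exclude integration-tests  # unit tests
cargo nextest run --package integration-tests              # integration tests
```

Either way, a plain `cargo nextest run` still executes everything in one go.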
---
Well, so back in buildkite days we also had `cargo nextest run`, but there were like 5 different ways to run integration tests, and you had to pay really close attention to which specific invocation was failing – one of them failing would usually mean the others were passing.

This was exactly what prompted me to start working on CI in the first place. My honest evaluation is that we got into that state back then precisely because it was normal and natural to "just add another invocation of `cargo run/test/nextest/etc` with my snowflake combination of flags". Having a single `cargo nextest` invocation I believe does a great job pushing against such configurations (and I hope it will eventually push enough to obviate the need for nightly vs. non-nightly configurations as well.) Splitting invocations into parts goes back towards normalizing having more of the snowflake configurations.

Now, to make this actionable at all – I wonder if nextest could be made to export test coverage on a per-test basis. All of the tests are distinct binary invocations, which should allow us to produce coverage files for each individual test (which could then eventually be filtered by e.g. crate, module, etc. after the fact), not just two broad categories of unit vs. integration tests.

Unfortunately `cargo llvm-cov nextest` appears to just run `cargo nextest` after setting up the environment, but perhaps target runner scripts could be used as a mechanism to isolate and produce coverage information for each individual test (there's even a section on debug output in the documentation?)
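If nextest's target-runner mechanism passes the test binary as the first argument (as cargo runners do), a small wrapper could give every binary its own profile output; the script and paths below are purely illustrative:

```sh
#!/bin/sh
# Hypothetical wrapper configured as the nextest target runner.
# The LLVM profile runtime expands %m (binary signature) and %p (PID),
# so each test process writes its own .profraw file.
bin="$1"; shift
mkdir -p "target/coverage/$(basename "$bin")"
export LLVM_PROFILE_FILE="target/coverage/$(basename "$bin")/%m-%p.profraw"
exec "$bin" "$@"
```

Since nextest runs each test in its own process, the `%p` component would even separate individual tests within one binary.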
---
I don't see why we couldn't have a cache right now. We just need to also cache the `target/style` directory (does rust-cache not do it…?)

FWIW my belief is that the current setup with `target/style` is actually better for local development purposes, as developers no longer risk accidentally invalidating their cached proper build when they run `cargo clippy -p foo` or something of the sort (by virtue of not needing to run `cargo clippy` manually at all.) Though now that I write this, I realized there's a bug in how style tests set the `TARGET_DIR` – it should be different for each test that builds code (in particular `themis` should have its own `TARGET_DIR`, so it does not get blocked waiting on the target directory lock…)
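To illustrate the suggested fix using cargo's own `CARGO_TARGET_DIR` (the actual env var and commands used by the style tests may differ), giving each code-building check its own target directory lets them run concurrently instead of serializing on cargo's directory lock:

```sh
# Illustrative only: per-check target directories avoid lock contention.
CARGO_TARGET_DIR=target/style/clippy cargo clippy --all-targets &
CARGO_TARGET_DIR=target/style/themis cargo run -p themis &
wait  # both builds proceed in parallel, each holding its own lock
```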
---
Considering the clippy test is always among the last ones to complete and is reported as slow (so 120+ seconds), I'd guess it doesn't do it.

Counter-proposal: what if we had a Justfile with one rule per check (those currently using `target/style`), plus a `just test` rule that runs all of them? Then developers could run `just test`, CI could run the rules independently, and we could maybe even override the target folder depending on whether we can fix target-folder caching some other way. We could also easily add the nightly configuration into the Justfile, so that there's indeed a single command line to run. WDYT?

(I think changes to llvm-cov or cargo nextest might be a great idea, as there are already many profraw files and we'd "just" need to figure out which are the right ones; but that will not be reasonably doable before november 26, especially as I'm OOO the second half of next week and have a lot of other stuff to do by then.)

(Also, fwiw, I personally don't really care about having a single command line to run right now: it's currently much faster for me to upload to CI and wait for the results than to test locally, but that's part of the devenv issues we're discussing on zulip right now, and you're right that having a single command would in theory be good.)
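One possible shape for such a Justfile; the recipe names and commands are illustrative (this PR does not add one):

```just
# Hypothetical Justfile: one recipe per check, plus an umbrella rule.
nextest-unit:
    cargo nextest run --workspace --exclude integration-tests

nextest-integration:
    cargo nextest run --package integration-tests

clippy:
    cargo clippy --all-targets -- -D warnings

# `just test` runs everything; CI can invoke the recipes independently.
test: clippy nextest-unit nextest-integration
```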
---
Actually, relatedly: in order to be able to use the GitHub actions for e.g. lychee (which I'm introducing in #10217), we'll need to run the tests outside of cargo nextest anyway, unless we want to check them twice in CI. In particular, we could start using the clippy action, which AFAIU has much nicer error reporting than our current "manually read the logs" system.

So I think this confirms the want for a tool like `just`, so that we could have `just test` run `lychee` in a way similar to what is being run by CI :)
---
Hm, so the weekend gave me ample time to consider the last message. I agree that having nice reporting within GitHub is really valuable.

But I also see a major (I think) flaw in what I think you're proposing we should do. To double-check: are you suggesting that we maintain `.github/workflows/ci.yaml` and a `Justfile`, and that they would each implement the same checks independently? E.g. where the `Justfile` would run `cargo clippy`, the GHA would run the clippy action? I think doing so is precisely the mistake I would like to avoid. If we're adding a Justfile, the CI could run exactly the `just` commands that ought to be run as part of a local development flow (and ideally that command is singular, or is split up into sub-jobs programmatically), but that does not come with the nice error-reporting benefits.

But unless we eschew those benefits, I don't see how we can ensure, especially in the long term, that the checks run via the justfile remain synchronized with those run within GHA. And that means that ultimately we would still sometimes see instances where a successful `just` pass does not guarantee the CI passing (or vice versa, for that matter.)
---
My current goal is indeed to have the CI run `just nextest-unit && coverage && just nextest-integration && coverage`, where `just nextest` for local use would also run the two.

However, for `clippy`, in order to get the nice reporting, I would indeed have a `just clippy` for local development that would not be DRY with the `clippy-action` for GitHub – because by definition `clippy-action` cannot know to use our `Justfile` (or literally any other way to run all the tests we could have), any more than `just` could automatically figure out the command `clippy-action` runs.

So I'd say that, by definition, we can only have two of the following three: nice in-GitHub error reporting, a single `just` entry point shared by local development and CI, and guaranteed synchronization between the two.
I also personally think we should choose the first two, and whenever we come across an instance where `just` and the CI workflow don't give the same result, we can consider that a bug and treat it by probably just copy-pasting again from the updated upstream `clippy-action`.

That being said, this PR is only about `nextest` splitting, so I'd argue that the discussion about clippy lints could be postponed for the day when we actually consider using `clippy-action` :) And it seems to me like you don't disagree with the idea, if restricted to splitting the nextest step into two only?
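Spelling out that CI sequence with a plausible coverage step in between; the recipe names, the `--codecov` output format, and the profraw cleanup are all assumptions for illustration:

```sh
# Illustrative CI sequence, assuming the just recipes run nextest under
# cargo-llvm-cov's instrumented environment. Report after each category,
# clearing raw profiles in between so the two reports stay separate.
just nextest-unit
cargo llvm-cov report --codecov --output-path unit-coverage.json
find target -name '*.profraw' -delete
just nextest-integration
cargo llvm-cov report --codecov --output-path integration-coverage.json
```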
---
I mean… whatever plan we come up with and start following should scale to whatever future ambitions we may have, which is why I think discussing clippy and lychee is also important… In particular, one other thing I would (eventually, not necessarily in this PR) love for the setup to have is automatic determination of which `just` commands to run for the CI.

This could be something similar to `wasmtime`'s `determine` job, which generates a job matrix for the successive jobs. In our case we'd parse something along the lines of `just --summary` and generate a job matrix consisting of all the individual subcommands to run across different builders.
setup is that it also allows for maintaining the different flag combinations for different OSes (so we can move--ignore integration-tests
on macos down into the justfile.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I'm philosophically of the same opinion, I do wonder whether the time to implement and the effort to maintain such a determination would be financially worth the savings in compute time 😅 but it'd be really nice to have that!