test: convert testsuite to be function based #7221
base: main
Refactor the code that checks and acquires the file lock into a routine. It is a common part of running a test, and wrapping it in a routine makes it easier to reuse.
Add a version of MTestArgListCreate that parses a command line string.
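A minimal sketch of what parsing a command line string into an argument list can look like. The names `split_cmdline` and `free_cmdline` are hypothetical and are not the MTestArgListCreate interface; the sketch splits on whitespace only and assumes no quoting or escapes.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: free a NULL-terminated argv built by split_cmdline. */
void free_cmdline(char **argv)
{
    if (!argv)
        return;
    for (char **p = argv; *p; p++)
        free(*p);
    free(argv);
}

/* Hypothetical helper: split a command line string on whitespace.
 * Returns a NULL-terminated, heap-allocated argv and stores the token
 * count in *argc_out. Returns NULL on allocation failure.
 * Assumption: quotes and escapes are not interpreted. */
char **split_cmdline(const char *cmdline, int *argc_out)
{
    int cap = 8, argc = 0;
    char **argv = malloc((size_t) (cap + 1) * sizeof(*argv));
    const char *p = cmdline;

    if (!argv)
        return NULL;
    argv[0] = NULL;

    while (*p) {
        /* skip leading whitespace */
        while (isspace((unsigned char) *p))
            p++;
        if (!*p)
            break;

        /* find the end of the current token */
        const char *start = p;
        while (*p && !isspace((unsigned char) *p))
            p++;
        size_t len = (size_t) (p - start);

        /* grow the array, keeping one slot for the NULL terminator */
        if (argc == cap) {
            char **tmp = realloc(argv, (size_t) (2 * cap + 1) * sizeof(*argv));
            if (!tmp) {
                free_cmdline(argv);
                return NULL;
            }
            argv = tmp;
            cap *= 2;
        }

        char *tok = malloc(len + 1);
        if (!tok) {
            free_cmdline(argv);
            return NULL;
        }
        memcpy(tok, start, len);
        tok[len] = '\0';
        argv[argc++] = tok;
        argv[argc] = NULL;      /* stay NULL-terminated at every step */
    }

    *argc_out = argc;
    return argv;
}
```

Copying each token keeps the caller's string untouched, so the same command line can be parsed again for another test.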
Make -arg equivalent to -arg=1.
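The rule can be sketched as follows. `parse_option` is a hypothetical helper written for illustration, not the actual mtest code; it assumes options have the form `-name` or `-name=value`.

```c
#include <string.h>

/* Hypothetical helper: split "-name=value" into name and value.
 * A bare "-name" is treated as "-name=1".
 * Returns 0 on success, -1 if the token is not an option or does not
 * fit in the supplied buffers. */
int parse_option(const char *tok, char *name, size_t name_sz,
                 char *value, size_t value_sz)
{
    if (tok[0] != '-' || tok[1] == '\0')
        return -1;

    const char *body = tok + 1;
    const char *eq = strchr(body, '=');
    size_t name_len = eq ? (size_t) (eq - body) : strlen(body);
    const char *val = eq ? eq + 1 : "1";    /* default for a bare flag */

    if (name_len == 0 || name_len >= name_sz || strlen(val) >= value_sz)
        return -1;

    memcpy(name, body, name_len);
    name[name_len] = '\0';
    strcpy(value, val);     /* length was checked above */
    return 0;
}
```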
The tests that use mpicolltest.h can be compiled with -DUSE_MTEST_NBC to become non-blocking tests. Compiling the source multiple times with different macros is too inflexible for the multi-tests framework, which runs multiple tests inside a single MPI_Init/MPI_Finalize window. Convert them to use an explicit option instead.
The new framework allows running tests inside a single MPI_Init/MPI_Finalize window by giving each test a uniform function interface. Each test file defines a run function that runs the test. The test file is linked with a stub, util/run_mpitests.c, to create an individual test program that should work exactly as before. In addition, all the test files are linked together into the binary run_mpitests, which can be used to run multiple tests within a single MPI_Init/MPI_Finalize window. All such function-based tests are listed in test/mpi/maint/all_mpitests.txt. During autogen, gen_all_mpitests.py loads this file and generates all the Makefile targets. This commit does not modify runtests. All tests should still work when run individually. The ability to run multiple tests from runtests is added in the next commits.
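To illustrate the idea of a uniform function interface, here is a small sketch of a name-to-function table and a dispatcher. The type `mpitest_fn`, the table `all_tests`, the demo test names, and `run_named_test` are all hypothetical; the actual signature of the run function in this PR may differ, and a real test body would issue MPI calls where the stand-ins below do nothing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical uniform interface: each test receives its argument string
 * and returns the number of errors it detected. */
typedef int (*mpitest_fn) (const char *args);

struct mpitest_entry {
    const char *name;
    mpitest_fn run;
};

/* Stand-ins for real test bodies (no MPI calls in this sketch). */
static int demo_passing_test(const char *args)
{
    (void) args;
    return 0;
}

static int demo_failing_test(const char *args)
{
    (void) args;
    return 1;
}

/* Hypothetical table; in practice it could be generated from a list file. */
static const struct mpitest_entry all_tests[] = {
    {"demo/pass", demo_passing_test},
    {"demo/fail", demo_failing_test},
};

/* Look up a test by name and run it.
 * Returns the test's error count, or -1 if the name is unknown. */
int run_named_test(const char *name, const char *args)
{
    size_t n = sizeof(all_tests) / sizeof(all_tests[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(all_tests[i].name, name) == 0)
            return all_tests[i].run(args);
    }
    return -1;
}
```

With a table like this, a single process can call `run_named_test` repeatedly between one initialization and one finalization, while a per-test stub can call exactly one entry.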
Filter the test list to find all tests that can be run using run_mpitests and run them first. Tests are grouped by testlist and number of processes. runtests controls the running of tests through the input/output pipes, so we still have granular control of individual tests. When run_mpitests aborts due to, e.g., a segfault, restart it with the next test in the testlist. Track the number of such restarts and abort if something systematic causes repeated failures, for example an error in run_mpitests itself.
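The restart policy can be sketched in isolation. This is an illustration written in C, not the runtests implementation; `drive_tests`, the `run_worker_fn` callback, and `demo_worker` are hypothetical, and the worker process is simulated by a callback rather than spawned over pipes.

```c
/* Hypothetical callback: run tests[start..n-1] in one worker.
 * Returns n if the worker finished all of them, or the index of the test
 * that was running when the worker died. */
typedef int (*run_worker_fn) (int start, int n, void *ctx);

/* Drive n tests through a worker that may crash.
 * After a crash, restart the worker at the test following the crashed one.
 * Returns the number of crashed tests, or -1 if more than max_restarts
 * restarts were needed (a sign of a systematic problem). */
int drive_tests(int n, int max_restarts, run_worker_fn run_worker, void *ctx)
{
    int start = 0, restarts = 0, crashed = 0;

    while (start < n) {
        int stop = run_worker(start, n, ctx);
        if (stop >= n)
            break;              /* worker ran to completion */
        if (stop < start)
            return -1;          /* nonsensical report, give up */

        crashed++;
        start = stop + 1;       /* skip the test that crashed */
        if (start < n && ++restarts > max_restarts)
            return -1;          /* restart budget exhausted */
    }
    return crashed;
}

/* Stand-in worker for demonstration: ctx points to n flags, where a
 * nonzero flag means "this test crashes the worker". */
static int demo_worker(int start, int n, void *ctx)
{
    const int *crashes = ctx;
    for (int i = start; i < n; i++) {
        if (crashes[i])
            return i;
    }
    return n;
}
```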
Instead of requiring `echo something > .stopfile`, make `touch .stopfile` work as well. Refactor the code so that a single stopfile check aborts all tests.
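The two behaviors described here, accepting an empty file and latching the result, can be sketched as below. This is written in C for illustration only; `should_stop` is a hypothetical helper, and the real check lives in runtests.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Latched result: once the stop file has been seen, every later call
 * reports "stop" without looking at the file system again. */
static int stop_requested = 0;

/* Hypothetical helper: returns 1 if the stop file exists or has been seen
 * before. Only existence is checked, so an empty file created with
 * `touch` counts as well. */
int should_stop(const char *path)
{
    struct stat st;

    if (!stop_requested && stat(path, &st) == 0)
        stop_requested = 1;
    return stop_requested;
}
```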
It is cleaner to split the utilities for multi-tests into their own source file. Since the file will only be used in run_mpitests.c, it is simpler to just include it as static code.
Use alarm() to enforce timeouts.
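One way to enforce a timeout with alarm() is sketched below. It is not necessarily how this PR does it: the names `run_with_timeout`, `demo_fast`, and `demo_hang` are hypothetical, and returning to the caller with siglongjmp is an assumption made for the sketch (a test driver could just as well abort from the handler). The sketch also assumes tests return a non-negative error count, so -1 can signal a timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf timeout_env;

/* SIGALRM handler: jump back to the sigsetjmp point in run_with_timeout. */
static void on_alarm(int sig)
{
    (void) sig;
    siglongjmp(timeout_env, 1);
}

/* Hypothetical helper: run fn() with a wall-clock limit in seconds.
 * Returns fn's result, or -1 if the limit expired first. */
int run_with_timeout(int (*fn) (void), unsigned int seconds)
{
    struct sigaction sa, old;
    volatile int result = -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old) != 0)
        return -1;

    /* savemask = 1 so the signal mask is restored after the jump */
    if (sigsetjmp(timeout_env, 1) == 0) {
        alarm(seconds);         /* arm the timer */
        result = fn();
        alarm(0);               /* finished in time, disarm */
    }
    /* on timeout, control arrives here with result still -1 */

    sigaction(SIGALRM, &old, NULL);
    return result;
}

/* Stand-in tests for demonstration. */
static int demo_fast(void)
{
    return 0;
}

static int demo_hang(void)
{
    for (;;)
        pause();                /* wait for signals forever */
    return 0;                   /* not reached */
}
```

Jumping out of a signal handler leaves whatever the interrupted function was doing unfinished, so this pattern suits a test driver that only needs to report the timeout and move on.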
If any of a test's environment variables affects init, we need to run that test individually so that its settings do not affect the rest of the tests.
The only test in the attr folder that is not converted is attrend2, since that test tests MPI_Finalize behavior.
Convert all tests used in testlist.cvar to mpitests framework.
test:mpich/custom ✔️ Significantly accelerates the converted collective tests.
test:mpich/ch3/most 2 TIMEOUT in ch4-ofi-asan. I don't think they are related to this PR, but it is good that they prove this PR works. Use
That took 1 hour 36 min. to finish the collective tests. After this PR:
So 1:36 -> 0:16
test:mpich/whitespace
I think my only question is how much does this differ in timing than if we configured MPICH
Let's find out test:mpich/custom EDIT: oh, I need to run this against the main branch... running in #7204 (comment)
https://jenkins-pmrs.cels.anl.gov/job/mpich-review-custom/1182/console
So ~32min without hwloc.
Pull Request Description
This is a split/renewal of #5725.
The current testsuite consists of thousands of individual MPI test programs. Running the entire testsuite involves invoking the process manager to spawn MPI processes, and each process goes through MPI_INIT again and again. Both the process spawning and the MPI initialization are very slow compared to the tested MPI operation itself. The current testsuite runs for a couple of hours, and we run hundreds of them every day.
This PR attempts to convert individual tests into functions, so that multiple tests can run within a single MPI_Init/Finalize window. I believe this can significantly reduce the CI testing time.
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.