Benchmarking framework: Updated #703
Conversation
@egparedes any feedback on these changes?
I like the general cleanup of the test logic, but there are a couple of issues to fix.
```python
):
    if orchestration and not helpers.is_dace(backend):
        pytest.skip("Orchestration test requires a dace backend.")

    if benchmark.enabled and experiment == dt_utils.REGIONAL_EXPERIMENT:
```
This will skip the validation test for the regional experiment if the benchmark is enabled, which I don't think is the right behavior. See next comment below.
This issue hasn't been addressed yet. One solution could be to make the benchmark argument optional in the run_verify_and_benchmark helper and then do something like:
```diff
-    if benchmark.enabled and experiment == dt_utils.REGIONAL_EXPERIMENT:
+    if experiment == dt_utils.REGIONAL_EXPERIMENT:
+        # Skip benchmarks for this experiment
+        benchmark = None
```
In this case the helper would always verify the function but would only run the benchmark if the benchmark fixture is passed and enabled.
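For reference, a minimal sketch of what such an optional-benchmark helper could look like (the signature and names here are illustrative assumptions, not the actual icon4py implementation):

```python
from collections.abc import Callable
from typing import Any, Optional


def run_verify_and_benchmark(
    test_func: Callable[[], None],
    verification_func: Callable[[], None],
    benchmark_fixture: Optional[Any] = None,
) -> None:
    # Always run the function under test and verify its results.
    test_func()
    verification_func()
    # Benchmark only if a fixture is passed and benchmarking is enabled.
    if benchmark_fixture is not None and benchmark_fixture.enabled:
        benchmark_fixture(test_func)
```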
I like the current approach and have a couple of suggestions to improve it further.
Minor style-related comments. Additionally, it would be good to add some tests for the most critical/complex functions here.
```python
    input_data: dict,
    reference_outputs,
```
Missing and too generic type hints.
```python
    connectivities_as_numpy: dict,
    input_data: dict,
    benchmark,  # benchmark fixture
):
```
Missing and too generic type hints.
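For example, the signature could be annotated roughly as follows (a sketch; the test name is hypothetical and the exact types depend on the fixtures, with gtx standing for gt4py.next):

```python
import gt4py.next as gtx
import numpy as np
import pytest


def test_diffusion_granule(  # hypothetical test name
    connectivities_as_numpy: dict[str, np.ndarray],
    input_data: dict[str, gtx.Field],
    benchmark: pytest.FixtureRequest,
) -> None:
    ...
```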
FYI @egparedes: once you are happy with the verification/benchmarking infrastructure, I will add more granule tests (I just wanted to stabilize it first and then simply apply it to other tests; the easy part ;) ).
I'm sorry if my comment about adding tests was misleading, but I meant to add unit tests for the benchmarking infrastructure enhanced in this PR.
How about now?
More comments.
```python
    benchmark,  # benchmark fixture
):
```
Still missing type hints
```python
    all_correct = np.all(field == (base_value + increment))

    assert type(all_correct) is np.bool_
    assert all_correct, f"Field verification failed"
```
```diff
-    all_correct = np.all(field == (base_value + increment))
-    assert type(all_correct) is np.bool_
-    assert all_correct, f"Field verification failed"
+    np.testing.assert_allclose(field, base_value + increment)
```
```python
def test_verification_benchmarking_infrastructure(benchmark):
    base_value = 1
    field = base_value * np.ones((1000, 1000), dtype=base_type)
```
No need to use an array; a simple 0d field should be ok for this function.
```diff
-    field = base_value*np.ones((1000, 1000), dtype=base_type)
+    field = base_value*np.ones((), dtype=base_type)
```
```python
    incr_func(field, increment)
    verify_field(field, increment, current_base_value)
```
What is this testing?
Just to make sure that execution passes correctly through the run_verify_and_benchmark function and increments the field. Therefore, the second time we call these functions, the base value has already been altered.
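In other words, the intent is roughly this (a sketch reusing the names from the test above):

```python
# First pass goes through the helper, which runs incr_func and verifies the result.
helpers.run_verify_and_benchmark(
    functools.partial(incr_func, field=field, increment=increment),
    functools.partial(verify_field, field=field, increment=increment, base_value=base_value),
    benchmark,
)

# If the helper really executed incr_func, the field now holds a new base value.
current_base_value = field[()]

# Second pass calls the functions directly; verification must use the updated base value.
incr_func(field, increment)
verify_field(field, increment, current_base_value)
```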
```python
        benchmark,
    )

    current_base_value = np.random.choice(field.flat)
```
For 0d fields:
```diff
-    current_base_value = np.random.choice(field.flat)
+    current_base_value = field[()]
```
```python
from icon4py.model.testing import helpers

base_type = np.int64
```
If this is a constant, use the proper naming conventions (BASE_TYPE), but I don't see the need for it in such a small example. Additionally, in the numpy world this is almost always called dtype, not type.
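i.e., if the constant were kept, the suggested naming would look like this (a sketch):

```python
BASE_DTYPE = np.int64  # module-level constant; "dtype" rather than "type" in numpy terms

field = base_value * np.ones((), dtype=BASE_DTYPE)
```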
```python
    helpers.run_verify_and_benchmark(
        functools.partial(incr_func, field=field, increment=increment),
        functools.partial(verify_field, field=field, increment=increment, base_value=base_value),
        benchmark,
    )
```
This is only testing the happy path where everything works, but it's not testing that the verification fails appropriately when needed. Also, we shouldn't request and use the actual benchmark fixture here, to avoid messing up the results of the benchmarks. Just use a mock, setting the enabled attribute to both True and False, to check that the dispatching logic is ok and only calls the benchmarking function when needed.
FYI @egparedes
I now fixed the testing of the non-happy path as well.
Let me know if that's ok.
@halungge do you think that we can add more granule tests for benchmarking, other than the ones that I have already included? Thanks!
Only small things from previous rounds which have not yet been addressed.
```python
    np.testing.assert_allclose(field, base_value + increment)


def test_verification_benchmarking_infrastructure(benchmark: Optional[pytest.FixtureRequest]):
```
The benchmark fixture doesn't need to be used here, so it shouldn't be requested. Please remove it.
```python
    helpers.run_verify_and_benchmark(
        functools.partial(incr_func, field=field, increment=increment),
        functools.partial(verify_field, field=field, increment=increment, base_value=base_value),
        benchmark_fixture=None,  # no need to benchmark this test
    )
```
As I said in a previous review, instead of passing the real benchmark fixture you should pass a mock (please read the documentation at https://devdocs.io/python~3.13/library/unittest.mock#unittest.mock.Mock) here with the enabled attribute set to True/False and then assert whether it has been called or not. Something like:
```diff
-    helpers.run_verify_and_benchmark(
-        functools.partial(incr_func, field=field, increment=increment),
-        functools.partial(verify_field, field=field, increment=increment, base_value=base_value),
-        benchmark_fixture=None,  # no need to benchmark this test
-    )
+    benchmark = mock.Mock(enabled=False)
+    helpers.run_verify_and_benchmark(
+        functools.partial(incr_func, field=field, increment=increment),
+        functools.partial(verify_field, field=field, increment=increment, base_value=base_value),
+        benchmark_fixture=benchmark,
+    )
+    assert not benchmark.called
+    benchmark.enabled = True
+    helpers.run_verify_and_benchmark(
+        functools.partial(incr_func, field=field, increment=increment),
+        functools.partial(verify_field, field=field, increment=increment, base_value=base_value),
+        benchmark_fixture=benchmark,
+    )
+    assert benchmark.called
```
This is only an example to show how to use it, don't take it as a real suggestion. Try to fit something like that in the test in a way that makes sense and tests all relevant code paths.
```python
    connectivities_as_numpy: dict[str, np.ndarray],
    input_data: dict[str, gtx.Field],
    benchmark: pytest.FixtureRequest,
):
```
Missing return type hint
```diff
@@ -199,8 +211,7 @@ def __init_subclass__(cls, **kwargs):
         # reflect the name of the test we do this dynamically here instead of using regular
         # inheritance.
         super().__init_subclass__(**kwargs)
-        setattr(cls, f"test_{cls.__name__}", _test_validation)
-        setattr(cls, f"test_{cls.__name__}_benchmark", _test_execution_benchmark)
+        setattr(cls, f"test_{cls.__name__}", _test_and_benchmark)


 def reshape(arr: np.ndarray, shape: tuple[int, ...]):
```
Not related to your PR, but since you're already touching this file, maybe you could add the proper return type hint here.
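e.g. (assuming the function indeed returns a numpy array, as the name suggests):

```python
def reshape(arr: np.ndarray, shape: tuple[int, ...]) -> np.ndarray:
    ...
```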
```diff
@@ -103,6 +106,11 @@ def test_advection_run_single_step(
         pytest.xfail(
             "This test is skipped until the cause of nonzero horizontal advection if revealed."
         )
+
+    if ntracer == 1:
+        # Skip benchmarks for this experiment
```
Isn't this currently always the case?
Since we pass the benchmark fixture by default, if we do not set this condition then it will benchmark the test for multiple parametrizations. By setting this, it benchmarks only the case with 4 tracers. But this is just a suggestion from my side; if you want otherwise, please let me know.
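Concretely, the guard would follow the benchmark = None pattern suggested earlier (a sketch; the line after the comment is not shown in the excerpt above):

```python
if ntracer == 1:
    # Skip benchmarks for this parametrization; only the 4-tracer case is benchmarked.
    benchmark = None
```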
I would call this file test_helpers.py. It is not always like that in icon4py, but the tests for something in a module foo.py should go into test_foo.py for an easier overview.
), "Base values should not be equal. Otherwise, the test did not go through incr_func/ verify_field functions." | ||
|
||
# Expect AssertionError | ||
with pytest.raises(AssertionError): |
I don't really understand this test. Why should it raise? The function under test, run_verify_and_benchmark, does not raise, does it? And shouldn't there be a test for the case that the benchmark is called, and one for the case that it isn't?
It raises an error because of this:

```python
functools.partial(
    verify_field, field=field, increment=increment, base_value=base_value
),  # base_value should be current_base_value
```

i.e. to be correct we would need base_value=current_base_value. It is there just to test whether run_verify_and_benchmark raises errors correctly.
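i.e. the passing variant of that call would be:

```python
functools.partial(verify_field, field=field, increment=increment, base_value=current_base_value)
```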
Regarding your comment about testing the benchmark branch, we implicitly test it with the mock object benchmark = mock.Mock(enabled=False), i.e. we pass the benchmark fixture (not None) but with the enabled attribute set to False.
FYI @egparedes @halungge
It is still hard to understand what you want to test here. You can use mocks not only for the benchmark but also for the functions, since all you want to check is whether they are called or not. I would do something like this:
```python
@pytest.mark.parametrize("benchmark_enabled", [True, False])
def test_verification_benchmarking_infrastructure(benchmark_enabled):
    base_value = 1
    field = np.array((base_value * np.ones((), dtype=BASE_DTYPE)))
    increment = 6
    benchmark = mock.Mock(enabled=benchmark_enabled)
    test_func = mock.Mock(wraps=incr_func)
    verification_func = mock.Mock(wraps=verify_field)
    helpers.run_verify_and_benchmark(
        functools.partial(test_func, field=field, increment=increment),
        functools.partial(verification_func, field=field, increment=increment, base_value=base_value),
        benchmark_fixture=benchmark,
    )
    test_func.assert_called_once()
    verification_func.assert_called_once()
    benchmark.assert_called() if benchmark_enabled else benchmark.assert_not_called()
```
(Using your functions here; you could even use empty functions.) I think that tests all variants except for benchmark = None, which I don't know whether you can mock.
The advantage of the current solution is that it tests all branches of the helper function and provides a concrete example of how the helper function is used (with no overhead compared to the mock objects).
I would prefer to leave it as is, but if you prefer otherwise I can change it.
@egparedes any comment on the test?
I'm not sure the current version tests all branches, and if it does, it does so in a very obfuscated way imho. It took me way too long to understand what you were trying to test...
ok fixed @halungge
Final changes. After these are fixed it should be fine for me, so I don't need another round; I'll leave the final approval to @halungge.
```python
    input_data = allocate_data(backend, input_data)
```

```python
Note:
    - test_func and verification_func should be provided with binded arguments, i.e. with functools.partial.
```
```diff
-    - test_func and verification_func should be provided with binded arguments, i.e. with functools.partial.
+    - test_func and verification_func should be provided with bound arguments, i.e. with functools.partial.
```
model/testing/tests/test_helpers.py
```python
@pytest.mark.parametrize("benchmark_enabled", [True, False])
def test_verification_benchmarking_infrastructure(benchmark_enabled):
```
```diff
-def test_verification_benchmarking_infrastructure(benchmark_enabled):
+def test_run_verify_and_benchmark(benchmark_enabled):
```
model/testing/tests/test_helpers.py
```python
    failing_verification_func = mock.Mock(side_effect=AssertionError("Verification failed."))
    with pytest.raises(AssertionError):
        helpers.run_verify_and_benchmark(
            functools.partial(test_func, req_arg=mock.Mock()),
            failing_verification_func,
            benchmark_fixture=None,
        )
```
This is not really testing anything new, because we already tested earlier that verification_func is always called. However, there is still one test case missing: calling the function with benchmark_fixture=None, which should still call the verification function. It should be very simple to move this part to a new function with fresh mocks so we can simply assert:
```diff
-    failing_verification_func = mock.Mock(side_effect=AssertionError("Verification failed."))
-    with pytest.raises(AssertionError):
-        helpers.run_verify_and_benchmark(
-            functools.partial(test_func, req_arg=mock.Mock()),
-            failing_verification_func,
-            benchmark_fixture=None,
-        )
+    helpers.run_verify_and_benchmark(
+        functools.partial(test_func, req_arg=mock.Mock()),
+        verification_func,
+        benchmark_fixture=None,
+    )
+    test_func.assert_called_once()
+    verification_func.assert_called_once()
```
Co-authored-by: Enrique González Paredes <[email protected]>
Mandatory Tests: please make sure you run these tests via comment before you merge!
Optional Tests: to run benchmarks you can use:
To run tests and benchmarks with the DaCe backend you can use:
In case your change might affect downstream icon-exclaim, please consider running
For more detailed information please look at CI in the EXCLAIM universe.
LGTM, can you add a PR description? It will be taken as the merge commit comment.
@kotsaloscv you forgot to add the PR description before merging. Could you add it now so it is at least visible following the PR number in the commit log?
Indeed, my bad ;) Thanks for reminding me.
* main: (39 commits)
  Combining solve nonhydro stencils 1 to 13 (#682)
  Update gt4py to icon4py_staging_20250507 (#736)
  add test level marker (#722)
  Fix refactor diffusion metrics (#715)
  Re-add conftest again (#734)
  Separate datatests (#714)
  Rename tests (#733)
  Update to icon4py_staging_20250502 (#732)
  Simplified velocity advection stencil (#725)
  add doc string on the combined stencils on velocity advection (#707)
  Updated Benchmarking Infrastructure (#703)
  Vectorize compute vwind impl wgt (#721)
  Combining solve nonhydro stencils 14 to 28 (#292)
  Cleanup metrics fields and factory tests (cont) (#690)
  Disable tracer-advection tests (#720)
  remove unused arguments (concat_where) (#716)
  Update GT4Py version to icon4py_staging_20250417 (#717)
  enable interpolation_factory tests on GPU (#681)
  Fix fix dataset -> datatest typo and strict checking (#713)
  Reenable graupel tests (#712)
  ...
Update the benchmarking infrastructure and extend it for the granule tests.

The helpers.run_verify_and_benchmark function runs, verifies, and benchmarks (if selected) the function under investigation, either a stencil test or a granule run. A benchmark run therefore always verifies the function as well. Additionally, the infrastructure wraps the running, the verification, and the benchmarking in a very compact form (just one function).
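A typical call site then looks roughly like this (a sketch; incr_func/verify_field are the toy functions from the unit test in this PR and stand in for a real stencil or granule run):

```python
import functools

import numpy as np

from icon4py.model.testing import helpers


def incr_func(field: np.ndarray, increment: int) -> None:
    field += increment


def verify_field(field: np.ndarray, increment: int, base_value: int) -> None:
    np.testing.assert_allclose(field, base_value + increment)


def test_increment(benchmark) -> None:
    base_value = 1
    increment = 6
    field = base_value * np.ones((), dtype=np.int64)
    # Runs incr_func, verifies it against the expected value, and benchmarks it
    # only when benchmarking is enabled for this pytest invocation.
    helpers.run_verify_and_benchmark(
        functools.partial(incr_func, field=field, increment=increment),
        functools.partial(verify_field, field=field, increment=increment, base_value=base_value),
        benchmark,
    )
```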