Benchmarking framework: Updated #703

Merged: 24 commits merged into main on Apr 25, 2025

Conversation

@kotsaloscv kotsaloscv commented Mar 25, 2025

Update the benchmarking infrastructure and extend it for the granule tests.

The helpers.run_verify_and_benchmark function runs, verifies, and (if enabled) benchmarks the function under investigation, whether a stencil test or a granule run. A benchmark run is therefore always preceded by verification. In addition, the infrastructure wraps running, verification, and benchmarking in a single, compact call.
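For orientation, a minimal sketch of what such a helper could look like, based only on the behavior described in this thread (the real icon4py implementation may differ in its details):

from typing import Callable, Optional


def run_verify_and_benchmark(
    test_func: Callable[[], None],
    verification_func: Callable[[], None],
    benchmark_fixture: Optional[object] = None,
) -> None:
    """Run the investigated function, always verify it, and benchmark it only
    when a pytest-benchmark fixture is passed and enabled."""
    test_func()  # run the stencil test or granule once
    verification_func()  # verification is unconditional
    if benchmark_fixture is not None and benchmark_fixture.enabled:
        benchmark_fixture(test_func)  # the fixture re-runs the function for timing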

@kotsaloscv kotsaloscv self-assigned this Mar 25, 2025
@kotsaloscv kotsaloscv requested review from havogt and egparedes March 26, 2025 08:57
@kotsaloscv kotsaloscv marked this pull request as ready for review April 8, 2025 06:18
@kotsaloscv (Collaborator Author)

@egparedes any feedback on these changes?
Thanks!

@egparedes egparedes (Contributor) left a comment

I like the general cleanup of the test logic, but there are a couple of issues to fix.

):
    if orchestration and not helpers.is_dace(backend):
        pytest.skip("Orchestration test requires a dace backend.")

    if benchmark.enabled and experiment == dt_utils.REGIONAL_EXPERIMENT:
Contributor

This will skip the validation test for the regional experiment if the benchmark is enabled, which I don't think is the right behavior. See next comment below.

Contributor

This issue hasn't been addressed yet. One solution could be to make the benchmark argument optional in the run_verify_and_benchmark helper and then do something like:

Suggested change:
- if benchmark.enabled and experiment == dt_utils.REGIONAL_EXPERIMENT:
+ if experiment == dt_utils.REGIONAL_EXPERIMENT:
+     # Skip benchmarks for this experiment
+     benchmark = None

In this case the helper would always verify the function but would only run the benchmark if the benchmark fixture is passed and enabled.
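To illustrate the caller-side pattern, a minimal sketch (run_program, verify_result, and is_regional are hypothetical stand-ins; the helpers import and the benchmark_fixture keyword match what appears later in this thread):

import functools

import numpy as np
import pytest

from icon4py.model.testing import helpers


def run_program(field: np.ndarray) -> None:
    field += 1.0  # stands in for the real stencil or granule run


def verify_result(field: np.ndarray) -> None:
    np.testing.assert_allclose(field, 1.0)


@pytest.mark.parametrize("is_regional", [True, False])
def test_example(benchmark, is_regional: bool) -> None:
    if is_regional:
        benchmark = None  # skip benchmarking for this experiment; verification still runs
    field = np.zeros(10)
    helpers.run_verify_and_benchmark(
        functools.partial(run_program, field=field),
        functools.partial(verify_result, field=field),
        benchmark_fixture=benchmark,
    )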

@kotsaloscv kotsaloscv requested a review from egparedes April 9, 2025 11:13
@egparedes egparedes (Contributor) left a comment

I like the current approach and have a couple of suggestions to improve it further.

@kotsaloscv kotsaloscv requested a review from egparedes April 10, 2025 06:45
@egparedes egparedes (Contributor) left a comment

Minor style-related comments. Additionally, it would be good to add some tests for the most critical/complex functions here.

Comment on lines 130 to 131
input_data: dict,
reference_outputs,
Contributor

Missing and too generic type hints.

Comment on lines 152 to 155
connectivities_as_numpy: dict,
input_data: dict,
benchmark, # benchmark fixture
):
Contributor

Missing and too generic type hints.
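A sketch of the kind of annotations being asked for, using the concrete types that show up in a later revision in this thread (gtx is assumed to be the usual gt4py.next import alias; the function name and parameter list are illustrative only):

import gt4py.next as gtx
import numpy as np
import pytest


def _test_and_benchmark(
    connectivities_as_numpy: dict[str, np.ndarray],
    input_data: dict[str, gtx.Field],
    benchmark: pytest.FixtureRequest,
) -> None:
    ...  # body omitted; only the annotations matter here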

@kotsaloscv (Collaborator Author)

> Minor style-related comments. Additionally, it would be good to add some tests for the most critical/complex functions here.

FYI, @egparedes

Once you are happy with the verification/benchmarking infrastructure, I will add more granule tests (I just wanted to stabilize it first and then simply apply it to other tests; that's the easy part ;) ).

@kotsaloscv kotsaloscv requested a review from egparedes April 10, 2025 14:24
@egparedes (Contributor)

> FYI, @egparedes
>
> Once you are happy with the verification/benchmarking infrastructure, I will add more granule tests (I just wanted to stabilize it first and then simply apply it to other tests; that's the easy part ;) ).

I'm sorry if my comment about adding tests was misleading, but I meant to add unit tests for the benchmarking infrastructure enhanced in this PR.

@kotsaloscv (Collaborator Author)

> FYI, @egparedes
> Once you are happy with the verification/benchmarking infrastructure, I will add more granule tests (I just wanted to stabilize it first and then simply apply it to other tests; that's the easy part ;) ).
>
> I'm sorry if my comment about adding tests was misleading, but I meant to add unit tests for the benchmarking infrastructure enhanced in this PR.

How about now?

@egparedes egparedes (Contributor) left a comment

More comments.

Comment on lines 157 to 158
benchmark, # benchmark fixture
):
Contributor

Still missing type hints

):
    if orchestration and not helpers.is_dace(backend):
        pytest.skip("Orchestration test requires a dace backend.")

    if benchmark.enabled and experiment == dt_utils.REGIONAL_EXPERIMENT:
Contributor

This issue hasn't been addressed yet. One solution could be to make the benchmark argument optional in the run_verify_and_benchmark helper and then do something like:

Suggested change:
- if benchmark.enabled and experiment == dt_utils.REGIONAL_EXPERIMENT:
+ if experiment == dt_utils.REGIONAL_EXPERIMENT:
+     # Skip benchmarks for this experiment
+     benchmark = None

In this case the helper would always verify the function but would only run the benchmark if the benchmark fixture is passed and enabled.

Comment on lines 29 to 32
all_correct = np.all(field == (base_value + increment))

assert type(all_correct) is np.bool_
assert all_correct, f"Field verification failed"
Contributor

Suggested change:
- all_correct = np.all(field == (base_value + increment))
- assert type(all_correct) is np.bool_
- assert all_correct, f"Field verification failed"
+ np.testing.assert_allclose(field, base_value + increment)


def test_verification_benchmarking_infrastructure(benchmark):
    base_value = 1
    field = base_value*np.ones((1000, 1000), dtype=base_type)
Contributor

No need to use an array, a simple 0d field should be ok for this function

Suggested change:
- field = base_value*np.ones((1000, 1000), dtype=base_type)
+ field = base_value*np.ones((), dtype=base_type)

Comment on lines 50 to 51
incr_func(field, increment)
verify_field(field, increment, current_base_value)
Contributor

What is this testing?

Collaborator Author

Just to make sure that Python passes correctly through the run_verify_and_benchmark function and increments the field; the second time we call these functions, the base value has already been altered.

        benchmark,
    )

    current_base_value = np.random.choice(field.flat)
Contributor

For 0d fields:

Suggested change:
- current_base_value = np.random.choice(field.flat)
+ current_base_value = field[()]


from icon4py.model.testing import helpers

base_type = np.int64
Contributor

If this is a constant, use the proper naming convention (BASE_TYPE), but I don't see the need for it in such a small example. Additionally, in the NumPy world, this is almost always called dtype, not type.
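For example, a one-line illustration of the naming nit (later snippets in this thread indeed use this spelling):

import numpy as np

BASE_DTYPE = np.int64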

Comment on lines 41 to 45
    helpers.run_verify_and_benchmark(
        functools.partial(incr_func, field=field, increment=increment),
        functools.partial(verify_field, field=field, increment=increment, base_value=base_value),
        benchmark,
    )
Contributor

This is only testing the happy path where everything works, but it's not testing that the verification fails appropriately when needed. Also, we shouldn't request and use the actual benchmark fixture here, to avoid messing up the benchmark results. Just use a mock, setting the enabled attribute to both True and False, to check that the dispatching logic is ok and only calls the benchmarking function when needed.

Collaborator Author

FYI @egparedes
I now fixed the testing of the non-happy path as well.
Let me know if that's ok.

@kotsaloscv kotsaloscv requested a review from egparedes April 14, 2025 14:42
@kotsaloscv kotsaloscv requested a review from halungge April 16, 2025 08:25
@kotsaloscv (Collaborator Author)

@halungge do you think that we can add more granule tests for benchmarking, other than the ones that I have already included? Thanks!

@egparedes egparedes (Contributor) left a comment

Only small things from previous rounds which have not yet been addressed.

np.testing.assert_allclose(field, base_value + increment)


def test_verification_benchmarking_infrastructure(benchmark: Optional[pytest.FixtureRequest]):
Contributor

The benchmark fixture doesn't need to be used here, so it shouldn't be requested. Please remove it.

Comment on lines 42 to 46
    helpers.run_verify_and_benchmark(
        functools.partial(incr_func, field=field, increment=increment),
        functools.partial(verify_field, field=field, increment=increment, base_value=base_value),
        benchmark_fixture=None,  # no need to benchmark this test
    )
Contributor

As I said in a previous review, instead of passing the real benchmark fixture you should pass a mock (please read the documentation at https://devdocs.io/python~3.13/library/unittest.mock#unittest.mock.Mock) here, with the enabled attribute set to True/False, and then assert whether it has been called or not. Something like:

Suggested change:
- helpers.run_verify_and_benchmark(
-     functools.partial(incr_func, field=field, increment=increment),
-     functools.partial(verify_field, field=field, increment=increment, base_value=base_value),
-     benchmark_fixture=None,  # no need to benchmark this test
- )
+ benchmark = mock.Mock(enabled=False)
+ helpers.run_verify_and_benchmark(
+     functools.partial(incr_func, field=field, increment=increment),
+     functools.partial(verify_field, field=field, increment=increment, base_value=base_value),
+     benchmark_fixture=benchmark,
+ )
+ assert not benchmark.called
+ benchmark.enabled = True
+ helpers.run_verify_and_benchmark(
+     functools.partial(incr_func, field=field, increment=increment),
+     functools.partial(verify_field, field=field, increment=increment, base_value=base_value),
+     benchmark_fixture=benchmark,
+ )
+ assert benchmark.called

This is only an example to show how to use it; don't take it as a literal suggestion. Try to fit something like that into the test in a way that makes sense and covers all relevant code paths.

connectivities_as_numpy: dict[str, np.ndarray],
input_data: dict[str, gtx.Field],
benchmark: pytest.FixtureRequest,
):
Contributor

Missing return type hint

@@ -199,8 +211,7 @@ def __init_subclass__(cls, **kwargs):
         # reflect the name of the test we do this dynamically here instead of using regular
         # inheritance.
         super().__init_subclass__(**kwargs)
-        setattr(cls, f"test_{cls.__name__}", _test_validation)
-        setattr(cls, f"test_{cls.__name__}_benchmark", _test_execution_benchmark)
+        setattr(cls, f"test_{cls.__name__}", _test_and_benchmark)


def reshape(arr: np.ndarray, shape: tuple[int, ...]):
Contributor

Not related to your PR, but since you're already touching this file, maybe you could add the proper return type hint here.
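For instance, the requested annotation would look something like the following sketch (only the signature appears in this thread; the body is an assumption based on the function name):

import numpy as np


def reshape(arr: np.ndarray, shape: tuple[int, ...]) -> np.ndarray:
    # Assumed body: a thin wrapper around NumPy reshaping.
    return arr.reshape(shape)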

@kotsaloscv kotsaloscv requested a review from egparedes April 16, 2025 13:32
@@ -103,6 +106,11 @@ def test_advection_run_single_step(
pytest.xfail(
"This test is skipped until the cause of nonzero horizontal advection if revealed."
)

if ntracer == 1:
# Skip benchmarks for this experiment
Contributor

Isn't this currently always the case?

Collaborator Author

Since we pass the benchmark fixture by default, if we do not set this condition, it will benchmark the test for multiple parametrizations. By setting this, it benchmarks only the case with 4 tracers. But this is just a suggestion from my side; if you want otherwise, please let me know.

Contributor

I would call this file test_helpers.py. It is not always like that in icon4py, but the tests for something in a module foo.py should go into test_foo.py for an easier overview.

), "Base values should not be equal. Otherwise, the test did not go through incr_func/ verify_field functions."

# Expect AssertionError
with pytest.raises(AssertionError):
Contributor

I don't really understand this test. Why should it raise? The function under test, run_verify_and_benchmark, does not raise, does it? And shouldn't there be a test for the case that the benchmark is called, and one for the case that it isn't?

Collaborator Author

It raises an error because of this:

            functools.partial(
                verify_field, field=field, increment=increment, base_value=base_value
            ),  # base_value should be current_base_value

i.e. to be correct we would need base_value=current_base_value. It is there just to test whether run_verify_and_benchmark raises errors correctly.

Collaborator Author

Regarding your comment about testing the benchmark branch, we implicitly test it with the mock object benchmark = mock.Mock(enabled=False), i.e. we pass the benchmark fixture (not None), but with the enabled attribute set to False.

@kotsaloscv (Collaborator Author)

FYI @egparedes @halungge
is it ok now?
If yes, we can merge it.

@halungge halungge (Contributor) commented Apr 23, 2025

It is still hard to understand what you want to test here. You can use mocks not only for the benchmark but also for the functions, since all you want to check is whether they are called or not. I would do something like this:

@pytest.mark.parametrize("benchmark_enabled", [True, False])
def test_verification_benchmarking_infrastructure(benchmark_enabled):
    base_value = 1
    field = np.array((base_value * np.ones((), dtype=BASE_DTYPE)))

    increment = 6

    benchmark = mock.Mock(enabled=benchmark_enabled)
    test_func = mock.Mock(wraps=incr_func)
    verification_func = mock.Mock(wraps=verify_field)

    helpers.run_verify_and_benchmark(
        functools.partial(test_func, field=field, increment=increment),
        functools.partial(verification_func, field=field, increment=increment, base_value=base_value),
        benchmark_fixture=benchmark,
    )
    test_func.assert_called_once()
    verification_func.assert_called_once()
    benchmark.assert_called() if benchmark_enabled else benchmark.assert_not_called()

(using your functions, though you could even use empty functions). I think that tests all variants except for benchmark = None, which I'm not sure you can mock.

Collaborator Author

The advantage of the current solution is that it tests all branches of the helper function and provides a concrete example of how the helper function is used (with no overhead compared to the mock objects).
I would prefer to leave it as is, but if you prefer otherwise I can change it.
@egparedes any comment on the test?

Contributor

I'm not convinced the current version tests all branches, and if it does, it does so in a very obfuscated way imho. It took me way too long to understand what you were trying to test...

Collaborator Author

ok fixed @halungge

@egparedes egparedes (Contributor) left a comment

Final changes. After these fixes it should be fine for me, so I don't need another round; I'll leave the final approval to @halungge.


input_data = allocate_data(backend, input_data)
Note:
- test_func and verification_func should be provided with binded arguments, i.e. with functools.partial.
Contributor

Suggested change (in the docstring note):
-     - test_func and verification_func should be provided with binded arguments, i.e. with functools.partial.
+     - test_func and verification_func should be provided with bound arguments, i.e. with functools.partial.



@pytest.mark.parametrize("benchmark_enabled", [True, False])
def test_verification_benchmarking_infrastructure(benchmark_enabled):
Contributor

Suggested change:
- def test_verification_benchmarking_infrastructure(benchmark_enabled):
+ def test_run_verify_and_benchmark(benchmark_enabled):

Comment on lines 39 to 45
    failing_verification_func = mock.Mock(side_effect=AssertionError("Verification failed."))
    with pytest.raises(AssertionError):
        helpers.run_verify_and_benchmark(
            functools.partial(test_func, req_arg=mock.Mock()),
            failing_verification_func,
            benchmark_fixture=None,
        )
Contributor

This is not really testing anything new because we already tested earlier that verification_func is always called.

However, there is still one test case missing: calling the function with benchmark_fixture=None, which should still call the verification function.

It should be very simple to move this part to a new function with fresh mocks so we can simply assert:

Suggested change:
- failing_verification_func = mock.Mock(side_effect=AssertionError("Verification failed."))
- with pytest.raises(AssertionError):
-     helpers.run_verify_and_benchmark(
-         functools.partial(test_func, req_arg=mock.Mock()),
-         failing_verification_func,
-         benchmark_fixture=None,
-     )
+ helpers.run_verify_and_benchmark(
+     functools.partial(test_func, req_arg=mock.Mock()),
+     verification_func,
+     benchmark_fixture=None,
+ )
+ test_func.assert_called_once()
+ verification_func.assert_called_once()

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • launch jenkins spack

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

In case your change might affect downstream icon-exclaim, please consider running

  • launch jenkins icon

For more detailed information please look at CI in the EXCLAIM universe.

@halungge halungge (Contributor) left a comment

LGTM. Can you add a PR description? It will be used as the merge commit message.

@kotsaloscv kotsaloscv merged commit 2f97bc1 into main Apr 25, 2025
2 checks passed
@kotsaloscv kotsaloscv deleted the benchmark branch April 25, 2025 11:47
@egparedes (Contributor)

@kotsaloscv you forgot to add the PR description before merging. Could you add it now, so that it is at least visible when following the PR number in the commit log?

@kotsaloscv kotsaloscv changed the title from "Add more benchmark tests" to "Benchmarking framework: Updated" on Apr 30, 2025
@kotsaloscv (Collaborator Author)

> @kotsaloscv you forgot to add the PR description before merging. Could you add it now, so that it is at least visible when following the PR number in the commit log?

Indeed, my bad ;) Thanks for reminding me.

jcanton added a commit that referenced this pull request May 9, 2025
* main: (39 commits)
  Combining solve nonhydro stencils 1 to 13 (#682)
  Update gt4py to icon4py_staging_20250507 (#736)
  add test level marker (#722)
  Fix refactor diffusion metrics (#715)
  Re-add conftest again (#734)
  Separate datatests (#714)
  Rename tests (#733)
  Update to icon4py_staging_20250502 (#732)
  Simplified velocity advection stencil (#725)
  add doc string on the combined stencils on velocity advection (#707)
  Updated Benchmarking Infrastructure (#703)
  Vectorize compute vwind impl wgt (#721)
  Combining solve nonhydro stencils 14 to 28 (#292)
  Cleanup metrics fields and factory tests (cont) (#690)
  Disable tracer-advection tests (#720)
  remove unused arguments (concat_where) (#716)
  Update GT4Py version to icon4py_staging_20250417 (#717)
  enable interpolation_factory tests on GPU (#681)
  Fix fix dataset -> datatest typo and strict checking (#713)
  Reenable graupel tests (#712)
  ...