
Add tests to document current behavior of post-instr in branches #435

Closed
wants to merge 1 commit

Conversation

@mbarbin (Contributor) commented Feb 12, 2024

This PR adds new test cases to monitor the current behavior of branch
instrumentation in Bisect_ppx. The test cases focus on complex scenarios
involving if-then-else and match branches discussed in #434.

The aim is not to modify the existing behavior directly, but to establish a
baseline for future changes.

By adding these tests, we can monitor how future changes to Bisect_ppx may alter
these behaviors (TDD).


Where there's a raise or a failwith guarded by a conditional, the
branch itself is instrumented, but the raising expression is not
post-instrumented...
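For illustration, a minimal sketch of the behavior described above, assuming the `___bisect_visit___` naming that appears later in this thread (the stub and point number are illustrative, not actual bisect_ppx output):

```ocaml
(* Stub standing in for bisect_ppx's runtime; for illustration only. *)
let ___bisect_visit___ (_ : int) = ()

(* Source under test: *)
let f cond = if cond then raise Not_found

(* Roughly the instrumented form: the branch receives a pre-visit
   point, but [raise Not_found] itself is not post-instrumented. *)
let f_instrumented cond =
  if cond then begin
    ___bisect_visit___ 0;
    raise Not_found
  end
```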
@aantron (Owner) commented:

What is the incremental value of having this test over the tests in apply/special.t, including `let _ = failwith (print_endline "foo"; "bar")`? Is there any need to test this in interaction with if-expressions?

@mbarbin (Contributor Author) commented:

I don't believe there's a need for this test from the standpoint of the current
implementation. I do see its value in terms of future-proofing and documenting.

The `if cond then raise` pattern is quite common. As it stands, one must be
cautious with how the raising expression is written. It's important to note that
even seemingly innocuous refactoring, which shouldn't affect coverage, could
potentially introduce non-visitable coverage points. This can be quite
surprising.

Consider the following example:

-| if cond then raise Not_found
+| if cond then raise_s [%sexp "Key not found", { key : Key.t }]

This results in the insertion of non-visitable point 1:

-| if cond then raise Not_found
+| if cond then (___bisect_post_visit___ 1 (raise_s [%sexp "Key not found", { key : Key.t }]))

For example, if you use `invalid_arg`, or a raising helper, its
application will be post-instrumented (unless all of its arguments are
labeled !?), resulting in a code location that can never be visited,
by design. These are currently acknowledged limitations.
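A sketch along the same lines, with an illustrative stub for the runtime function (the point number is arbitrary):

```ocaml
(* Stub standing in for bisect_ppx's runtime post-visit function. *)
let ___bisect_post_visit___ (_ : int) x = x

(* Sketch of the instrumented form described above: the application of
   [invalid_arg] gets wrapped in a post-visit point. Since [invalid_arg]
   always raises, that point can never be visited. *)
let check x =
  if x < 0 then ___bisect_post_visit___ 1 (invalid_arg "negative x")
```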
@aantron (Owner) commented:

This is an acknowledged and inevitable limitation because there is no way for Bisect to know that a specific identifier refers to a function that always or never raises. In fact, it doesn't know that for the identifiers failwith or invalid_arg either, as those can be redefined and shadowed by the user, including by another module or a third-party library that is not even processed by Bisect during compilation of the user's particular project. I don't see that we need a test for this -- the actual behavior is covered by the normal apply tests.
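To make the shadowing point concrete, a hypothetical example:

```ocaml
(* Hypothetical: shadow the standard [failwith] with a function that
   returns normally. An instrumenter that special-cased the name
   [failwith] would now be wrong about this call's control flow. *)
let failwith msg = print_endline ("warning: " ^ msg)

let () = failwith "this call returns instead of raising"
```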

@aantron (Owner) commented:

(Raises or diverges)

@mbarbin (Contributor Author) commented:

> there is no way for Bisect to know that a specific identifier refers to a
> function that always or never raises

I agree. It's fair to say that "a strategy to limit non-visitable points, which
relies partly on identifiers, will inevitably encounter these limitations."
However, if we were to discover an alternative approach (assuming one exists),
we might still see improvements in this test. In other words, the insertion of
post-instrumentation might not be inevitable when considering the range of
possible instrumentation strategies.

@mbarbin (Contributor Author) commented:

> (Raises or diverges)

Btw, I didn't understand the part where you said "or diverges".

@aantron (Owner) commented:

> (Raises or diverges)
>
> Btw, I didn't understand the part where you said "or diverges".

Post-instrumentation checks that a function evaluates to a value. The two ways that it doesn't in OCaml are when it raises an exception or when it diverges.
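In code, minimal illustrations of the two cases:

```ocaml
(* Neither application below ever produces a value, so a post-visit
   point placed on the out-edge of the call would never be visited. *)
let raises () : unit = raise Exit          (* raises an exception *)
let rec diverges () : unit = diverges ()   (* never terminates *)
```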

@aantron (Owner) commented:

Those are the two ways that the out-edge of a Pexp_apply may not be followed.

@mbarbin (Contributor Author) commented:

Does "diverging" encompass non-termination and potentially the effects in OCaml 5? Something else?

Regarding the effects, I'm optimistic that their influence on the strategy will
be similar to that of exceptions. I hope this won't necessitate too big of an
exploration of the OCaml 5 runtime 🤯. Thank you again for your explanation!

@aantron (Owner) commented:

Yes, "diverging" is non-termination. OCaml 5 effects are supposed to always be returned from exactly once, so they should not affect control flow as visible to Bisect, although this is not enforced, and for effects that correspond to async computation the "exactly once" might be later than program termination, so in that sense they behave like exceptions.

If you try to disable post-instrumentation with a `coverage off`
directive, the branch pre-instrumentation is removed as well,
resulting in losing the ability to check that the branch itself is
visited.
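A sketch of the attempt being described, with the attribute placed on the raising expression (the placement here is illustrative; attribute.t has the canonical tests):

```ocaml
(* Attempt to suppress only the out-edge point on the raise: *)
let f cond = if cond then (raise Not_found [@coverage off])

(* Per the observation above, the branch's pre-visit point is removed
   along with the post-visit point, so visits to the branch itself are
   no longer recorded. *)
```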
@aantron (Owner) commented:

Could you move this to attribute.t? That is the place for testing [@coverage off]. It already includes cases for the interaction of [@coverage off] with multiple language constructs.

@mbarbin (Contributor Author) commented:

I appreciate your suggestion to add tests to attribute.t, which I had
overlooked. I will proceed to incorporate new tests there to demonstrate how to
disable coverage in branches (same PR ok?).

Despite the presence of branch coverage testing in attribute.t, I advocate for
retaining this specific part of the control test here. It serves to illustrate
that this method does not provide a viable workaround for the issue at hand.
This was in response to your comment in #433:

> I believe in your case you should suppress instrumentation of the out-edge
> with [@coverage off], though I don't immediately recall whether the way that
> is implemented will also cause the branch instrumentation to be
> suppressed.

This might come across more as literate testing (akin to a tutorial, perhaps?)
than as a pure behavior test. I've attempted to construct a small narrative
around the added tests. However, if you have strong reservations, I'm open to
removing this section entirely.

@@ -65,3 +65,136 @@ cases are in tail position iff the match expression is in tail position.
| () ->
    ___bisect_visit___ 2;
    print_endline "bar"

@aantron (Owner) commented:

Analogous questions for the match.t changes as for the if.t changes.

@mbarbin (Contributor Author) commented Feb 13, 2024

Thank you for taking the time to review the PR and for your comments! I really
appreciate your detailed feedback and am working on responding to each of your
points individually.

Overall, my intuition centers around the hypothesis that the
post-instrumentation for branches would be better attributed to the expression
that encompasses the entire control structure, applied recursively when that
expression is itself a branch of a further enclosing control structure.

In other words, the post-instrumentation would never be produced for each case
individually, which would also simplify the current special cases that exist
there. This might necessitate some work at the outer level (such as adding
(post or pre)-instrumentation), although I haven't fully worked out the details yet.
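For concreteness, a hypothetical sketch of the two placements, with illustrative stubs and point numbers (not actual bisect_ppx output):

```ocaml
(* Stubs standing in for the bisect_ppx runtime, for illustration. *)
let ___bisect_visit___ (_ : int) = ()
let ___bisect_post_visit___ (_ : int) x = x

(* Current behavior, roughly: each branch gets a pre-visit point, and
   the terminal application in a branch is post-instrumented
   individually, creating a point (2) that can never be visited because
   the branch always raises. *)
let current cond =
  if cond then (___bisect_visit___ 0; ___bisect_post_visit___ 2 (failwith "boom"))
  else (___bisect_visit___ 1; ())

(* Sketched alternative: attribute the post-instrumentation to the
   expression enclosing the whole control structure, keeping only
   pre-visit points inside the branches. Point 2 is then visitable
   through the [else] path. *)
let alternative cond =
  ___bisect_post_visit___ 2
    (if cond then (___bisect_visit___ 0; failwith "boom")
     else (___bisect_visit___ 1; ()))
```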

I have the hunch that doing something along these lines would avoid, in more
cases, the insertion of coverage points that are not visitable by design,
without requiring annotations or otherwise compromising on the quality of the
coverage testing for the branches themselves.

In this context, yes, I indeed believe in the need for dedicated tests to ensure
coverage for these specific scenarios. The "tests" I propose to add are not
traditional tests aimed at preserving a specific behavior. Instead, they serve
as precise documentation for the current behavior, while also providing coverage
for a behavior I am hoping will be modified in future work. My hope is that,
over time, the behavior for control structures will begin to diverge from the
general case, and be reflected in these tests (TDD).

In #434, you said:

> The post-instrumentation of application expressions is orthogonal to whether
> they are in if expressions.

I agree. I think this is part of what I would like to challenge, for the
terminal expressions of each branch when they are apply nodes.

I acknowledge that I haven't fully worked out the details of the concept yet, and
I understand your reluctance to distinguish if-then-else or match from the
general case if you do not buy into the hope of improving behavior there.

As a compelling argument in favor of these specific tests, examining the
processing of these control structures has significantly enhanced my
understanding of the current behavior and constraints of the instrumentation.
This newfound insight has ignited my creativity in devising ways to circumvent
these limitations in my project, and to continue increasing reported coverage.

Perhaps you find that the cost of adding such tests outweighs the potential benefits,
considering the risk that such improvements might not be feasible? If you prefer,
we can wait to see more progress on an overall proposal before deciding whether
to pursue merging these?

@aantron (Owner) commented Feb 13, 2024

> Perhaps you find that the cost of adding such tests outweighs the potential benefits,
> considering the risk that such improvements might not be feasible? If you prefer,
> we can wait to see more progress on an overall proposal before deciding whether
> to pursue merging these?

I would suggest keeping the tests that are redundant right now locally, and including them as part of the PR if you do suggest an alternative instrumentation strategy for if+Pexp_apply. Normally, when I review such PRs, I manually run the new tests with the base branch to observe the difference with the PR, so I will be able to see the difference. That is, if I don't spot it by inspection, which is quite likely in this case, I think.

@aantron (Owner) commented Feb 13, 2024

> Consider the following example:
>
> -| if cond then raise Not_found
> +| if cond then raise_s [%sexp "Key not found", { key : Key.t }]
>
> This results in the insertion of non-visitable point 1:
>
> -| if cond then raise Not_found
> +| if cond then (___bisect_post_visit___ 1 (raise_s [%sexp "Key not found", { key : Key.t }]))

It may be easier to address this by extending the --exclusions option to allow the user to define a list of identifiers to exclude from coverage as if they were built-ins when they appear in Pexp_apply.

@mbarbin (Contributor Author) commented Feb 13, 2024

> I would suggest keeping the tests that are redundant right now locally, and including them as part of the PR if you do suggest an alternative instrumentation strategy for if+Pexp_apply. Normally, when I review such PRs, I manually run the new tests with the base branch to observe the difference with the PR, so I will be able to see the difference. That is, if I don't spot it by inspection, which is quite likely in this case, I think.

Understood. I am more accustomed to first publishing tests and then reviewing
changes in subsequent updates. However, I'm more than willing to adapt as
necessary. I'll transition the PR to the draft stage as I dedicate more time to
the strategy. Thank you!

@mbarbin mbarbin marked this pull request as draft February 13, 2024 14:33
@aantron (Owner) commented Feb 13, 2024

> Understood. I am more accustomed to first publishing tests and then reviewing
> changes in subsequent updates. However, I'm more than willing to adapt as
> necessary. I'll transition the PR to the draft stage as I dedicate more time to
> the strategy. Thank you!

Another way to do this is to open the PR with the tests in the first commit and the other changes afterwards, with the changes in test output included in the later commits. I suppose you are potentially turning this PR into that right now :) Thank you.

@mbarbin (Contributor Author) commented Nov 18, 2024

Hi @aantron. I hope all is well with you! I was doing a little pass over my open PRs and decided it was time to break radio silence on this one 😃. Sorry for not being more active on it.

By the way, I wanted to correct something in my previous comments: a friend of mine pointed out that my use of the term TDD was confusing for what I was trying to achieve, and showed me the wiki page for Characterization_test, which I didn't know about before. I think this more closely resembles the approach I wanted to take with bisect_ppx.

I have been using it for a while now, and I am still confused about the intention behind the current out-edge instrumentation strategy in most cases. In my experience, the instrumentation frequently either adds unvisitable-by-design coverage points, or forces you to write the code in such a way that the raising calls are in tail position, which oftentimes feels arbitrary to a reviewer who is not aware that this is motivated by rather non-trivial characteristics of the coverage tool.

Do you know if bisect_ppx was designed to be a coverage tool from the beginning, or was it a code base that was turned into one at a later point?

At any rate, I don't feel I understand the current behavior well enough to propose actual changes. I first need some way to monitor the current behavior. That being said, I understand from our exchange that this may not be something that has its place directly within this repository. So I thought I would try to develop a little collection of characterization tests, in my own time, in an external repo. Maybe then I can revisit with a better understanding, or simply adjust my uses by following how my characterization tests are affected by upgrades to new versions of bisect_ppx.

The other thing that doesn't make me very comfortable is that I don't understand why no other users are reporting these limitations. I am guessing that the tool isn't really used a lot, or perhaps it is used in such a way that people don't pay close attention to these unvisitable points. I'm happy to wait and see which direction the community goes and how the coverage part of the roadmap https://ocaml.org/tools/platform-roadmap#w21-run-tests will actually be implemented. I can revisit when more people are on board. Thanks a lot!

@mbarbin mbarbin closed this Nov 18, 2024