
[experimental] Run crosshair in CI #4034

Draft · wants to merge 10 commits into master
Conversation

@Zac-HD (Member) commented Jul 7, 2024

See #3914

To reproduce this locally, you can run make check-crosshair-cover/nocover/niche for the same command as in CI, but I'd recommend pytest --hypothesis-profile=crosshair hypothesis-python/tests/{cover,nocover,datetime} -m xf_crosshair --runxfail to select and run only the xfailed tests.

Hypothesis' problems

  • The vast majority of failures are Flaky: Inconsistent results from replaying a failing test... - mostly backend-specific failures; we've both
    • improved reporting in this case to show the crosshair-specific traceback
    • got most of the affected tests passing
  • Invalid internal boolean probability, e.g. assert p > 2 ** (-64) in draw_boolean at hypothesis/internal/conjecture/data.py line 2277; fixed in 1f845e0 (#4049)
  • many of our test helpers involved nested use of @given, fixed in 3315be6
  • symbolic outside context
  • avoid uninstalling typing_extensions when crosshair depends on it
  • tests which are not really expected to pass on other backends. I'm slowly applying a backend-specific xfail decorator to them, @xfail_on_crosshair(...).
    • tests which expect to raise a healthcheck, and fail because our crosshair profile disables healthchecks. Disable only .too_slow and .filter_too_much, and skip remaining affected tests under crosshair.
    • undo some over-broad skips (e.g. various xfail decorators, pytestmarks, -k 'not decimal') once we're closer
  • provide a special exception type for when running the test or realizing values would hit a PathTimeout; see Rare PathTimeout errors in provider.realize(...) pschanely/hypothesis-crosshair#21 and Stable support for symbolic execution #3914 (comment)
    • and something to signal that we've exhausted Crosshair's ability to explore the test. If this is sound, we've verified the function and can stop! (and should record that in the stop_reason). If unsound, we can continue testing with Hypothesis' default backend - so it's important to distinguish.
      Add BackendCannotProceed to improve integration #4092
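The backend-specific xfail decorator mentioned above could be sketched roughly as follows. This is a hedged illustration, not the decorator from the Hypothesis test suite: the `current_backend` helper is hypothetical (the real suite would consult the active settings profile), and only the general shape of `@xfail_on_crosshair(...)` is implied by the discussion.

```python
import pytest

def current_backend():
    # Hypothetical helper: the real test suite would read the active
    # Hypothesis settings profile; hardcoded here purely for illustration.
    return "crosshair"

def xfail_on_crosshair(reason="not expected to pass on the crosshair backend"):
    # Expected-failure mark that only applies when crosshair is active;
    # strict=False because crosshair may still pass on some runs.
    return pytest.mark.xfail(
        condition=current_backend() == "crosshair",
        reason=reason,
        strict=False,
    )

@xfail_on_crosshair(reason="relies on Hypothesis-internal buffer details")
def test_example():
    assert 1 + 1 == 2
```

Because the condition is evaluated per-run, the same test file works unchanged under the default backend, where the mark simply doesn't apply.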

Probably Crosshair's problems

Error in operator.eq(Decimal('sNaN'), an_int)

____ test_rewriting_does_not_compare_decimal_snan ____
  File "hypothesis/strategies/_internal/strategies.py", line 1017, in do_filtered_draw
    if self.condition(value):
TypeError: argument must be an integer
while generating 's' from integers(min_value=1, max_value=5).filter(functools.partial(eq, Decimal('sNaN')))

Cases where crosshair doesn't find a failing example but Hypothesis does

Seems fine, there are plenty of cases in the other direction. Tracked with @xfail_on_crosshair(Why.undiscovered) in case we want to dig in later.

Nested use of the Hypothesis engine (e.g. given-inside-given)

This is just explicitly unsupported for now. Hypothesis should probably offer some way for backends to declare that they don't support this, and then raise a helpful error message if you try anyway.
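One way the "backends declare they don't support nesting" idea could look is a capability flag on the provider class, checked before entering a nested given. This is a hypothetical sketch - `supports_nested_given`, `ExampleProvider`, and `check_nesting` are all illustrative names, not Hypothesis' real API.

```python
# Hypothetical sketch (not Hypothesis' real API): a capability flag on the
# provider class, checked by the engine before entering a nested given.
class ExampleProvider:
    supports_nested_given = False  # this backend opts out of nesting

def check_nesting(provider_cls, inside_given):
    # Raise a helpful error instead of failing obscurely deep in the engine.
    if inside_given and not getattr(provider_cls, "supports_nested_given", True):
        raise RuntimeError(
            f"backend {provider_cls.__name__!r} does not support nested use "
            "of @given; flatten the test or switch backends"
        )

check_nesting(ExampleProvider, inside_given=False)  # top-level use: fine
```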

@Zac-HD Zac-HD added tests/build/CI about testing or deployment *of* Hypothesis interop how to play nicely with other packages labels Jul 7, 2024

@Zac-HD Zac-HD force-pushed the crosshair-in-ci branch 3 times, most recently from 175b347 to 424943f Compare July 7, 2024 20:26
@Zac-HD Zac-HD force-pushed the crosshair-in-ci branch from 424943f to b2d11c7 Compare July 7, 2024 20:56
@pschanely (Contributor)

@Zac-HD your triage above is SO great. I am investigating.

@pschanely (Contributor) commented Jul 8, 2024

Knocked out a few of these in 0.0.60.
I think that means current status on my end is:

  • TypeError: conversion from SymbolicInt to Decimal is not supported
  • Unsupported operand type(s) for -: 'float' and 'SymbolicFloat' in test_float_clamper
  • TypeError: descriptor 'keys' for 'dict' objects doesn't apply to a 'ShellMutableMap' object (or 'values' or 'items').
  • TypeError: _int() got an unexpected keyword argument 'base'
  • Symbolic not realized (in e.g. test_suppressing_filtering_health_check)
  • Error in operator.eq(Decimal('sNaN'), an_int)
  • Zac's cursed example below!

More soon.

@Zac-HD Zac-HD force-pushed the crosshair-in-ci branch from b2d11c7 to 98ccf44 Compare July 11, 2024 07:23
@Zac-HD (Member, Author) commented Jul 12, 2024

Ah - the Flaky failures are of course because we had some failure under the Crosshair backend which did not reproduce under the Hypothesis backend. This presumably points to a range of integration bugs, but it's also something we'll want to explain clearly to users: integration bugs will certainly happen in future, and users will need to respond (e.g. by using a different backend, or ignoring the problem).

  • improve the reporting around Flaky failures where the differing or missing errors are related to a change of backend while shrinking. See also Change Flaky to be an ExceptionGroup #4040.
  • triage all the current failures so we can fix them


@Zac-HD Zac-HD force-pushed the crosshair-in-ci branch from 98ccf44 to 4bd7e45 Compare July 12, 2024 07:48
@tybug (Member) commented Jul 12, 2024

Most/all of the "expected x, got symbolic" errors are symptoms of an underlying error in my experience (often an operation on a symbolic while not tracing). In this case, running with export HYPOTHESIS_NO_TRACEBACK_TRIM=1 reveals that limited_category_index_cache in cm.query is at fault.

@Zac-HD (Member, Author) commented Jul 12, 2024

Ah-ha - seems like we might want some #4029-style "don't cache on backends with avoid_realize=True" logic.
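The cache-bypass idea could be sketched like this. It's a hedged illustration only - `cached_query` and the `avoid_realization` attribute spelling are assumptions for the sake of the example, not Hypothesis' actual internals.

```python
# Hedged sketch of the "don't cache on avoid_realization backends" idea;
# names (cached_query, avoid_realization) are illustrative.
_cache = {}

def cached_query(key, compute, provider):
    # Symbolic values must not be realized just to serve as cache keys,
    # so bypass the cache entirely for backends that avoid realization.
    if getattr(provider, "avoid_realization", False):
        return compute(key)
    if key not in _cache:
        _cache[key] = compute(key)
    return _cache[key]
```

Under this scheme a crosshair-backed run recomputes every query, trading a little speed for never touching symbolic values outside the tracing context.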

@Zac-HD Zac-HD force-pushed the crosshair-in-ci branch 2 times, most recently from 1d2345d to 7bf8983 Compare July 12, 2024 20:15
@pschanely (Contributor)

Still here and excited about this! I am on a detour of doing a real symbolic implementation of the decimal module - should get that out this weekend.

@Zac-HD Zac-HD force-pushed the crosshair-in-ci branch 2 times, most recently from cc07927 to 018ccab Compare July 13, 2024 07:23
@Zac-HD (Member, Author) commented Jul 13, 2024

Triaging a pile of the Flaky errors, most were due to getting a RecursionError under crosshair and then passing under Hypothesis - and it looks like most of those were in turn caused by all our nested-@given() test helpers.

So I've tried de-nesting those, which seems to work nicely and even makes things a bit faster by default; and when CI finishes we'll see how much it helps on crosshair 🤞
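The de-nesting pattern can be illustrated as below. This is a hypothetical helper for the sake of the example, not one taken from the Hypothesis test suite; the point is structural: move the @given from inside the helper to the calling test, so only one engine runs.

```python
from hypothesis import given, strategies as st

# Before: the helper wraps its own @given, so calling it from a @given test
# nests two Hypothesis engines - which crosshair does not support.
def assert_repr_roundtrips_nested(strategy):
    @given(strategy)
    def inner(value):
        assert eval(repr(value)) == value
    inner()

# After: the helper takes a concrete value, and the caller's single @given
# supplies it - no nested engine, and one fewer engine startup per call.
def assert_repr_roundtrips(value):
    assert eval(repr(value)) == value

@given(st.integers())
def test_int_repr_roundtrip(x):
    assert_repr_roundtrips(x)
```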


@tybug (Member) commented Jan 13, 2025

To follow up on the premature realization in ir_size: I'm waiting to complete #3921 (we are very close!) before fixing this. The notion of "size" and overruns is one of the last things to fully hash out on the typed choice sequence, which may change the resolution here.

@tybug (Member) commented Jan 21, 2025

@pschanely OK, after #4247 we should not be prematurely realizing crosshair values anywhere. Please let me know if that's not the case!

The semantics after that pull request are that we do not overrun (abort too-large test cases) for avoid_realization backends, as this would require realizing the values to get their size. So CrosshairProvider should be somewhat careful not to generate extremely large inputs. The size limit here is pretty generous, and I think crosshair already has internal heuristics to this effect, so I don't expect this to be a problem; we've just removed the guardrails for now.
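The tradeoff described here can be sketched as a size check that is simply skipped for realization-avoiding backends. Everything below - the names, the size metric, and the limit - is illustrative, not Hypothesis' actual overrun logic.

```python
# Illustrative sketch: measuring a test case's size requires concrete
# values, so the size-based overrun check is skipped entirely for
# avoid_realization backends. Names and the limit are hypothetical.
SIZE_LIMIT = 8 * 1024

def should_overrun(choices, provider):
    if getattr(provider, "avoid_realization", False):
        # Sizing the choices would force realizing symbolic values,
        # so never report an overrun for these backends.
        return False
    return sum(len(repr(c)) for c in choices) > SIZE_LIMIT
```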

@tybug (Member) commented Jan 22, 2025

Hmm, lots of nondeterministic errors in CI. We're now calling provider.draw_* from multiple distinct stacktraces. Is this something the crosshair provider could ignore? I think we'd like to only make guarantees that the (ir_type, kwargs) of each draw is deterministic across iterations, not necessarily the stacktrace at which it was drawn.
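The weaker guarantee proposed here - determinism of each draw's (ir_type, kwargs), but not of the stack it was drawn at - could be checked like this. The tuple layout is an illustrative assumption, not Hypothesis' real draw record.

```python
# Sketch: compare two iterations' draws only by (ir_type, kwargs),
# ignoring the stack trace at which each draw happened.
# The (ir_type, kwargs, stack) tuple layout is hypothetical.
def draws_consistent(prev_draws, new_draws):
    def strip(draws):
        return [(ir_type, kwargs) for ir_type, kwargs, _stack in draws]
    return strip(prev_draws) == strip(new_draws)

# The same draw from two different call sites should still count as
# deterministic under this weaker guarantee:
same_draw_diff_stack = draws_consistent(
    [("integer", (("min_value", 0),), "stack_a")],
    [("integer", (("min_value", 0),), "stack_b")],
)
```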

@pschanely (Contributor)

> The semantics after that pull is that we do not overrun (abort too-large test cases) from avoid_realization backends, as this would require realizing the values to get their size. So CrosshairProvider should be somewhat careful that it's not generating extremely large inputs. The size limit here is pretty generous and I think crosshair already has internal heuristics to this effect, so I don't expect this to be a problem. We've just removed the guardrails for now.

Sounds good; I expect that CrossHair will time itself out well before this becomes a problem.

> Hmm, lots of nondeterministic errors in CI. We're now calling provider.draw_* from multiple distinct stacktraces. Is this something the crosshair provider could ignore? I think we'd like to only make guarantees that the (ir_type, kwargs) of each draw is deterministic across iterations, not necessarily the stacktrace at which it was drawn.

Yes; the stack-based check is pretty useful when debugging nondeterminism, but can be overly conservative in complex cases. The stacks are also used by some CrossHair heuristics, so ignoring them may impact performance slightly (e.g. it'll try to figure out how to get to a prior trace and never be able to get there). At any rate, I am not FULLY sure I want to just disable the stack-based check in the long run, but I've disabled it in CrossHair 0.0.82 for now. (There are also some, hopefully net-positive, changes in v0.0.19 of the hypothesis-crosshair plugin.)

At any rate, I am still working on trying to get a full clean run on my side. You might be interested to take a look at these hacks I've made for race-patching random and fresh_data().

@tybug (Member) commented Jan 27, 2025

Nice! I think we can take the random patch more or less as is, and I'll look into the fresh_data (possibly we should skip those tests on crosshair) and zero_data pin (initial reaction is that these choices should already have been realized by the time we try to cache them, so something is going on) fixes. Both look like reasonable local hacks for enabling forward progress 🙂

@Zac-HD Zac-HD force-pushed the crosshair-in-ci branch from 648bbe9 to 30fe0b8 Compare March 2, 2025 01:20
@Zac-HD Zac-HD force-pushed the crosshair-in-ci branch from 30fe0b8 to d51ad9f Compare March 2, 2025 09:17
@Zac-HD (Member, Author) commented Mar 2, 2025

OK, we have very many crosshair.util.CrossHairInternal: Not in a statespace context errors, which I think must have some systematic underlying cause. Not looking further at this time of day, but I'd like to get this merged in the next week or two, so we can show it off at BugBash as a stable feature 😁 (and maybe also integrate into HypoFuzz!)

@Zac-HD Zac-HD force-pushed the crosshair-in-ci branch 2 times, most recently from 102058b to 262dba3 Compare March 2, 2025 23:53
@Zac-HD Zac-HD force-pushed the crosshair-in-ci branch from 262dba3 to 95e734c Compare March 3, 2025 00:05
@tybug tybug force-pushed the crosshair-in-ci branch 2 times, most recently from 0e4d99d to 95e734c Compare March 3, 2025 07:39
@tybug (Member) commented Mar 3, 2025

(sorry, my local was out of date and force pushes above were me rebasing and subsequently realizing that had already been done in the up-to-date branch. nothing should have changed)

@Zac-HD (Member, Author) commented Mar 3, 2025

HYPOTHESIS_NO_TRACEBACK_TRIM=1 pytest --hypothesis-profile=crosshair hypothesis-python/tests/cover/test_deadline.py -x points here, which I consider pretty suspicious from a "we seem to have a lifetime error" perspective. I'm not actively chasing this, but let's stick to non-force pushes?

@tybug (Member) commented Mar 3, 2025

Yup, Phillip beat us to it 😅 pschanely@fd6958f

Underlying reason is crosshair raising discard_test_case for the initial zero_data call. We then pin it and check for health checks - but it was discarded, and therefore not realized. Maybe we should just not do any of those things if a backend raises discard_test_case? I'm hesitant to disable this for all alternative backends, but we don't currently have a way to check the BackendCannotProceed status of a data, though we could add that status to the data before freezing it.

This is also in part because we decided to give alternative backends control over NodeTemplate(type="simplest") draws in #4247.

@tybug (Member) commented Mar 17, 2025

I've addressed the zero_pin issue discussed above by tracking the cannot_proceed reason on ConjectureData, and not pinning if the backend could not proceed, assuming that if the test case cannot proceed then the values cannot be realized either.
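The shape of that fix can be sketched as follows. Class, attribute, and function names here are illustrative stand-ins, not Hypothesis' real ConjectureData internals.

```python
# Hedged sketch of the fix described above: record why the backend could
# not proceed, and skip pinning for such test cases, since their values
# were never realized. All names are hypothetical.
class Data:
    def __init__(self):
        # Set when the backend raised BackendCannotProceed,
        # e.g. "discard_test_case"; None means the run was normal.
        self.cannot_proceed_scope = None

def maybe_pin_zero_data(data, pin):
    if data.cannot_proceed_scope is not None:
        # The test case was discarded and never realized: don't pin it
        # or run health checks against it.
        return None
    return pin(data)
```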

@pschanely after doing this the CI run shows crosshair returning values of the wrong type from CrosshairPrimitiveProvider.realize on the second test case, after the first test case raises BackendCannotProceed (search hypothesis.errors.HypothesisException: expected <class 'int'>). I wonder if the provider is not removing the choices made during cannot-proceed iterations from the list of values to return while realizing?

Unfortunately this is flaky locally, possibly due to speed-related time outs. HYPOTHESIS_NO_TRACEBACK_TRIM=1 pytest --hypothesis-profile=crosshair -k test_adds_note_showing_which_strategy_stateful should reproduce in theory. Run in cd hypothesis-python, not the root dir.

@pschanely (Contributor)

> I've addressed the zero_pin issue discussed above by tracking the cannot_proceed reason on ConjectureData, and not pinning if the backend could not proceed, assuming that if the test case cannot proceed then the values cannot be realized either.

Thank you!

> @pschanely after doing this the CI run shows crosshair returning values of the wrong type from CrosshairPrimitiveProvider.realize on the second test case, after the first test case raises BackendCannotProceed (search hypothesis.errors.HypothesisException: expected <class 'int'>). I wonder if the provider is not removing the choices made during cannot-proceed iterations from the list of values to return while realizing?

SGTM; I will investigate tomorrow! One change we made a while back is possibly relevant here: to avoid false positives from approximations like real-based floats, we don't actually report the error right away; instead we abort the failing pass and try to recreate it next time with concrete draws. This can go haywire, though, if the requested draw types aren't the same on the next pass.

3 participants