[experimental] Run crosshair in CI #4034
base: master
Conversation
This comment was marked as outdated.
Force-pushed from 175b347 to 424943f
@Zac-HD your triage above is SO great. I am investigating.

Knocked out a few of these in 0.0.60. More soon.

Ah - the […]
This comment was marked as outdated.
Most/all of the "expected x, got symbolic" errors are, in my experience, symptoms of an underlying error (often an operation on a symbolic value while not tracing). In this case, running with […]
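A toy illustration of that failure mode (this is not CrossHair's real API; the context-variable flag and `SymbolicInt` class here are invented for the sketch): an operation on a symbolic value outside an active tracing context fails loudly here, whereas in practice the symbolic often leaks and surfaces later as a confusing "expected x, got symbolic" error downstream.

```python
# Toy illustration only -- not CrossHair's real API.
import contextvars

_tracing = contextvars.ContextVar("tracing", default=False)

class SymbolicInt:
    """A stand-in for a symbolic value that only supports ops while tracing."""

    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        if not _tracing.get():
            # Failing immediately here is clearer than leaking a symbolic
            # into code that expects a concrete int.
            raise RuntimeError(f"operation on symbolic {self.name!r} while not tracing")
        return SymbolicInt(f"({self.name} + {other})")

x = SymbolicInt("x")

_tracing.set(True)
y = x + 1          # fine while tracing: builds a new symbolic expression
_tracing.set(False)

try:
    x + 1          # outside tracing: raises instead of silently leaking
    message = None
except RuntimeError as err:
    message = str(err)
```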
Ah-ha - seems like we might want some #4029-style "don't cache on backends with avoid_realize=True" logic.
Force-pushed from 1d2345d to 7bf8983
Still here and excited about this! I am on a detour of doing a real symbolic implementation of the […]
Force-pushed from cc07927 to 018ccab
Triaging a pile of the […] So I've tried de-nesting those, which seems to work nicely and even makes things a bit faster by default; when CI finishes we'll see how much it helps on crosshair 🤞
This comment was marked as outdated.
To follow up on the premature realization in […]

@pschanely OK, after #4247 we should not be prematurely realizing crosshair values anywhere. Please let me know if that's not the case! The semantics after that pull are that we do not overrun (abort too-large test cases) from […]
Force-pushed from 765cc55 to 648bbe9
Hmm, lots of nondeterministic errors in CI. We're now calling […]
Sounds good; I expect that CrossHair will time itself out well before this becomes a problem.

Yes; the stack-based check is pretty useful when debugging nondeterminism, but it can be overly conservative in complex cases (e.g. it'll try to figure out how to get to a prior trace and never be able to get there). The stacks are also used by some CrossHair heuristics, so disabling the check may impact performance slightly. At any rate, I am not FULLY sure I want to just disable the stack-based check in the long run, but I've disabled it in CrossHair 0.0.82 for now. (There are also some hopefully net-positive changes in v0.0.19 of the hypothesis-crosshair plugin.) I am still working on trying to get a full clean run on my side. You might be interested to take a look at these hacks I've made for race-patching random and fresh_data().
Nice! I think we can take the random patch more or less as-is, and I'll look into the fresh_data fix (possibly we should skip those tests on crosshair) and the zero_data pin fix (initial reaction: those choices should already have been realized by the time we try to cache them, so something is going on). Both look like reasonable local hacks for enabling forward progress 🙂
OK, we have very many […]
Force-pushed from 102058b to 262dba3
Force-pushed from 0e4d99d to 95e734c
(sorry, my local was out of date, and the force pushes above were me rebasing and subsequently realizing that had already been done in the up-to-date branch. Nothing should have changed)
Yup, Phillip beat us to it 😅 pschanely@fd6958f Underlying reason is crosshair raising […]

This is also in part because we decided to give alternative backends control over […]
I've addressed the zero_pin issue discussed above by tracking the […]

@pschanely after doing this, the CI run shows crosshair returning values of the wrong type from […] Unfortunately this is flaky locally, possibly due to speed-related timeouts.
Thank you!

SGTM; I will investigate tomorrow! One change that we made a while back is possibly relevant here: to avoid false positives from approximations like real-based floats, we don't actually report the error right away; instead we abort the failing pass and try to recreate it next time with concrete draws. This can go haywire, though, if the requested draw types aren't the same on the next pass.
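That deferred-reporting scheme can be sketched as a toy (all names here are hypothetical, not the real Hypothesis/CrossHair code): pass 1 finds a "failure" under an approximation and records its draws, pass 2 replays those draws concretely and only then reports, and a diverging draw type on replay is exactly the haywire case described above.

```python
# Toy sketch with hypothetical names -- not the real plugin code.
def check_with_replay(test, pass1_values):
    recorded = []

    def draw_pass1(kind):
        # First pass: hand out (possibly approximate) values, recording each
        # with the type the test requested.
        value = pass1_values.pop(0)
        recorded.append((kind, value))
        return value

    failed_under_approximation = not test(draw_pass1)
    if not failed_under_approximation:
        return False  # nothing to report

    def draw_replay(kind):
        recorded_kind, value = recorded.pop(0)
        if recorded_kind is not kind:
            # The haywire case: draw types diverged between passes.
            raise TypeError(
                f"replay asked for {kind.__name__}, recorded {recorded_kind.__name__}"
            )
        return value

    # Only report the failure if it reproduces with concrete draws.
    return not test(draw_replay)

reported = check_with_replay(lambda draw: draw(int) >= 0, [-1])   # reproduces
not_reported = check_with_replay(lambda draw: draw(int) >= 0, [5])  # passes
```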
See #3914

To reproduce this locally, you can run `make check-crosshair-cover/nocover/niche` for the same commands as in CI, but I'd recommend `pytest --hypothesis-profile=crosshair hypothesis-python/tests/{cover,nocover,datetime} -m xf_crosshair --runxfail` to select and run only the xfailed tests.

Hypothesis' problems

- `Flaky: Inconsistent results from replaying a failing test...` - mostly backend-specific failures; we've both […]
- `"hypothesis/internal/conjecture/data.py", line 2277, in draw_boolean: assert p > 2 ** (-64)`, fixed in 1f845e0 (#4049)
- `@given` […], fixed in 3315be6
- `target()`, fixed in 85712ad (#4049)
- `typing_extensions` […] when crosshair depends on it
- `@xfail_on_crosshair(...)` […] `too_slow` and `filter_too_much`, and skip remaining affected tests under crosshair
- `-k 'not decimal'` once we're closer
- `PathTimeout`; see Rare `PathTimeout` errors in `provider.realize(...)` (pschanely/hypothesis-crosshair#21) and Stable support for symbolic execution #3914 (comment)
- Add `BackendCannotProceed` to improve integration (#4092)

Probably Crosshair's problems
- `Duplicate type "<class 'array.array'>" registered` from repeated imports? (pschanely/hypothesis-crosshair#17)
- `RecursionError`, see RecursionError in `_issubclass` (pschanely/CrossHair#294)
- `unsupported operand type(s) for -: 'float' and 'SymbolicFloat'` in `test_float_clamper`
- `TypeError: descriptor 'keys' for 'dict' objects doesn't apply to a 'ShellMutableMap' object` (or `'values'` or `'items'`). Fixed in Implement various fixes for hypothesis integration (pschanely/CrossHair#269)
- `TypeError: _int() got an unexpected keyword argument 'base'`
- `hashlib` requires the buffer protocol, which symbolic bytes don't provide (pschanely/CrossHair#272)
- `typing.get_type_hints()` raises `ValueError`, see typing.get_type_hints() raises ValueError when used inside Crosshair (pschanely/CrossHair#275)
- `TypeError` in bytes regex, see TypeError in bytes regex (pschanely/CrossHair#276)
- `provider.draw_boolean()` inside `FeatureStrategy`, see Invalid combination of arguments to `draw_boolean(...)` (pschanely/hypothesis-crosshair#18)
- `dict(name=value)`, see Support named `dict` init syntax (pschanely/CrossHair#279)
- `PurePath` constructor, see `PurePath(LazyIntSymbolicStr)` error (pschanely/CrossHair#280)
- `zlib.compress()` not symbolic, see "a bytes-like object is required, not `SymbolicBytes`" when calling `zlib.compress(b'')` (pschanely/CrossHair#286)
- `int.from_bytes(map(...), ...)`, see Accept `map()` object - or any iterable - in `int.from_bytes()` (pschanely/CrossHair#291)
- `base64.b64encode()` and friends (pschanely/CrossHair#293)
- `TypeError: conversion from SymbolicInt to Decimal is not supported`; see also sNaN below
- `TypeVar` problem, see `z3.z3types.Z3Exception: b'parser error'` from interaction with `TypeVar` (pschanely/CrossHair#292)
- `RecursionError` inside Lark, see Weird failures using sets (pschanely/CrossHair#297)
- Error in `operator.eq(Decimal('sNaN'), an_int)`
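The `operator.eq(Decimal('sNaN'), an_int)` failure in the last item is reproducible in plain Python, independent of CrossHair: comparing a quiet NaN simply returns unequal, but any comparison involving a signaling NaN raises `decimal.InvalidOperation`.

```python
import operator
from decimal import Decimal, InvalidOperation

# Quiet NaN: equality comparison just returns False, no exception.
quiet_result = Decimal("NaN") == 1

# Signaling NaN: the same comparison raises InvalidOperation.
try:
    operator.eq(Decimal("sNaN"), 1)
    snan_raised = False
except InvalidOperation:
    snan_raised = True
```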
Cases where crosshair doesn't find a failing example but Hypothesis does

Seems fine; there are plenty of cases in the other direction. Tracked with `@xfail_on_crosshair(Why.undiscovered)` in case we want to dig in later.

Nested use of the Hypothesis engine (e.g. given-inside-given)

This is just explicitly unsupported for now. Hypothesis should probably offer some way for backends to declare that they don't support this, and then raise a helpful error message if you try anyway.
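The "declare unsupported, then raise a helpful error" idea can be sketched as a toy guard (all names hypothetical, not Hypothesis's real API): a re-entrancy flag turns nested engine use into an explicit, descriptive error instead of an obscure backend failure.

```python
# Toy sketch, hypothetical names -- not the real Hypothesis engine.
import threading

_engine_state = threading.local()

class NestedEngineError(RuntimeError):
    """Raised when a backend that forbids nesting is entered re-entrantly."""

def run_engine(test, backend="crosshair"):
    if getattr(_engine_state, "running", False):
        raise NestedEngineError(
            f"backend {backend!r} does not support nested use of the engine "
            "(e.g. @given inside @given)"
        )
    _engine_state.running = True
    try:
        return test()
    finally:
        # Always clear the flag so later top-level runs still work.
        _engine_state.running = False

top_level = run_engine(lambda: "ok")  # a single top-level run is fine

try:
    # Nested run: caught by the guard with a helpful message.
    run_engine(lambda: run_engine(lambda: "inner"))
    nested_allowed = True
except NestedEngineError:
    nested_allowed = False
```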