-
Notifications
You must be signed in to change notification settings - Fork 418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define equality for Sampleables #1122
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I think defining ==
makes sense, and that this is the easiest and lightest weight way to do it :) But it'd be good if a more regular maintainer could take a look too (ping @matbesancon to see if he can recommend someone)
Codecov Report
@@ Coverage Diff @@
## master #1122 +/- ##
=======================================
Coverage 79.91% 79.92%
=======================================
Files 114 114
Lines 5851 5868 +17
=======================================
+ Hits 4676 4690 +14
- Misses 1175 1178 +3
Continue to review full report at Codecov.
|
@matbesancon is there anyone in particular I should ask to review this? |
… On Wed, Jun 3, 2020, 21:42 Sam Morrison ***@***.***> wrote:
@matbesancon <https://github.com/matbesancon> is there anyone in
particular I should ask to review this?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1122 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB2FDMWGTRV4MHELV2SKELTRU2YUNANCNFSM4NQYX3UQ>
.
|
Are there any drawbacks of going this route? |
One I can see is that it adds yet another layer to the code. Also
precompilation is already super long for the package.
Is there a reason we need this? Hashing is rarely used for distributions
…On Thu, Jun 4, 2020, 11:39 Moritz Schauer ***@***.***> wrote:
Are there any drawbacks of going this route?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1122 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB2FDMT6R4ED5QGULKF4GCLRU5TTRANCNFSM4NQYX3UQ>
.
|
the current behaviour in #1117 seems odd to me i.e. distributions are sometime equal to one another and i think Distributions.jl should support that |
I think this makes sense, as evidenced by Regarding the increase computation time, could you measure it? Then we know. One question I have: AutoHashEquals uses In that case I should also change my code perhaps. |
What's the best way to measure compile time? Here's what i tried: This branch: $ git checkout invenia/sm/equality
Previous HEAD position was feae3ba9 Patch bump
HEAD is now at 4e646d0d Adds compat and note
$ julia --project=@. -q
(Distributions) pkg> up
Updating registry at `~/.julia/registries/General`
Updating git-repo `[email protected]:JuliaRegistries/General.git`
Updating registry at `~/.julia/registries/Invenia`
Updating git-repo `[email protected]:invenia/PackageRegistry.git`
Updating `~/repos/Distributions.jl/Project.toml`
[15f4f7f2] + AutoHashEquals v0.2.0
[90014a1f] ↑ PDMats v0.9.12 ⇒ v0.10.0
Updating `~/repos/Distributions.jl/Manifest.toml`
[7d9fca2a] - Arpack v0.4.0
[68821587] - Arpack_jll v3.5.0+3
[15f4f7f2] + AutoHashEquals v0.2.0
[4536629a] - OpenBLAS_jll v0.3.9+4
[90014a1f] ↑ PDMats v0.9.12 ⇒ v0.10.0
julia> @timed using Distributions
[ Info: Precompiling Distributions [31c24e10-a181-5473-b8eb-7969acd0382f]
(value = nothing, time = 5.987775708, bytes = 164154461, gctime = 0.03380193, gcstats = Base.GC_Diff(164154461, 32, 0, 2748228, 522, 0, 33801930, 3, 0)) Master: $ git checkout origin/master
Previous HEAD position was 4e646d0d Adds compat and note
HEAD is now at feae3ba9 Patch bump
$ julia --project=@. -q
(Distributions) pkg> up
Updating registry at `~/.julia/registries/General`
Updating git-repo `[email protected]:JuliaRegistries/General.git`
Updating registry at `~/.julia/registries/Invenia`
Updating git-repo `[email protected]:invenia/PackageRegistry.git`
Updating `~/repos/Distributions.jl/Project.toml`
[90014a1f] ↓ PDMats v0.10.0 ⇒ v0.9.12
Updating `~/repos/Distributions.jl/Manifest.toml`
[7d9fca2a] + Arpack v0.4.0
[68821587] + Arpack_jll v3.5.0+3
[15f4f7f2] - AutoHashEquals v0.2.0
[4536629a] + OpenBLAS_jll v0.3.9+4
[90014a1f] ↓ PDMats v0.10.0 ⇒ v0.9.12
julia> @timed using Distributions
[ Info: Precompiling Distributions [31c24e10-a181-5473-b8eb-7969acd0382f]
(value = nothing, time = 6.000390004, bytes = 171419321, gctime = 0.027515303, gcstats = Base.GC_Diff(171419321, 38, 0, 2832029, 536, 0, 27515303, 3, 0)) Which shows ~6 seconds for both (slightly less on this branch for some reason) julia> versioninfo()
Julia Version 1.5.0-beta1.0
Commit 6443f6c95a (2020-05-28 17:42 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.7.0)
CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake) |
I was just about to say the same thing. Mine results were ~7 seconds with this branch also being slightly faster. |
My use case is just to add some tests to one of my packages checking that a couple of functions produce equivalent results. |
Good question! I think i agree with this issue JuliaServices/AutoHashEquals.jl#18, that |
In concur that |
Is that suggestion for this PR or for AutoHashEquals? Should we move forward with this method for now? |
Do we actually need hashing, or can we just define something like Base.:(==)(d1::T, d2::T) where {T<:Distribution} = all(i -> getfield(d1, i) == getfield(d2, i), 1:fieldcount(T))
Base.:(==)(d1::Distribution, d2::Distribution) = false ? That is, recursively compare all internal fields of two distributions of the same type and declare distributions of different types to be unequal. |
I don't know... I'd have guessed it's acceptable to define It seems having So if we don't need/want |
I'd personally prefer that over taking on another dependency if nobody needs |
I would also favour the solution with only == and no external dependency |
Cool :) I guess we just want to change Shall we add |
yes for both points |
The downside to the changed code is that now |
Hmm... How many distributions have a similar issue? If it's just MvNormal we can special case it? |
Actually, that would indeed affect any distribution with a type parameter. |
So, we could do something like this: function Base.:(==)(d1::Distribution, d2::Distribution)
nameof(typeof(d1)) === nameof(typeof(d2)) || return false
names = fieldnames(typeof(d1))
names == fieldnames(typeof(d2)) || return false
for name in names
(isdefined(d1, name) && isdefined(d2, name)) || return false
getfield(d1, name) == getfield(d2, name) || return false
end
return true
end I kind of hate it though. Particularly the EDIT: Should be |
If AutoHashEquals had the desired behaviour (via fixing JuliaServices/AutoHashEquals.jl#18) would that be preferable? Or would it still be preferred to define |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the changes, @morris25!
I think we got to a better place? Are others happy with this?
I've a style comment + a question about isapprox
below, but i wouldn't hold up this PR over them :)
(tbh i think the biggest change in this PR is dropping PDMat 0.9 for 0.10, but i think that's a good change anyway; 0.10 is much better)
Edit: oh, removing the method using (also update title of PR 😊)promote_eltype
before merge seems important
Added |
LGTM, unless there are any outstanding objections, I'll play to merge this on Monday. |
My reading of https://docs.julialang.org/en/v1/base/math/#Base.:== implies that it's better to define hash if you define One simple solution is to define hash to always return the same value: that's inefficient but correct. |
I believe we already discussed this, the docs state it's optional, and the decision was not to implement |
What kind of hash computes |
The fallback |
That's not so nice, because this way |
On the other hand, exactly the same logic that defines |
Sorry for the delay. I was off last week. Hashing has been added |
All good now @nalimilan? |
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Closes #1117.