Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define equality for Sampleables #1122

Merged
merged 9 commits into from
Jun 24, 2020
Merged

Define equality for Sampleables #1122

merged 9 commits into from
Jun 24, 2020

Conversation

morris25
Copy link
Contributor

@morris25 morris25 commented Jun 2, 2020

Closes #1117.

Copy link
Contributor

@nickrobinson251 nickrobinson251 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think defining == makes sense, and that this is the easiest and lightest weight way to do it :) But it'd be good if a more regular maintainer could take a look too (ping @matbesancon to see if he can recommend someone)

Project.toml Outdated Show resolved Hide resolved
Project.toml Show resolved Hide resolved
src/edgeworth.jl Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jun 2, 2020

Codecov Report

Merging #1122 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1122   +/-   ##
=======================================
  Coverage   79.91%   79.92%           
=======================================
  Files         114      114           
  Lines        5851     5868   +17     
=======================================
+ Hits         4676     4690   +14     
- Misses       1175     1178    +3     
Impacted Files Coverage Δ
src/common.jl 66.66% <100.00%> (+16.66%) ⬆️
src/multivariate/mvnormal.jl 70.94% <100.00%> (-1.21%) ⬇️
src/truncated/normal.jl 84.61% <0.00%> (+0.26%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update feae3ba...75f3638. Read the comment docs.

@morris25
Copy link
Contributor Author

morris25 commented Jun 3, 2020

@matbesancon is there anyone in particular I should ask to review this?

@matbesancon
Copy link
Member

matbesancon commented Jun 3, 2020 via email

@mschauer
Copy link
Member

mschauer commented Jun 4, 2020

Are there any drawbacks of going this route?

@matbesancon
Copy link
Member

matbesancon commented Jun 4, 2020 via email

@nickrobinson251
Copy link
Contributor

Is there a reason we need this?

the current behaviour in #1117 seems odd to me i.e. distributions are sometime equal to one another and i think Distributions.jl should support that

@mschauer
Copy link
Member

mschauer commented Jun 4, 2020

I think this makes sense, as evidenced by
https://github.com/mschauer/GaussianDistributions.jl/blob/6f3ac9c329c1e832c414f940bf6c8da91e34a16a/src/GaussianDistributions.jl#L68

Regarding the increase computation time, could you measure it? Then we know.

One question I have: AutoHashEquals uses isequal and not ==. Is this the right thing? E.g. should NaN-means of Normals compare unequal?

In that case I should also change my code perhaps.

@nickrobinson251
Copy link
Contributor

nickrobinson251 commented Jun 4, 2020

Regarding the increase computation time, could you measure it?

What's the best way to measure compile time? Here's what i tried:

This branch:

$ git checkout invenia/sm/equality
Previous HEAD position was feae3ba9 Patch bump
HEAD is now at 4e646d0d Adds compat and note

$ julia --project=@. -q
(Distributions) pkg> up
   Updating registry at `~/.julia/registries/General`
   Updating git-repo `[email protected]:JuliaRegistries/General.git`
   Updating registry at `~/.julia/registries/Invenia`
   Updating git-repo `[email protected]:invenia/PackageRegistry.git`
Updating `~/repos/Distributions.jl/Project.toml`
  [15f4f7f2] + AutoHashEquals v0.2.0
  [90014a1f]  PDMats v0.9.12  v0.10.0
Updating `~/repos/Distributions.jl/Manifest.toml`
  [7d9fca2a] - Arpack v0.4.0
  [68821587] - Arpack_jll v3.5.0+3
  [15f4f7f2] + AutoHashEquals v0.2.0
  [4536629a] - OpenBLAS_jll v0.3.9+4
  [90014a1f]  PDMats v0.9.12  v0.10.0

julia> @timed using Distributions
[ Info: Precompiling Distributions [31c24e10-a181-5473-b8eb-7969acd0382f]
(value = nothing, time = 5.987775708, bytes = 164154461, gctime = 0.03380193, gcstats = Base.GC_Diff(164154461, 32, 0, 2748228, 522, 0, 33801930, 3, 0))

Master:

$ git checkout origin/master
Previous HEAD position was 4e646d0d Adds compat and note
HEAD is now at feae3ba9 Patch bump

$ julia --project=@. -q
(Distributions) pkg> up
   Updating registry at `~/.julia/registries/General`
   Updating git-repo `[email protected]:JuliaRegistries/General.git`
   Updating registry at `~/.julia/registries/Invenia`
   Updating git-repo `[email protected]:invenia/PackageRegistry.git`
Updating `~/repos/Distributions.jl/Project.toml`
  [90014a1f]  PDMats v0.10.0  v0.9.12
Updating `~/repos/Distributions.jl/Manifest.toml`
  [7d9fca2a] + Arpack v0.4.0
  [68821587] + Arpack_jll v3.5.0+3
  [15f4f7f2] - AutoHashEquals v0.2.0
  [4536629a] + OpenBLAS_jll v0.3.9+4
  [90014a1f]  PDMats v0.10.0  v0.9.12

julia> @timed using Distributions
[ Info: Precompiling Distributions [31c24e10-a181-5473-b8eb-7969acd0382f]
(value = nothing, time = 6.000390004, bytes = 171419321, gctime = 0.027515303, gcstats = Base.GC_Diff(171419321, 38, 0, 2832029, 536, 0, 27515303, 3, 0))

Which shows ~6 seconds for both (slightly less on this branch for some reason)


julia> versioninfo()
Julia Version 1.5.0-beta1.0
Commit 6443f6c95a (2020-05-28 17:42 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

@morris25
Copy link
Contributor Author

morris25 commented Jun 4, 2020

I was just about to say the same thing. Mine results were ~7 seconds with this branch also being slightly faster.

@morris25
Copy link
Contributor Author

morris25 commented Jun 4, 2020

My use case is just to add some tests to one of my packages checking that a couple of functions produce equivalent results.

@nickrobinson251
Copy link
Contributor

One question I have: AutoHashEquals uses isequal and not ==. Is this the right thing? E.g. should NaN-means of Normals compare unequal?

Good question! I think i agree with this issue JuliaServices/AutoHashEquals.jl#18, that == should use == recursively rather than isequals. If that were the case then we'd be able to use == to have NaN-means of Normals be != but still have them be isequal, which i think is the behaviour we'd want?

@mschauer
Copy link
Member

mschauer commented Jun 4, 2020

In concur that == should recursively call ==.

@morris25
Copy link
Contributor Author

morris25 commented Jun 4, 2020

In concur that == should recursively call ==.

Is that suggestion for this PR or for AutoHashEquals? Should we move forward with this method for now?

@ararslan
Copy link
Member

ararslan commented Jun 4, 2020

Do we actually need hashing, or can we just define something like

Base.:(==)(d1::T, d2::T) where {T<:Distribution} = all(i -> getfield(d1, i) == getfield(d2, i), 1:fieldcount(T))
Base.:(==)(d1::Distribution, d2::Distribution) = false

? That is, recursively compare all internal fields of two distributions of the same type and declare distributions of different types to be unequal.

@nickrobinson251
Copy link
Contributor

I don't know... I'd have guessed it's acceptable to define == without hash.

It seems having hash defined usually implies having == defined too but having == defined does not imply hash is also defined

So if we don't need/want hash (which neither Sam nor I need for our uses), then I think we could implement this as Alex suggests if folks prefer?

@ararslan
Copy link
Member

ararslan commented Jun 4, 2020

I'd personally prefer that over taking on another dependency if nobody needs hash.

@matbesancon
Copy link
Member

I would also favour the solution with only == and no external dependency

@nickrobinson251
Copy link
Contributor

Cool :) I guess we just want to change Distribution to Sampleable in the code Alex suggests above?

Shall we add isequal too (calling isequal on fields)?

@matbesancon
Copy link
Member

yes for both points

@morris25
Copy link
Contributor Author

morris25 commented Jun 4, 2020

The downside to the changed code is that now MvNormal(map(Float64,ones(2))) != MvNormal(map(Float32,ones(2))).

@nickrobinson251
Copy link
Contributor

Hmm... How many distributions have a similar issue? If it's just MvNormal we can special case it?

@ararslan
Copy link
Member

ararslan commented Jun 4, 2020

Actually, that would indeed affect any distribution with a type parameter.

@ararslan
Copy link
Member

ararslan commented Jun 4, 2020

So, we could do something like this:

function Base.:(==)(d1::Distribution, d2::Distribution)
    nameof(typeof(d1)) === nameof(typeof(d2)) || return false
    names = fieldnames(typeof(d1))
    names == fieldnames(typeof(d2)) || return false
    for name in names
        (isdefined(d1, name) && isdefined(d2, name)) || return false
        getfield(d1, name) == getfield(d2, name) || return false
    end
    return true
end

I kind of hate it though. Particularly the nameof check, though without it, you'd be able to equate Normal and LogNormal, for example.

EDIT: Should be nameof(typeof()), not just nameof()

@nickrobinson251
Copy link
Contributor

If AutoHashEquals had the desired behaviour (via fixing JuliaServices/AutoHashEquals.jl#18) would that be preferable? Or would it still be preferred to define == and isequal directly (via something like Alex's latest suggestion above - thanks for that btw!)?

Copy link
Contributor

@nickrobinson251 nickrobinson251 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes, @morris25!

I think we got to a better place? Are others happy with this?

I've a style comment + a question about isapprox below, but i wouldn't hold up this PR over them :)
(tbh i think the biggest change in this PR is dropping PDMat 0.9 for 0.10, but i think that's a good change anyway; 0.10 is much better)

Edit: oh, removing the method using promote_eltype before merge seems important (also update title of PR 😊)

src/multivariate/mvnormal.jl Show resolved Hide resolved
src/common.jl Show resolved Hide resolved
src/common.jl Outdated Show resolved Hide resolved
@morris25 morris25 changed the title Uses AutoHashEquals to define equality for Distributions Define equality for Sampleables Jun 5, 2020
@morris25
Copy link
Contributor Author

morris25 commented Jun 5, 2020

Added isapprox and did the suggested cleanup. Is this good to merge now?

@rofinn
Copy link
Member

rofinn commented Jun 5, 2020

LGTM, unless there are any outstanding objections, I'll play to merge this on Monday.

@nalimilan
Copy link
Member

My reading of https://docs.julialang.org/en/v1/base/math/#Base.:== implies that it's better to define hash if you define ==. Otherwise the type is going to behave incorrectly in dicts. But that docstring is weird since it's hard to know in advance whether a type will be used as a dict key. Indeed https://docs.julialang.org/en/v1/base/base/#Base.hash says that "isequal(x,y) implies hash(x)==hash(y)", which isn't the case in the current state of this PR.

One simple solution is to define hash to always return the same value: that's inefficient but correct.

src/multivariate/mvtdist.jl Outdated Show resolved Hide resolved
src/common.jl Outdated Show resolved Hide resolved
@nickrobinson251
Copy link
Contributor

I believe we already discussed this, the docs state it's optional, and the decision was not to implement hash until someone had a use-case for it.

@mschauer
Copy link
Member

mschauer commented Jun 6, 2020

What kind of hash computes hash currently for Distributions?

@nalimilan
Copy link
Member

The fallback hash method uses objectid so it's unique to each instance.

@mschauer
Copy link
Member

mschauer commented Jun 6, 2020

That's not so nice, because this way isequal and hash do not match and there is no easy way to opt out.

@mschauer
Copy link
Member

mschauer commented Jun 7, 2020

On the other hand, exactly the same logic that defines isequal can define the hash, so let's just do that.

@morris25
Copy link
Contributor Author

Sorry for the delay. I was off last week. Hashing has been added

@morris25
Copy link
Contributor Author

All good now @nalimilan?

src/common.jl Outdated Show resolved Hide resolved
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should distributions have == defined?
8 participants