
Towards a full-fledged, powerful ABC package #5

Open · 6 tasks · jbrea opened this issue Jun 15, 2020 · 16 comments


jbrea commented Jun 15, 2020

Introduction

There are currently a few different and unrelated packages for Approximate Bayesian Computation and Likelihood-Free Inference in julia. As mentioned on discourse, it may be nice to coordinate ABC efforts in julia a bit; at least I would enjoy this, 😄. In the following I try to give a brief overview of the current state. I spent limited time on reviewing the packages. Apologies if I missed something and please correct all my mistakes! After that, I make a few propositions.

Current State

ApproximateBayesianComputing.jl @eford

Methods

  • ABC-PMC, Beaumont et al. 2002

API

model(params) = ...
setup = method_plan(model, compute_summary_statistics, metric, prior; kwargs...)
result = run_abc(setup, data; kwargs...)

Features

  • Additional Distributions:
    • GaussianMixtureModelCommonCovar
    • GaussianMixtureModelCommonCovarTruncated
    • GaussianMixtureModelCommonCovarSubset
    • GaussianMixtureModelCommonCovarDiagonal
    • MultiUniform
    • LinearTransformedBeta
    • GenericCompositeContinuousDist
  • parallel evaluation (?)
  • Gaussian Processes emulation

GPABC.jl @tanhevg

Methods

  • Rejection ABC
  • ABC-SMC (Toni et al. 2009)
  • Emulated ABC-Rejection
  • Emulated ABC-SMC
  • ABC model selection (Toni et al. 2010)

API

model(params) = ...
result = method(data, model, prior, args...; kwargs...)

Features

  • Plotting recipes
  • Linear Noise Approximation (LNA) for stochastic models
  • Custom GP implementation (see e.g. GaussianProcesses.jl for an alternative)
  • Utilities to compute summary statistics

ApproxBayes.jl @marcjwilliams1

Methods

  • Rejection ABC
  • ABC-SMC (Toni et al. 2009)
  • ABC model selection (Toni et al. 2010)

API

model(params, constants, targetdata) = ...
setup = method(model, args..., prior)
result = runabc(setup, data; kwargs...)

Features

  • Plotting recipes
  • Composite Prior
  • Custom distance ksdist
  • multi-threading

KissABC.jl @francescoalemanno

Methods

  • Rejection ABC
  • ABC-SMC (Drovandi et al. 2011)
  • ABC-DE (Turner and Sederberg 2012)
  • Kernelized ABC-DE

API

model(params, constants) = ...
setup = ABCPlan(prior, model, data, metric)
result = method(setup, kwargs...)

Features

  • Factored Distribution
  • parallel evaluation (multi-threading)

LikelihoodfreeInference.jl (myself)

Methods

  • PMC-ABC (Beaumont et al. 2002)
  • Adaptive SMC (Del Moral et al. 2012)
  • K2-ABC (Park et al. 2016)
  • Kernel ABC (Fukumizu et al. 2013)
  • Approximate Maximum A Posteriori Estimation
    • Kernel Recursive ABC (Kajihara et al. 2018)
    • Point estimators inspired by Bertl et al. 2017 (Kernel), Jiang et al. 2018 (KL-Divergence), Briol et al. 2019 (Maximum Discrepancy Distance), Székely and Rizzo (Energy Distance)

API

model(params) = ...
setup = method(prior = ..., kwargs...)
result = run!(setup, model, data; kwargs...)

Features

  • Additional Distributions
    • MultivariateUniform
    • TruncatedMultivariateNormal
  • extensions of corrplot and histogram

Propositions

There is a little bit of overlap between the packages, but overall they seem fairly complementary. However, from a user perspective I think it would be awesome if there were a common API, so that one could easily switch between the different packages. In particular, I imagine one way to define priors, models, metrics, and fitting.

ABCBase.jl: a common API and some basic utilities

My proposition here is that we jointly write a very light-weight ABCBase.jl package that serves as a primary dependency of ABC packages. See for example DiffEqBase.jl or ReinforcementLearningBase.jl for how this is done in other ecosystems. I would include in ABCBase.jl

Ingredients

  • everything related to prior distributions
  • everything related to summary statistics
  • everything related to metrics
  • testing (and possibly assertion) utilities
  • a well-written documentation of the common API

API

My proposition for the API is the following (I am biased, of course, and very open to discussion!).

In addition to everything related to priors, summary statistics, and metrics,
ABCBase.jl exports a function fit! with the following signature

fit!(setup, model, data; verbosity = 0, callback = () -> nothing, rng = Random.GLOBAL_RNG)

Every ABC package that relies on ABCBase.jl extends this fit! function, e.g.

ABCBase.fit!(method::RejectionABC, model, data; kwargs...) = "blabla"

The user provides models as callable objects (functions or functors) with one argument.
Constants are best handled with closures.
Extraction of summary statistics is done in the model.
For example

model(params) = "do something with params"

my_complex_model(params, constants) = "do something with params and constants"
model(params) = let constants = "blabla"; my_complex_model(params, constants); end

my_raw_model(params) = "returns some raw data"
model(params) = extract_summary_statistics(my_raw_model(params))

struct MyFunctorModel
    options
end
(m::MyFunctorModel)(params) = "do something with m and params"

ABC methods/plans/setups are specified in the form

setup = method(metric = my_metric, kwargs...)
setup = method(prior = my_prior, kwargs...) # if method has a prior
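To make the proposal concrete, here is a hedged sketch of how a toy rejection-ABC method could implement this API. Everything below (the `RejectionABC` type, the prior/metric closures, the Gaussian toy model) is illustrative and not part of any existing package:

```julia
using Random, Statistics

# Hypothetical method type following the proposed `setup = method(...)` form.
struct RejectionABC{P,M}
    prior::P          # callable: prior(rng) -> parameter draw
    metric::M         # callable: metric(x, y) -> distance
    epsilon::Float64  # acceptance threshold
    nsamples::Int     # number of prior draws
end

RejectionABC(; prior, metric, epsilon = 0.1, nsamples = 1000) =
    RejectionABC(prior, metric, epsilon, nsamples)

# The proposed common entry point: every package extends this one function.
function fit!(method::RejectionABC, model, data;
              verbosity = 0, callback = () -> nothing,
              rng = Random.GLOBAL_RNG)
    accepted = Float64[]
    for _ in 1:method.nsamples
        θ = method.prior(rng)
        method.metric(model(θ), data) < method.epsilon && push!(accepted, θ)
        callback()
    end
    verbosity > 0 && println("accepted ", length(accepted), " draws")
    return accepted
end

# Toy usage: infer the mean of a Gaussian from a simulated sample mean.
rng = MersenneTwister(42)
model(θ) = mean(θ .+ randn(rng, 100))
setup = RejectionABC(prior = r -> 4 * rand(r) - 2,   # Uniform(-2, 2)
                     metric = (a, b) -> abs(a - b))
posterior = fit!(setup, model, 0.5; rng = rng)
```

The accepted draws then concentrate around the observed sample mean of 0.5; swapping in a different `setup` object would dispatch to another package's `fit!` method without touching the model or data.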

One master package to access all methods

Similar in spirit to DifferentialEquations.jl, we could create one package that aggregates all packages and gives unified access. The dependency graph would look something like

            ABCBase.jl
                |
     -----------------------
    |           |          |
 ABCPkg1     ABCPkg2      etc.
    |           |          |
    ------------------------
                |
              ABC.jl

This package does nothing but reexport all the setups/methods defined in the
different packages and the fit! function. The name of this package should of course be discussed.
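As a sketch (assuming Reexport.jl, and with ABCPkg1/ABCPkg2 as the placeholder names from the diagram above), the umbrella package could be as small as:

```julia
module ABC

using Reexport            # Reexport.jl's @reexport forwards exported names

@reexport using ABCBase   # the common fit! API and ingredients
@reexport using ABCPkg1   # placeholder member packages from the diagram
@reexport using ABCPkg2

end # module
```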

ABCProblems.jl

I think it would be nice to have a package with typical ABC benchmark problems,
like the stochastic Lotka-Volterra problem, the blowfly problem, etc. Maybe we
could collect them in a package ABCProblems.jl.

New methods to be implemented

Here is an incomplete list of methods that I would love to see implemented in
julia. Together with a collection of benchmark problems one would get a nice box
to benchmark new methods we do research on.

Conclusions and Questions

Who would be up for such a collaborative effort?
How do you like my proposition for ABCBase.jl? What would you change?
Shall we create ABCBase.jl, ABCProblems.jl and ABC.jl? Or something similar with different names?

@francescoalemanno

I'm also in favour of unifying efforts, a first step can be whipping up ABCBase.jl and opening an organization to eventually contain the whole family of packages (if it can be done for free)


jbrea commented Jun 15, 2020

That's a good idea. Yes, there are free organizations. Shall I create one with name ABCJulia or JuliaApproxBayes? (JuliaABC is taken, unfortunately)

@francescoalemanno

JuliaApproxBayes sounds nice; what about JuliaApproxInference?


jbrea commented Jun 15, 2020

Done 😄 And since I liked the name I just added the (currently empty) ApproxInferenceBase.jl package.


williams1 commented Jun 16, 2020 via email


eford commented Jun 17, 2020 via email


jbrea commented Jun 17, 2020

Thanks @eford for your feedback!

I think it is a very good idea to move as much as possible to Distributions and Distance. Let's open some PR's there! (see also here).

I hesitated when suggesting fit! originally, for the same reasons you brought up. In my package I currently use run!. Would you be fine with run! instead of fit!?


fipelle commented Oct 18, 2022

Not sure how active this thread is, but I have started working with similar methods and I am trying to implement ABC via Turing. Of course, it lacks a number of tools available in the packages mentioned above - which may be where those packages still have a niche - but it is quite handy and well maintained. Are you perhaps considering looking into that too?


eford commented Oct 18, 2022


> Thanks @eford for your feedback!
>
> I think it is a very good idea to move as much as possible to Distributions and Distance. Let's open some PR's there! (see also here).
>
> I hesitated when suggesting fit! originally, for the same reasons you brought up. In my package I currently use run!. Would you be fine with run! instead of fit!?

Yes, I think it makes sense to run! an ABC simulation. (Sorry for not noticing this for 2 years!)


eford commented Oct 18, 2022

> Not sure how active this thread is, but I have started working with similar methods and I am trying to implement ABC via Turing. Of course, it lacks a series of tools available in the packages mentioned above - which is where the space for those packages may be - but it is quite handy and maintained. Are you perhaps considering looking into that too?

I very much appreciate Turing's PPL and easy integration with MCMC samplers.
I agree that it "should" be practical to reuse Turing's PPL within an ABC context.
But I'm curious what your motivation is. Is it mostly for making easy comparisons or for pedagogical purposes? Or something else?
I would have thought that the main benefit of ABC would be in contexts where we have a detailed forward model, but it's not practical to express that in a PPL.


jbrea commented Oct 19, 2022

@fipelle thanks for reviving this thread.
My plans were actually not to use Turing's PPL but rather its "backend" AbstractMCMC.jl, in a similar way as it is done in KissABC. I haven't found the time yet to continue on this, but I really would love to have a good ecosystem that allows us to compare the different approximate inference methods based on MCMC, Kernels, Optimal Transport, etc., both for point estimation and posterior estimation.


fipelle commented Oct 19, 2022

Replies

> But I'm curious what your motivation is. Is it mostly for making easy comparisons or for pedagogical purposes? Or something else?

@eford I would love to have access to the samplers and - more broadly - its backend. They now have an SMC implementation too, which may be a nice starting point for more advanced ABC applications. In my case, it is cheaper to evaluate the summary statistics than the likelihood due to data-related issues.

> My plans were actually not to use Turing's PPL but rather its "backend" AbstractMCMC.jl, in a similar way as it is done in KissABC

@jbrea this is also a good idea. For now, I am using the Turing PPL directly and approaching the problem by defining the faux distribution:

using Distributions, Random;

struct UnknownContinuousDistribution <: ContinuousUnivariateDistribution
    summary_statistics_value::Real
end

# Julia cannot sample from an unknown distribution
Distributions.rand(rng::AbstractRNG, d::UnknownContinuousDistribution) = nothing;

# While the pdf is also unknown, a good summary statistic should be able to proxy it to some extent - the latter is computed externally within a @model macro and stored in `d`
Distributions.logpdf(d::UnknownContinuousDistribution, x::Real) = d.summary_statistics_value;

I then define the summary statistics within a Turing model and let my data

y ~ UnknownContinuousDistribution(summary_statistics_value)

where summary_statistics_value is to be maximised. Of course, this works only when conditioning on the data since you cannot sample pseudo-random numbers from an unknown distribution. I am not sure though how to use Turing for situations in which online learning is key as highlighted here. Ideally, it would be nice to implement something like this.
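To make the idea concrete outside of Turing, here is a minimal self-contained sketch of the same trick: the pseudo log-pdf is just minus a distance between simulated and observed summary statistics. The toy Gaussian model and the `faux_logpdf` name are assumptions for illustration, not fipelle's actual code:

```julia
using Random, Statistics

# Observed data from a Gaussian with unknown mean (here, truth = 1.0).
observed = randn(MersenneTwister(0), 200) .+ 1.0
summary_stats(x) = (mean(x), std(x))
s_obs = summary_stats(observed)

# Forward simulator for a candidate parameter θ.
simulate(θ, rng) = randn(rng, 200) .+ θ

# Pseudo log-pdf in the spirit of UnknownContinuousDistribution:
# larger (less negative) when the simulated summaries match the observed ones.
function faux_logpdf(θ; rng = Random.GLOBAL_RNG)
    s_sim = summary_stats(simulate(θ, rng))
    return -sum(abs2, s_sim .- s_obs)
end
```

A sampler or optimiser targeting `faux_logpdf` then concentrates around θ ≈ 1.0, mimicking the `y ~ UnknownContinuousDistribution(summary_statistics_value)` statement above while only ever conditioning on the data.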

Development

If you like this approach, one way forward could be polishing the faux distribution, creating a discrete equivalent, and writing up a series of shortcuts to simplify online learning. The packages in this ecosystem could have specialised approaches for online learning, proposals, and ways of computing summary statistics - perhaps something similar to what @jbrea mentioned at the top, such as ABC.jl, ABCBase.jl, ABCSummaryStatistics.jl and ABCSequential.jl. In Python there is also ABCpy, which is quite broad and may have functions worth implementing.


jbrea commented Oct 20, 2022

@fipelle Do you have already some code publicly available where you use this approach? It would be interesting to see it "in action".

Please let me know if you want to move a package to this org or if you want to discuss API questions.


fipelle commented Oct 20, 2022

@jbrea I will release a small package in the next few days with a few working examples, so that you can see it in action.


fipelle commented Oct 21, 2022

I forgot to mention that I would love to discuss APIs, especially for SMC. I have seen different implementations in Julia and I am not sure which one is the most up to date (with respect to current Julia). I will take a look at KissABC to see how they have implemented AbstractMCMC.jl. It would be great if you could write or direct me to a compact example of SMC usage - either through Turing.jl, AbstractMCMC.jl or AdvancedPS.jl - with online learning in mind.


jbrea commented Oct 26, 2022

@fipelle sorry for the delay. Unfortunately I don't know a compact example of SMC usage with online learning in mind.
However, did you already have a look at this discussion? It's a bit outdated but I think the key ideas still apply today and may be relevant.
