An abstraction for the realization of a GP #364
Comments
This is a good point -- I agree that the statement that you highlight is not correct. Before proceeding, I think it's worth noting that you can incrementally update a posterior (I believe we've implemented this in a reasonably performant manner, but if you think there's room for improvement, your assistance would be greatly appreciated!), so your example that uses the current interface can be simplified somewhat:

```julia
x = zeros(n)
f = GP(kernel)
grads = [rand(f(x))]
while true
    grad = rand(f(x))          # simulate the "gradient" at the current inputs
    f = posterior(f(x), grad)  # incrementally condition on the newly simulated values
    push!(grads, grad)
    x += step_size * grad
    if norm(grads[end]) < eps
        break
    end
end
```
My overall thoughts on this kind of thing are that it's very cool, but I'm not convinced that the use case you point out provides an incredibly strong example of where it's useful, because there's a good, clean solution with the current interface (albeit I agree that the solution you propose is a little bit cleaner). @rossviljoen do you have thoughts on this? In terms of the interface, it feels like it addresses identical concerns to what the pathwise sampling stuff does, which I know you were looking at for a bit. @theogf @st-- do you have thoughts on this kind of thing? I'm broadly keen to avoid extending the interface further, because it's already quite substantial, but I could definitely be convinced. I think potentially …

---
I think … So, perhaps it's best to just implement … I do think this particular implementation would be nice to have for …

---
There might be scope for encapsulating something like what Will posted within an interface similar to …

---
(The "call" API appears like it ought to be stateful, and one downside of the Cholesky approach is that this is not the case - each subsequent call becomes more expensive - and it feels like this should be an explicit …)

---
The fact that the Cholesky approach makes every subsequent call more expensive is unfortunate, but I personally don't think this is a good argument against the callable approach. This problem exists no matter what; there is no reason to burden yourself with an interface just to constantly remind yourself of it.
I only have a passing familiarity with methods other than the Cholesky decomposition for the simulation of random functions, as they all appeared to be unsuitable for the high-dimensional functions which I am interested in. But the gist I got from them is that they generally try to simulate the entire …

But even if there are different ways to implement this, having different ways to handle something can be overwhelming, and a good package should provide reasonable defaults in my opinion. Similar to the way … For example:

```julia
rf = RandomFunction(kernel, backend=Exact.Cholesky())
rf = RandomFunction(mean, kernel, backend=Approximations.SpectralDecomposition())
```
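(For illustration, a self-contained toy of what such a callable `rf` object could do when evaluated -- this uses naive dense conditioning rather than any of the backends sketched above, and `ToyRandomFunction` is a made-up name, not a proposed API:)

```julia
using LinearAlgebra

# A callable "realization": every evaluation is simulated conditionally on all
# previously simulated values, so repeated calls stay consistent with each other.
struct ToyRandomFunction{K}
    k::K                            # covariance function k(x, y)
    xs::Vector{Vector{Float64}}     # previously evaluated inputs
    ys::Vector{Float64}             # previously simulated values
end
ToyRandomFunction(k) = ToyRandomFunction(k, Vector{Float64}[], Float64[])

function (rf::ToyRandomFunction)(x::Vector{Float64})
    c22 = rf.k(x, x)
    if isempty(rf.xs)
        y = sqrt(c22) * randn()
    else
        C11 = [rf.k(a, b) for a in rf.xs, b in rf.xs] + 1e-12I   # jitter for stability
        c12 = [rf.k(a, x) for a in rf.xs]
        m = dot(c12, C11 \ rf.ys)                 # conditional mean
        v = max(c22 - dot(c12, C11 \ c12), 0.0)   # conditional variance
        y = m + sqrt(v) * randn()
    end
    push!(rf.xs, x)
    push!(rf.ys, y)
    return y
end

rf = ToyRandomFunction((a, b) -> exp(-sum(abs2, a .- b) / 2))  # squared-exponential kernel
rf([0.0, 0.0]), rf([0.1, 0.0])   # two evaluations, the second consistent with the first
```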
---
So I had a look at `update_chol`:

```julia
function update_chol(chol::Cholesky, C12::AbstractMatrix, C22::AbstractMatrix)
    U12 = chol.U' \ C12
    U22 = cholesky(Symmetric(C22 - U12'U12)).U
    return Cholesky([chol.U U12; zero(U12') U22], 'U', 0)
end
```

This function avoids the recalculation of U11, so its algorithmic complexity is perfect, but it moves a ton of memory around in

```julia
[chol.U U12; zero(U12') U22]
```

Let us consider

```
u00 u10
u01 u11
```

i.e. it is stored column-wise, i.e. its memory is `u00 u01 u10 u11`. Appending a column and row (new entries `a00`, `a01` and the new block `U22`) gives

```
u00 u10 a00
0   u11 a01
0   0   U22
```

which in memory is stored as `u00 0 0 u10 u11 0 a00 a01 U22`, so the old columns do not stay in place and essentially the whole matrix has to be copied. You can solve this issue by using the PackedStorage format. I.e. you store the triangular matrix

```
u00 u10 u20
0   u11 u21
0   0   u22
```

as `u00 u10 u11 u20 u21 u22`. An advantage you get for free is that the packed storage format takes half the memory to store a triangular matrix. The disadvantage is that BLAS3 routines do not work, because BLAS3 operates on block matrices. This problem can be fixed if the entries of our packed storage format were themselves block matrices. While I have implemented the former, I haven't finished an implementation of packed block matrices yet.
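(To make the packed-storage idea concrete, here is a minimal sketch -- not the IncrementalGRF.jl implementation; `PackedUpperTriangular`, `packed_index`, and `extend!` are made-up names -- of how extending the factor by one evaluation point only appends to the end of the underlying vector, costing one O(n²) forward substitution instead of a refactorization:)

```julia
# Upper-triangular Cholesky factor in packed column-major storage.
struct PackedUpperTriangular{T}
    data::Vector{T}        # packed columns: u00, then u10 u11, then u20 u21 u22, ...
    n::Base.RefValue{Int}  # current size
end

PackedUpperTriangular{T}() where {T} = PackedUpperTriangular{T}(T[], Ref(0))

# linear index of entry (i, j), i <= j, in the packed column-major layout
packed_index(i, j) = (j - 1) * j ÷ 2 + i

Base.getindex(U::PackedUpperTriangular{T}, i, j) where {T} =
    i <= j ? U.data[packed_index(i, j)] : zero(T)

# Extend the factor by one point: c12 is the cross covariance with the n existing
# points, c22 the variance of the new point. Only appends to the packed vector.
function extend!(U::PackedUpperTriangular{T}, c12::AbstractVector{T}, c22::T) where {T}
    n = U.n[]
    u12 = similar(c12)
    for i in 1:n                        # forward substitution: solve U' * u12 = c12
        s = c12[i]
        for k in 1:(i - 1)
            s -= U[k, i] * u12[k]
        end
        u12[i] = s / U[i, i]
    end
    u22 = sqrt(c22 - sum(abs2, u12; init = zero(T)))  # new diagonal entry
    append!(U.data, u12)                # old entries stay untouched; no copying
    push!(U.data, u22)
    U.n[] = n + 1
    return U
end
```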
---

Thanks for looking into this. I believe that the current implementation is an artifact of needing the code to work nicely in conjunction with Zygote, but I don't think I've looked at it since it was first implemented, so I might be misremembering.

---
@willtebbutt I mean, this is a reasonable implementation if you do not expect to incrementally evaluate a random function step by step, and to make this step-by-step evaluation performant you need a non-standard memory layout. So it isn't bad code, it just doesn't work nicely with incremental evaluation. I also haven't done benchmarks so far - this is only a theoretical consideration.

I was interested in Bayesian optimization on high-dimensional random functions, and this resulted in a form of gradient descent but with calculable step sizes. To test those out I needed to be able to simulate high-dimensional, differentiable random functions, and it felt easier to write a package myself than to try and fit this non-standard requirement into existing packages. Now I am trying to circle back and see what I actually need and how to merge this into existing packages. The non-standard memory layout should maybe be its own package - it is essentially a different implementation of an abstract array which works nicely with incremental Cholesky decompositions.

---
I have been looking into this "lazy" evaluation of a GP before; I think the first time I saw it was in this paper. For me, this falls under "sampling" a GP, and it could perhaps be seen as an alternative to decoupled pathwise sampling or to naively sampling on a grid and interpolating, all of which have some pros and cons in different scenarios. I think it would make sense to have some common interface for all these methods to get realizations from a …

---
I don't think that this belongs into …

---
I could see the proposed method based on incremental Cholesky here, and then approximate methods for sampling the GP (incremental Cholesky but with a finite cyclic buffer, decoupled sampling, grid sampling, ...). I think an interface like …
---

So in the documentation there is the sentence:

> …

When I first encountered AbstractGPs.jl, I thought that the abstractions you define were simply born out of a different opinion. But reading this sentence, I think this might actually be a fruitful discussion, because I disagree. It is absolutely possible to implement `rf = rand(f)`. The way you do it is lazily. I mean, you could also not implement `f(x) = 2x` if that meant you had to calculate `f` for every possible `x` right from the start. The way you are in fact able to implement `f` anyway is by implementing the process to get from `x` to `f(x)`, and only applying this process once you actually want to know `f(x)` for a particular `x`.

Now in my particular case (gradient descent on a random function), I needed to incrementally evaluate this abstract random function `rf = rand(f)`. If I wanted to do this with AbstractGPs.jl, it would look something like this (I am going to try and ignore the fact that AbstractGPs is also not intended for the simulation of gradients and pretend that `rf` represents the gradient vector field for simplicity):
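(A rough sketch of the shape of such code, not the original snippet; `kernel`, `n`, `step_size`, and `eps` are placeholders, and the GP sample is abused as the "gradient" as described above:)

```julia
using AbstractGPs, LinearAlgebra

x = zeros(n)
f = GP(kernel)
xs = copy(x)                   # every input evaluated so far
ys = rand(f(xs, 1e-9))         # every value simulated so far (tiny jitter for stability)
grad = ys
while norm(grad) > eps
    x += step_size * grad
    fp = posterior(f(xs, 1e-9), ys)   # re-condition on the full history at every step
    grad = rand(fp(x, 1e-9))          # simulate at the new inputs, consistent with the history
    xs = vcat(xs, x)
    ys = vcat(ys, grad)
end
```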
This is anything but natural. Now in comparison consider this:
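(Again a sketch with the same placeholders, showing the proposed usage where the realization itself is the callable, incrementally evaluated object:)

```julia
rf = rand(GP(kernel))   # one lazy realization of the process
x = zeros(n)
grad = rf(x)
while norm(grad) > eps
    x += step_size * grad
    grad = rf(x)        # consistent with every earlier evaluation of rf
end
```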
The above is essentially what I implemented in https://github.com/FelixBenning/IncrementalGRF.jl. Only that I skip the `rand(GP(kernel))` and essentially treat `GP(kernel)` as the realization of a random function immediately. Well, in my case it is called `GaussianRandomFunction(kernel)`, but that is beside the point. It essentially works like this:

Whenever `rf(x)` is called, the value `rf(x)` is simulated based on a stored vector of previous evaluations, to which `x` is added at the end. To ensure this is computationally efficient, I also store the running Cholesky decomposition (which can easily be extended by another row). Since the Cholesky decomposition does the same incremental extension anyway, calculating it incrementally turns out to be no more expensive complexity-wise. The only implementation issue is to allow for an extendible matrix. Since we are interested in a triangular matrix, we can do that using an extensible vector by simply appending rows, resulting in a packed storage format. Unfortunately this format is only supported by BLAS2 routines and not BLAS3, so at the moment I am losing a little bit of performance (but halving the memory requirements). I want to fix this issue with packed block matrices in the future.

But anyway: what do you think about this way to implement random functions? If you wanted to get the conditional expectation instead of new evaluations, you could easily `freeze` such a random function, e.g.:
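(Hypothetical usage only -- the exact `freeze` semantics and names may differ from what is in the package; `x1`, `x2`, `x3` are arbitrary points:)

```julia
rf = GaussianRandomFunction(kernel)
rf(x1); rf(x2)      # incrementally simulate the realization at some points
mf = freeze(rf)     # stop generating new randomness
mf(x3)              # conditional expectation given the evaluations of rf so far
```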
Oh yeah, and you also get the broadcast operator for free: `rf.(evaluation_points)` totally makes sense if you want to evaluate `rf` at multiple points.

What do you think about this interface? Are there use cases where this interface is more clunky than the current one of AbstractGPs?