
Adaptive localization #168

Merged
merged 29 commits into equinor:main from adaptive-localization on Nov 28, 2023

Conversation

tommyod
Collaborator

@tommyod tommyod commented Nov 2, 2023

Playing around trying to come up with an API for adaptive localization.

  • A new class BaseESMDA handles perturb_observations and stores the covariance and observations - this is common to both (sketched below):
    • The class ESMDA, which inherits from BaseESMDA, complementing it with assimilate.
    • The class AdaptiveESMDA, which also inherits from BaseESMDA, complementing it with its own assimilate method that also handles adaptive localization.
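A rough sketch of how that class layout could look (the names follow the description above; the constructor arguments and method signatures are illustrative guesses, not the final API):

import numpy as np


class BaseESMDA:
    # Common functionality: stores the observations and covariance,
    # and perturbs the observations.
    def __init__(self, covariance, observations, seed=None):
        self.C_D = covariance
        self.observations = observations
        self.rng = np.random.default_rng(seed)

    def perturb_observations(self, ensemble_size, alpha):
        # Create a perturbed observation matrix D of shape (num_observations, ensemble_size).
        ...


class ESMDA(BaseESMDA):
    def assimilate(self, X, Y):
        # Standard ESMDA update of every parameter at once.
        ...


class AdaptiveESMDA(BaseESMDA):
    def assimilate(self, X, Y, correlation_threshold=None):
        # ESMDA update with adaptive (correlation-based) localization.
        ...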

Closes #167

@tommyod
Collaborator Author

tommyod commented Nov 3, 2023

A first draft is ready:

# SETUP ESMDA FOR LOCALIZATION AND SOLVE PROBLEM

Notes

  • I kind of like the subclassing, but I think there might be three classes in the end: BaseESMDA, ESMDA(BaseESMDA) and AdaptiveESMDA(BaseESMDA). Common functionality goes in the BaseESMDA class. Still not certain about this.
  • Feda's idea of finding the rows matching the parameter group might be a premature optimization. Consider finding equal elements in a list [1, 2, 3, 3, 2, 2, 1]. If we iterate over it and the first entry is 1, we have to examine the entire list again to find the indices of the other 1s. So we go from $\mathcal{O}(n)$ to $\mathcal{O}(n^2)$ complexity, effectively having two for-loops instead of one. A better idea might be to sort, or look up in a hash table, or something like that (see the sketch below). For now, I have not implemented this idea. In the end, timing will decide how we do this, I guess :)
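As a reference for the hash-table idea above, a minimal sketch (the function name and usage are hypothetical):

from collections import defaultdict


def group_indices(values):
    # Map each distinct value to the indices where it occurs, in a single O(n) pass,
    # instead of re-scanning the list for every element (which is O(n^2)).
    groups = defaultdict(list)
    for index, value in enumerate(values):
        groups[value].append(index)
    return dict(groups)


print(group_indices([1, 2, 3, 3, 2, 2, 1]))
# {1: [0, 6], 2: [1, 4, 5], 3: [2, 3]}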

@dafeda
Collaborator

dafeda commented Nov 3, 2023

The idea of rows matching the parameter group may not be great, but it was definitely not premature optimization as it made it possible to run adaptive localization on Troll in something resembling a reasonable amount of time.
But still, I do hope we can avoid using it.

@dafeda
Collaborator

dafeda commented Nov 3, 2023

Overall I think this looks like a really nice and clean approach.

@dafeda
Collaborator

dafeda commented Nov 3, 2023

I've also tried running it a bit with varying number of parameters and it seems to run amazingly fast!
Exciting! 🙀

Comment on lines 122 to 125
cov_XY[corr_XY < thres] = 0 # Set small values to zero
# print(" Entries in cov_XY set to zero", np.isclose(cov_XY, 0).sum())

return X + cov_XY @ transition_matrix
Contributor

This works.
Perhaps intuitive to others, because the sample covariance between points does not involve the other variables.
Not intuitive to me, as I know that we may decompose our objective into cov_x @ LLS_coeff @ np.linalg.inv(cov_d). The LLS_coeff should and does indeed change when removing variables. But this is captured by the sample covariance cov_x in the above, so the effect is removed.
So the above works.

Contributor

I must admit that I do not like this (mathematical) property.
But I do like the algorithmic exploitation of it in the code. 👍

Contributor

This works.

Here I am referring to the fact that we do not have to recompute cov_XY after variables are removed / set to zero.

@tommyod
Collaborator Author

tommyod commented Nov 14, 2023

I could use some help debugging.

I created a linear regression problem, a simple linear model.

As expected, a Ridge finds coefficients close to the true parameters:
image

Running standard ESMDA also produces sensible results when num_ensemble is large.
image

Running Adaptive ESMDA also produces sensible results when the threshold for the correlation matrix is small. (It also matches exactly when the correlation threshold is 0 -- I have written a test for this, as well as the 1 case).
image

However, when the threshold is between 0 and 1, I get completely nonsensical results. Here with 0.5:
image

Here are a few cases, trying to figure out how it relates to sizes:

# Base case - produces problems
num_parameters = 25
num_observations = 50
num_ensemble = 100

# Large number of ensemble members - still problematic
num_parameters = 25
num_observations = 50
num_ensemble = 100_000

# Increase both observations and ensemble members - extreme problems
num_parameters = 25
num_observations = 500
num_ensemble = 10_000
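For reference, a sketch of how a Gauss-linear problem like the one above can be set up (the sizes match the base case; the variable names, noise level and Ridge settings are illustrative, this is not the actual experiment script):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)

num_parameters = 25
num_observations = 50
num_ensemble = 100

# Linear forward model g(x) = A @ x with known true parameters
A = rng.standard_normal(size=(num_observations, num_parameters))
x_true = rng.standard_normal(size=num_parameters)
observations = A @ x_true + rng.standard_normal(size=num_observations)

# Ridge regression should recover coefficients close to x_true
ridge = Ridge(alpha=1.0).fit(A, observations)
print(np.round(ridge.coef_ - x_true, 2))

# Prior ensemble X of shape (num_parameters, num_ensemble) and responses Y = g(X)
X = rng.standard_normal(size=(num_parameters, num_ensemble))
Y = A @ X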

@Blunde1
Contributor

Blunde1 commented Nov 15, 2023

However, when the threshold is between 0 and 1, I get completely nonsensical results. Here with 0.5: image

The problem seems to worsen with an increased number of ES-MDA iterations, or at least depend heavily upon it. Here is a plot running with only one iteration (alpha = normalize_alpha(np.ones(1))), which should correspond to ES with adaptive localization. The results are still poor, so it should be possible to debug this (simpler) case.

image

@Blunde1
Contributor

Blunde1 commented Nov 15, 2023

@kvashchuka, @dafeda and I had a whiteboard session (the final result, after some iterations, is included below).

The hypothesis was to inspect the "transition matrix" in the code, let's call it $T$, which should be an estimator of $\Sigma_d^{-1}(d_i-y_i)$. Note that we have multiple forms of the observation covariance, $\Sigma_d \approx (H\hat{\Sigma}_x H^T + \Sigma_{\epsilon}) \approx (YY^T/(n-1) + \Sigma_{\epsilon})$, where the latter is employed in ES.

We think the problem is that $T$ is calculated once with the full set of observations and responses. For the global $T$ to be the "correct" transition matrix for a parameter $i$, say $T=T_i$, we would require our estimate of $\Sigma_d$ to be diagonal. This is never true for $T$ using the sample covariance $YY^T/(n-1)$ internally, which makes the estimate $\hat{\Sigma}_d$ dense, and therefore generally $T\neq T_i$.

The original implementation of adaptive localization implicitly found a $T_i$ for each parameter $i$, by subsetting the $Y$'s and $D$'s that were deemed "significant" for the posterior update of parameter $i$.
This was computationally expensive, because finding each $T_i$ requires an inversion, and there are a great many of them due to the many parameters.

The batching algorithm sought subsets of $i$'s for which $T_i$ should be the same. The search is non-random, because consecutive parameters in $X$ are "likely" close in space if they correspond to spatial parameters in a discretized field. This search is done before inverting and finding $T_{subset}$.

In statistical terms, the marginal covariance is indeed the subsetted covariance directly; however, this does not hold generally for the inverse (the precision), which is what we are working with. It does hold if the precision is diagonal, or if it is block-diagonal and we subset along the blocks. This should be equivalent to finding block $T_{subset}$'s. Perhaps we could use $\Sigma_{x,y}$ and cross-correlations/thresholding to be even smarter about this search?

image
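A small numpy illustration of the last point above (purely for intuition, not code from the PR): subsetting a covariance matrix gives the marginal covariance exactly, but inverting that subset is generally not the same as subsetting the full precision matrix.

import numpy as np

rng = np.random.default_rng(0)
mixing = rng.standard_normal(size=(3, 3))
Y = mixing @ rng.standard_normal(size=(3, 1000))  # three correlated responses
cov = np.cov(Y)                                   # full 3x3 sample covariance
m = [0, 1]                                        # subset of responses

# The marginal covariance equals the subsetted covariance ...
assert np.allclose(np.cov(Y[m, :]), cov[np.ix_(m, m)])

# ... but the inverse of the subset differs from the subset of the inverse
inv_of_subset = np.linalg.inv(cov[np.ix_(m, m)])
subset_of_inv = np.linalg.inv(cov)[np.ix_(m, m)]
print(np.allclose(inv_of_subset, subset_of_inv))  # False in general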

@tommyod
Collaborator Author

tommyod commented Nov 15, 2023

Not sure I'm following everything. Let's have a chat! In the meantime, some notes:

  • You first use $i$ as a reference to the transition matrix and the observation vector $y_i$, then say "transition matrix for a parameter $i$", but the transition matrix is independent of the parameters. Is $i$ a parameter or observation index?
  • "we would require our estimate of $\Sigma_{d}^{-1}$ to be diagonal" I don't understand this.

The update equation for all parameters is:

$X^{\text{updated}} = X + XY^T / (n-1) \Sigma_{d}^{-1} (d - y) = X + \text{cov}(X, Y)T$

For a single parameter $i$, a row vector $x_i$ in $X$, the update equation becomes:

$x_i^{\text{updated}} = x_i + x_iY^T / (n-1) \Sigma_{d}^{-1} (d - y) = x_i + \text{cov}(x_i, Y)T$

So $T$ is completely independent of which $i$ we're updating. So it seems to me that $T$ is always the correct matrix, no matter what $i$ we're updating.
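A quick numerical sanity check of that claim (illustrative only, with made-up sizes): updating all rows at once and updating a single row with the same $T$ give identical results.

import numpy as np

rng = np.random.default_rng(1)
p, m, n = 4, 3, 200                          # parameters, observations, realizations
X = rng.standard_normal(size=(p, n))
Y = rng.standard_normal(size=(m, n))
D = rng.standard_normal(size=(m, n))         # perturbed observations

# Sample covariances from centered ensembles, and Sigma_d = cov(Y, Y) + Sigma_eps
Xc = X - X.mean(axis=1, keepdims=True)
Yc = Y - Y.mean(axis=1, keepdims=True)
cov_XY = Xc @ Yc.T / (n - 1)
Sigma_d = Yc @ Yc.T / (n - 1) + np.eye(m)    # here Sigma_eps = I for simplicity

T = np.linalg.solve(Sigma_d, D - Y)          # transition matrix, no dependence on any parameter row

X_updated = X + cov_XY @ T                   # all parameters at once
i = 2
x_i_updated = X[i, :] + cov_XY[i, :] @ T     # one parameter row, using the same T
assert np.allclose(X_updated[i, :], x_i_updated)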


The original implementation sliced the observations. Assume that parameter $i$ has significant correlation to two out of three outputs, e.g. mask = [0, 1, 1]. Then the update was:

x_i_updated = x_i + cov(x_i, Y[m, :]) @ inv(cov(Y[m, :], Y[m, :]) + sigma) @ (d - Y)[m, :]

I agree that this is something different, since inv(cov(Y[m, :], Y[m, :])) != inv(cov(Y, Y))[m, m]

@dafeda
Collaborator

dafeda commented Nov 15, 2023

  • You first use $i$ as a reference to the transition matrix and the observation vector $y_i$, then say "transition matrix for a parameter $i$", but the transition matrix is independent of the parameters. Is $i$ a parameter or observation index?

The transition matrix depends on the parameters via Y = g(X).

x_i_updated = x_i + cov(x_i, Y[m, :]) @ inv(cov(Y[m, :], Y[m, :]) + sigma) @ (d - Y)[m, :]

I agree that this is something different, since inv(cov(Y[m, :], Y[m, :])) != inv(cov(Y, Y))[m, m]

Indeed 👍

@tommyod
Collaborator Author

tommyod commented Nov 15, 2023

I implemented a version with the masking. Results look much much better:
image

@Blunde1
Contributor

Blunde1 commented Nov 21, 2023

One of the great things about the KF is that it runs in $O(m)$ time for the update, instead of $O(m^3)$, when we have $m$ measurements. This is possible when the measurement errors are independent, so that the covariance of the error is diagonal. See Understanding the EnKF, Section 4.2 Serial Updating: https://folk.ntnu.no/joeid/Katzfuss.pdf. It refers to this paper, which I believe is the origin paper for covariance localization: https://www.researchgate.net/publication/228817876_A_Sequential_Ensemble_Kalman_Filter_for_Atmospheric_Data_Assimilation

Our measurement errors are indeed independent, so much so that I believe we have even built it into the API. At least for current practical purposes, $\Sigma_\epsilon$ will be diagonal. Then why can we not do sequential updates too? There is probably a good reason and I am just not seeing it.

Thus, throwing something wild out there: how about iterating over the observations, assimilating them sequentially, and then iterating over the parameters within each sequential update?

for (dj, yj, Sigma_eps_j) in (observations, responses, variance):
    H = LLS coefficients of yj on X[mask, :]  # 1 x p, but using X which could be p2 x n for p2 < p, depending on the mask
    Sigma_yj = H @ X @ X.T @ H.T  # 1 x 1
    Sigma_d = Sigma_yj + Sigma_eps_j  # 1 x 1
    for i in realizations:
        T_ji = (dj - yji) / Sigma_d  # 1 x 1
        for X_ki in realization_i_of_parameters:
            if dj not masked for xk:
                X_ki = X_ki + cov_xk_yj * T_ji

Is this easily wrong? Or wrong for some other reason? I tried to adjust for the weird reason that using the full transition matrix was wrong (writing H = LLS coefficients of yj on X[mask, :]). It therefore assumes the masking is pre-computed. Is this unreasonable?

Edit: If you do not have immediate objections (either theoretical or computational), I could try implementing it for the Gauss-linear notebook example and see if the results make sense.
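If it helps the discussion, here is a hedged, minimal sketch of serial assimilation of scalar observations under a diagonal $\Sigma_\epsilon$ (the function name and signature are made up, the masking step is omitted, and a full serial scheme would also update the responses after each observation):

import numpy as np


def serial_update(X, Y, d, obs_error_var, rng):
    # X: (num_parameters, num_ensemble) prior ensemble.
    # Y: (num_observations, num_ensemble) responses.
    # d: (num_observations,) observed values.
    # obs_error_var: (num_observations,) variances of the independent observation errors.
    X = X.copy()
    n = X.shape[1]
    for j in range(len(d)):
        yj = Y[j, :]
        dj = d[j] + rng.normal(0.0, np.sqrt(obs_error_var[j]), size=n)   # perturbed scalar observation
        yj_c = yj - yj.mean()
        var_yj = yj_c @ yj_c / (n - 1)                                   # 1 x 1
        cov_X_yj = (X - X.mean(axis=1, keepdims=True)) @ yj_c / (n - 1)  # (num_parameters,)
        gain = cov_X_yj / (var_yj + obs_error_var[j])                    # scalar denominator, no matrix inverse
        X = X + np.outer(gain, dj - yj)                                  # rank-one update of all parameters
        # NOTE: a proper serial scheme would also update Y here; omitted for brevity.
    return X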

@tommyod tommyod requested review from dafeda and kvashchuka November 24, 2023 12:30
tests/test_esmda.py (outdated review comments, resolved)
tests/test_experimental.py (12 outdated review comments, resolved)
Collaborator
@dafeda dafeda left a comment

This is great 👍

@tommyod tommyod merged commit 32f436b into equinor:main Nov 28, 2023
9 checks passed
@tommyod tommyod deleted the adaptive-localization branch November 28, 2023 13:40