
Adaptive localization #168

Merged
merged 29 commits into equinor:main from adaptive-localization on Nov 28, 2023

Conversation

tommyod
Collaborator

@tommyod tommyod commented Nov 2, 2023

Playing around trying to come up with an API for adaptive localization.

  • A new class BaseESMDA handles perturb_observations and stores the covariance and observations - this is common to both (sketched below):
    • The class ESMDA, which inherits from BaseESMDA, complementing it with assimilate.
    • The class AdaptiveESMDA, which also inherits from BaseESMDA, complementing it with its own assimilate method that also handles adaptive localization.
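A rough sketch of how that class layout could look (the names follow the description above; the constructor arguments and method signatures are illustrative guesses, not the final API):

import numpy as np


class BaseESMDA:
    # Common functionality: stores the observations and covariance,
    # and perturbs the observations.
    def __init__(self, covariance, observations, seed=None):
        self.C_D = covariance
        self.observations = observations
        self.rng = np.random.default_rng(seed)

    def perturb_observations(self, ensemble_size, alpha):
        # Create a perturbed observation matrix D of shape (num_observations, ensemble_size).
        ...


class ESMDA(BaseESMDA):
    def assimilate(self, X, Y):
        # Standard ESMDA update of every parameter at once.
        ...


class AdaptiveESMDA(BaseESMDA):
    def assimilate(self, X, Y, correlation_threshold=None):
        # ESMDA update with adaptive (correlation-based) localization.
        ...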

Closes #167

@tommyod
Collaborator Author

tommyod commented Nov 3, 2023

A first draft is ready:

# SETUP ESMDA FOR LOCALIZATION AND SOLVE PROBLEM

Notes

  • I kind of like the subclassing, but I think there might be three classes in the end: BaseESMDA, ESMDA(BaseESMDA) and AdaptiveESMDA(BaseESMDA). Common functionality goes in the BaseESMDA class. Still not certain about this.
  • Feda's idea of finding the rows matching the parameter group might be a premature optimization. Consider finding equal elements in a list [1, 2, 3, 3, 2, 2, 1]. If we iterate over it and the first entry is 1, we have to examine the entire list again to find the indices of the other 1s. So we go from $\mathcal{O}(n)$ to $\mathcal{O}(n^2)$ complexity, effectively having two for-loops instead of one. A better idea might be to sort, or look up in a hash table, or something like that (see the sketch below). For now, I have not implemented this idea. In the end, timing will decide how we do this, I guess :)
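As a reference for the hash-table idea above, a minimal sketch (the function name and usage are hypothetical):

from collections import defaultdict


def group_indices(values):
    # Map each distinct value to the indices where it occurs, in a single O(n) pass,
    # instead of re-scanning the list for every element (which is O(n^2)).
    groups = defaultdict(list)
    for index, value in enumerate(values):
        groups[value].append(index)
    return dict(groups)


print(group_indices([1, 2, 3, 3, 2, 2, 1]))
# {1: [0, 6], 2: [1, 4, 5], 3: [2, 3]}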

@dafeda
Collaborator

dafeda commented Nov 3, 2023

The idea of rows matching the parameter group may not be great, but it was definitely not premature optimization as it made it possible to run adaptive localization on Troll in something resembling a reasonable amount of time.
But still, I do hope we can avoid using it.

@dafeda
Collaborator

dafeda commented Nov 3, 2023

Overall I think this looks like a really nice and clean approach.

@dafeda
Collaborator

dafeda commented Nov 3, 2023

I've also tried running it a bit with varying number of parameters and it seems to run amazingly fast!
Exciting! 🙀

Comment on lines 122 to 125
cov_XY[corr_XY < thres] = 0 # Set small values to zero
# print(" Entries in cov_XY set to zero", np.isclose(cov_XY, 0).sum())

return X + cov_XY @ transition_matrix
Contributor

This works.
Perhaps intuitive to others, because the sample covariance between points does not involve the other variables.
Not intuitive to me, as I know that we may decompose our objective into cov_x @ LLS_coeff @ np.linalg.inv(cov_d). The LLS_coeff should and does indeed change when removing variables. But this is captured by the sample covariance cov_x in the above, so the effect is removed.
So the above works.

Contributor

I must admit that I do not like this (mathematical) property.
But I do like the algorithmic exploitation of it in the code. 👍

Contributor

This works.

Here I am referring to the fact that we do not have to recompute cov_XY after variables are removed / set to zero.

@tommyod
Collaborator Author

tommyod commented Nov 14, 2023

I could use some help debugging.

I created a linear regression problem, a simple linear model.

As expected, a Ridge finds coefficients close to the true parameters:
image

Running standard ESMDA also produces sensible results when num_ensemble is large.
image

Running Adaptive ESMDA also produces sensible results when the threshold for the correlation matrix is small. (It also matches exactly when the correlation threshold is 0 -- I have written a test for this, as well as the 1 case).
image

However, when the threshold is between 0 and 1, I get completely nonsensical results. Here with 0.5:
image

Here are a few cases, trying to figure out how it relates to sizes:

# Base case - produces problems
num_parameters = 25
num_observations = 50
num_ensemble = 100

# Large number of ensemble members - still problematic
num_parameters = 25
num_observations = 50
num_ensemble = 100_000

# Increase both observations and ensemble members - extreme problems
num_parameters = 25
num_observations = 500
num_ensemble = 10_000
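For reference, a sketch of how a Gauss-linear problem like the one above can be set up (the sizes match the base case; the variable names, noise level and Ridge settings are illustrative, this is not the actual experiment script):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)

num_parameters = 25
num_observations = 50
num_ensemble = 100

# Linear forward model g(x) = A @ x with known true parameters
A = rng.standard_normal(size=(num_observations, num_parameters))
x_true = rng.standard_normal(size=num_parameters)
observations = A @ x_true + rng.standard_normal(size=num_observations)

# Ridge regression should recover coefficients close to x_true
ridge = Ridge(alpha=1.0).fit(A, observations)
print(np.round(ridge.coef_ - x_true, 2))

# Prior ensemble X of shape (num_parameters, num_ensemble) and responses Y = g(X)
X = rng.standard_normal(size=(num_parameters, num_ensemble))
Y = A @ X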

@Blunde1
Contributor

Blunde1 commented Nov 15, 2023

However, when the threshold is between 0 and 1, I get completely nonsensical results. Here with 0.5: image

The problem seems to worsen with an increased number of ES-MDA iterations, or at least depend heavily upon it. Here is a plot running with only one iteration (alpha = normalize_alpha(np.ones(1))), which should correspond to ES with adaptive localization. The results are still poor, so it should be possible to debug this (simpler) case.

image

@Blunde1
Contributor

Blunde1 commented Nov 15, 2023

@kvashchuka, @dafeda and I had a whiteboard session (the final result, after some iterations, is included below).

The hypothesis was to inspect the "transition matrix" in the code, let's call it $T$, which should be an estimator of $\Sigma_d^{-1}(d_i-y_i)$. Note that we have multiple forms of the observation covariance, $\Sigma_d \approx (H\hat{\Sigma}_x H^T + \Sigma_{\epsilon}) \approx (YY^T/(n-1) + \Sigma_{\epsilon})$, where the latter is employed in ES.

We think the problem is that $T$ is calculated once with the full set of observations and responses. For the global $T$ to be the "correct" transition matrix for a parameter $i$, say $T=T_i$, we would require our estimate of $\Sigma_d$ to be diagonal. This is never true for $T$ using the sample covariance $YY^T/(n-1)$ internally, which makes the estimate $\hat{\Sigma}_d$ dense, and therefore generally $T\neq T_i$.

The original implementation of adaptive localization implicitly found a $T_i$ for each parameter $i$, by subsetting the $Y$'s and $D$'s that were deemed "significant" for the posterior update of parameter $i$.
This was computationally expensive, because finding each $T_i$ requires an inversion, and there are a great many of them due to the many parameters.

The batching algorithm sought subsets of $i$'s for which $T_i$ should be the same. The search is non-random, because consecutive parameters in $X$ are "likely" close in space if they correspond to spatial parameters in a discretized field. This search is done before inverting and finding $T_{subset}$.

In statistical terms, the marginal covariance is indeed the subsetted covariance directly; however, this does not hold generally for the inverse (the precision), which is what we are working with. It does hold if the precision is diagonal, or if it is block-diagonal and we subset along the blocks. This should be equivalent to finding block $T_{subset}$'s. Perhaps we could use $\Sigma_{x,y}$ and cross-correlations/thresholding to be even smarter about this search?

image
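A small numpy illustration of the last point above (purely for intuition, not code from the PR): subsetting a covariance matrix gives the marginal covariance exactly, but inverting that subset is generally not the same as subsetting the full precision matrix.

import numpy as np

rng = np.random.default_rng(0)
mixing = rng.standard_normal(size=(3, 3))
Y = mixing @ rng.standard_normal(size=(3, 1000))  # three correlated responses
cov = np.cov(Y)                                   # full 3x3 sample covariance
m = [0, 1]                                        # subset of responses

# The marginal covariance equals the subsetted covariance ...
assert np.allclose(np.cov(Y[m, :]), cov[np.ix_(m, m)])

# ... but the inverse of the subset differs from the subset of the inverse
inv_of_subset = np.linalg.inv(cov[np.ix_(m, m)])
subset_of_inv = np.linalg.inv(cov)[np.ix_(m, m)]
print(np.allclose(inv_of_subset, subset_of_inv))  # False in general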

@tommyod
Collaborator Author

tommyod commented Nov 15, 2023

Not sure I'm following everything. Let's have a chat! In the meantime, some notes:

  • You first use $i$ as a reference to the transition matrix and the observation vector $y_i$, then say "transition matrix for a parameter $i$", but the transition matrix is independent of the parameters. Is $i$ a parameter or observation index?
  • "we would require our estimate of $\Sigma_{d}^{-1}$ to be diagonal" I don't understand this.

The update equation for all parameters is:

$X^{\text{updated}} = X + XY^T / (n-1) \Sigma_{d}^{-1} (d - y) = X + \text{cov}(X, Y)T$

For a single parameter $i$, a row vector $x_i$ in $X$, the update equation becomes:

$x_i^{\text{updated}} = x_i + x_iY^T / (n-1) \Sigma_{d}^{-1} (d - y) = x_i + \text{cov}(x_i, Y)T$

So $T$ is completely independent of which $i$ we're updating. So it seems to me that $T$ is always the correct matrix, no matter what $i$ we're updating.
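A quick numerical sanity check of that claim (illustrative only, with made-up sizes): updating all rows at once and updating a single row with the same $T$ give identical results.

import numpy as np

rng = np.random.default_rng(1)
p, m, n = 4, 3, 200                          # parameters, observations, realizations
X = rng.standard_normal(size=(p, n))
Y = rng.standard_normal(size=(m, n))
D = rng.standard_normal(size=(m, n))         # perturbed observations

# Sample covariances from centered ensembles, and Sigma_d = cov(Y, Y) + Sigma_eps
Xc = X - X.mean(axis=1, keepdims=True)
Yc = Y - Y.mean(axis=1, keepdims=True)
cov_XY = Xc @ Yc.T / (n - 1)
Sigma_d = Yc @ Yc.T / (n - 1) + np.eye(m)    # here Sigma_eps = I for simplicity

T = np.linalg.solve(Sigma_d, D - Y)          # transition matrix, no dependence on any parameter row

X_updated = X + cov_XY @ T                   # all parameters at once
i = 2
x_i_updated = X[i, :] + cov_XY[i, :] @ T     # one parameter row, using the same T
assert np.allclose(X_updated[i, :], x_i_updated)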


The original implementation sliced the observations. Assume that parameter $i$ has significant correlation to two out of three outputs, e.g. mask = [0, 1, 1]. Then the update was:

x_i_updated = x_i + cov(x_i, Y[m, :]) @ inv(cov(Y[m, :], Y[m, :]) + sigma) @ (d - Y)[m, :]

I agree that this is something different, since inv(cov(Y[m, :], Y[m, :])) != inv(cov(Y, Y))[m, m]

@dafeda
Collaborator

dafeda commented Nov 15, 2023

  • You first use $i$ as a reference to the transition matrix and the observation vector $y_i$, then say "transition matrix for a parameter $i$", but the transition matrix is independent of the parameters. Is $i$ a parameter or observation index?

The transition matrix depends on the parameters via Y = g(X).

x_i_updated = x_i + cov(x_i, Y[m, :]) @ inv(cov(Y[m, :], Y[m, :]) + sigma) @ (d - Y)[m, :]

I agree that this is something different, since inv(cov(Y[m, :], Y[m, :])) != inv(cov(Y, Y))[m, m]

Indeed 👍

@tommyod
Collaborator Author

tommyod commented Nov 15, 2023

I implemented a version with the masking. Results look much much better:
image

@Blunde1
Contributor

Blunde1 commented Nov 21, 2023

One of the great things about the KF is that it runs in $O(m)$ time for the update, instead of $O(m^3)$, when we have $m$ measurements. This is possible when the measurement errors are independent, so that the covariance of the error is diagonal. See Understanding the EnKF, Section 4.2 Serial Updating: https://folk.ntnu.no/joeid/Katzfuss.pdf. It refers to this paper, which I believe is the origin paper for covariance localization: https://www.researchgate.net/publication/228817876_A_Sequential_Ensemble_Kalman_Filter_for_Atmospheric_Data_Assimilation

Our measurement errors are indeed independent, so much so that I believe we have even built it into the API. At least for current practical purposes, $\Sigma_\epsilon$ will be diagonal. Then why can we not do sequential updates too? There is probably a good reason and I am just not seeing it.

Thus, throwing something wild out there: how about iterating over the observations, assimilating them sequentially, and then iterating over the parameters within each sequential update?

for (dj, yj, Sigma_eps_j) in (observations, responses, variance):
    H = LLS coefficients of yj on X[mask, :]  # 1 x p, but using X which could be p2 x n for p2 < p, depending on the mask
    Sigma_yj = H @ X @ X.T @ H.T  # 1 x 1
    Sigma_d = Sigma_yj + Sigma_eps_j  # 1 x 1
    for i in realizations:
        T_ji = (dj - yji) / Sigma_d  # 1 x 1
        for X_ki in realization_i_of_parameters:
            if dj not masked for xk:
                X_ki = X_ki + cov_xk_yj * T_ji

Is this easily wrong? Or wrong for some other reason? I tried to adjust for the weird reason that using the full transition matrix was wrong (writing H = LLS coefficients of yj on X[mask, :]). It therefore assumes the masking is pre-computed. Is this unreasonable?

Edit: If you do not have immediate objections (either theoretical or computational), I could try implementing it for the Gauss-linear notebook example and see if the results make sense.
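If it helps the discussion, here is a hedged, minimal sketch of serial assimilation of scalar observations under a diagonal $\Sigma_\epsilon$ (the function name and signature are made up, the masking step is omitted, and a full serial scheme would also update the responses after each observation):

import numpy as np


def serial_update(X, Y, d, obs_error_var, rng):
    # X: (num_parameters, num_ensemble) prior ensemble.
    # Y: (num_observations, num_ensemble) responses.
    # d: (num_observations,) observed values.
    # obs_error_var: (num_observations,) variances of the independent observation errors.
    X = X.copy()
    n = X.shape[1]
    for j in range(len(d)):
        yj = Y[j, :]
        dj = d[j] + rng.normal(0.0, np.sqrt(obs_error_var[j]), size=n)   # perturbed scalar observation
        yj_c = yj - yj.mean()
        var_yj = yj_c @ yj_c / (n - 1)                                   # 1 x 1
        cov_X_yj = (X - X.mean(axis=1, keepdims=True)) @ yj_c / (n - 1)  # (num_parameters,)
        gain = cov_X_yj / (var_yj + obs_error_var[j])                    # scalar denominator, no matrix inverse
        X = X + np.outer(gain, dj - yj)                                  # rank-one update of all parameters
        # NOTE: a proper serial scheme would also update Y here; omitted for brevity.
    return X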

@tommyod tommyod requested review from dafeda and kvashchuka November 24, 2023 12:30
tests/test_esmda.py (outdated review comments, resolved)
tests/test_experimental.py (12 outdated review comments, resolved)
Collaborator
@dafeda dafeda left a comment

This is great 👍

@tommyod tommyod merged commit 32f436b into equinor:main Nov 28, 2023
9 checks passed
@tommyod tommyod deleted the adaptive-localization branch November 28, 2023 13:40