Scaling before updating #220

dafeda · 2024-09-02T13:02:41Z

Problem Formulation

During the update process, inverting the autocovariance matrix of responses is required. However, if the responses are on vastly different scales, the autocovariance matrix can become ill-conditioned, leading to the following warning: LinAlgWarning: Ill-conditioned matrix. This issue has been reported in several real FMU use-cases.

To build some intuition, here's a draft of a test that uses synthetic data to force this warning:

Link to the test

In this test, there are only two responses, with one response on a very different scale than the other.
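As a rough illustration of why two responses on very different scales cause trouble, the sketch below builds a synthetic two-response ensemble and compares the condition number of its covariance matrix before and after standardization. It is not the linked test. The ensemble size, the scale factor of 1e8, and all variable names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100  # ensemble size (assumed)

# Two independent synthetic responses; the second is 1e8 times larger.
Y = np.vstack([
    rng.normal(0.0, 1.0, N),
    rng.normal(0.0, 1e8, N),
])

# Condition number of the raw response covariance matrix.
cond_raw = np.linalg.cond(np.cov(Y))

# Standardize each response (row) to zero mean and unit sample variance.
Y_std = (Y - Y.mean(axis=1, keepdims=True)) / Y.std(axis=1, ddof=1, keepdims=True)
cond_std = np.linalg.cond(np.cov(Y_std))

print(f"raw: {cond_raw:.3e}, standardized: {cond_std:.3e}")
```

The raw condition number is roughly the ratio of the two variances, while the standardized one depends only on the sample correlation between the responses.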

@Oddvar Lia mentioned large differences in this post: Slack discussion.

Possible Solutions

1. Use Standard Scaling

One suggestion by @Blunde1 is to use a standard scaler from sklearn or similar to do the following:

- X_prior_std, Y_prior_std = Scaler_X(X_prior), Scaler_Y(Y_prior)
- Apply Scaler_Y to d as well
- X_posterior_std = update(X_prior_std, Y_prior_std, d_std), where d_std is the scaled d
- X_posterior = Scaler_X.inverse_transform(X_posterior_std)
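The steps above can be sketched with plain NumPy to check that the scaled and unscaled updates agree. This is a minimal sketch under stated assumptions, not the library's implementation: `update` is a hypothetical textbook ensemble smoother step with perturbed observations, the manual mean/std scaling stands in for a standard scaler, and the observation error covariance `C_D` is rescaled along with `d` (an assumption not spelled out in the list, but one the equivalence appears to need).

```python
import numpy as np

def update(X, Y, D, C_D):
    """Hypothetical ensemble smoother step (not the library's API).

    X: (n_params, N) parameters, Y: (n_resp, N) responses,
    D: (n_resp, N) perturbed observations, C_D: (n_resp, n_resp).
    """
    N = X.shape[1]
    X_c = X - X.mean(axis=1, keepdims=True)
    Y_c = Y - Y.mean(axis=1, keepdims=True)
    C_XY = X_c @ Y_c.T / (N - 1)
    C_YY = Y_c @ Y_c.T / (N - 1)
    return X + C_XY @ np.linalg.solve(C_YY + C_D, D - Y)

rng = np.random.default_rng(42)
n_params, N = 3, 200
X_prior = rng.normal(size=(n_params, N))
scales = np.array([[1.0], [1e3]])  # two responses on different scales
Y_prior = scales * (rng.normal(size=(2, n_params)) @ X_prior)
obs_std = 0.1 * scales.ravel()
C_D = np.diag(obs_std**2)
d = scales.ravel() * np.array([0.5, -0.3])
D = d[:, None] + obs_std[:, None] * rng.normal(size=(2, N))

# Affine scaling of parameters and responses (stand-in for a standard scaler).
mu_x = X_prior.mean(axis=1, keepdims=True)
s_x = X_prior.std(axis=1, ddof=1, keepdims=True)
mu_y = Y_prior.mean(axis=1, keepdims=True)
s_y = Y_prior.std(axis=1, ddof=1, keepdims=True)

X_s = (X_prior - mu_x) / s_x
Y_s = (Y_prior - mu_y) / s_y
D_s = (D - mu_y) / s_y           # same scaler as the responses
C_D_s = C_D / (s_y @ s_y.T)      # assumption: C_D is rescaled too

X_posterior = update(X_s, Y_s, D_s, C_D_s) * s_x + mu_x
X_direct = update(X_prior, Y_prior, D, C_D)

print(np.allclose(X_posterior, X_direct))
```

The scale difference here is kept moderate so that the unscaled update is still numerically reliable and can serve as a reference.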

"I think this should all be okay, as the transformations are affine and the update is for (an approximate) Gaussian. But should probably be double checked. I hope it should be theoretically equivalent, but numerically better conditioned."

One concern is whether standard scaling is appropriate for responses such as water-breakthrough, which may be mostly zero but can then become large.

Note from Glison:

"Yes! I think I understood Feda's question and I agree with Berent's suggestion. I think it is similar to what we have used in our implementations during the PhD and in Petrobras.

Feda Curic, if we use the user-defined standard deviation of the observation error to standardize the data (and everything that comes from the data) in the computations, we should not have issues related to sim data that are mostly zero and have a breakthrough. The user should define a minimum tolerance for all observations anyway and zero is not valid.

Defining tolerances for water production is actually a common doubt among engineers when building FMU setup for HM. I usually recommend that they set something like MAX(MINTOL, 0.1*dobs), where MINTOL is a reasonable value for the problem. MINTOL = 1% is a common choice when WCUT is the type of data.

I believe that this approach is very similar to the paper that Vinicius Rios has cited. There, Emerick suggests to use the diagonal of CD to standardize."
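A minimal sketch of the tolerance rule and the standardization by observation error described in the note above. The observed water-cut values, the simulated ensemble, and the names `d_obs`, `MINTOL`, and `obs_std` are made up for illustration; taking the absolute value of the observations is also an assumption.

```python
import numpy as np

# Hypothetical observed water cut (fraction); zero before breakthrough.
d_obs = np.array([0.0, 0.005, 0.3, 0.8])
MINTOL = 0.01  # 1% minimum tolerance, as suggested for WCUT

# MAX(MINTOL, 0.1 * dobs): the observation error std is never zero.
obs_std = np.maximum(MINTOL, 0.1 * np.abs(d_obs))

# Hypothetical simulated responses, shape (n_obs, N); the first row is all
# zeros, so its ensemble standard deviation is zero.
rng = np.random.default_rng(1)
N = 50
Y = np.clip(d_obs[:, None] + 0.05 * rng.normal(size=(4, N)), 0.0, 1.0)
Y[0, :] = 0.0

# Standardize by the observation error rather than by the ensemble spread.
Y_scaled = Y / obs_std[:, None]
d_scaled = d_obs / obs_std

print(obs_std)
print(np.isfinite(Y_scaled).all())
```

Because the divisor comes from the user-defined tolerance, a response that is identically zero across the ensemble does not cause a division by zero, which is the failure mode a plain standard scaler would hit on such a row. After this scaling the diagonal observation error covariance becomes the identity.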

2. Look into the Implementation Used by Emerick

Vinicius suggested reviewing the following paper for a possible solution:

Emerick, A.A., 2016. Analysis of the performance of ensemble-based assimilation of production and seismic data. Journal of Petroleum Science and Engineering 139, 219–239.
DOI: 10.1016/j.petrol.2016.01.029

Next Steps

Identify an approach that performs well in general and also handles the specific kinds of data that FMU produces, such as water-breakthrough.


dafeda added the needs-disussion label Sep 2, 2024

dafeda added this to SCOUT Sep 3, 2024

eivindjahren added and then removed the christmas-review label Dec 13, 2024
