Correct PCA component counting logic #11134
Conversation
CodSpeed Performance Report: Merging #11134 will not alter performance.
# Create Group A: `nr_obs_group_a` perfectly correlated
# responses based on `params_a`
for i in range(nr_obs_group_a):
What if these are "perfectly negatively correlated" instead? I suspect one will not necessarily get 2 groups then, but what is reasonable in that case?
I remember you saying something about this, but don't recall the details.
Why do you suspect different results with negative correlations?
If you have observations a, b, c with correlations [values not reproduced here], you would with our method get distances ab: 1.01, ac: 2.8, and bc: 2.2. Whereas if you were to consider positive and negative correlations as equal (take the absolute value), you would get ab: 0.812, ac: 0.812, and bc: 1.27. Now, whether we want to consider positive and negative correlations as equal is up for question, I guess?
Ah I see. Nice example.
Here's a trivial example that shows that perfectly negatively correlated responses are treated as being far apart, while perfectly positively correlated responses are treated as being close.
import numpy as np
from scipy.cluster.hierarchy import linkage

corr_matrix = np.array([
    [1.0, -1.0],
    [-1.0, 1.0],
])
Z = linkage(corr_matrix, "average", "euclidean")
print("Distance perfect negative correlation:", Z[0, 2])

corr_matrix = np.array([
    [1.0, 1.0],
    [1.0, 1.0],
])
Z = linkage(corr_matrix, "average", "euclidean")
print("Distance perfect positive correlation:", Z[0, 2])
Distance perfect negative correlation: 2.8284271247461903
Distance perfect positive correlation: 0.0
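For symmetry, here is a minimal sketch of the absolute-value variant mentioned earlier in the thread: taking `np.abs` of the correlation matrix before linkage collapses the distinction between perfect positive and perfect negative correlation. This is an illustration of that alternative, not what the PR implements.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

corr_matrix = np.array([
    [1.0, -1.0],
    [-1.0, 1.0],
])
# Taking the absolute value makes perfectly negative correlation
# indistinguishable from perfectly positive correlation: the two
# rows become identical, so the merge distance drops to zero.
Z = linkage(np.abs(corr_matrix), "average", "euclidean")
print("Distance after abs():", Z[0, 2])  # → 0.0
```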
@@ -45,7 +54,10 @@ def get_nr_primary_components(
     # sum to get the cumulative proportion of variance explained by each successive
     # component.
     variance_ratio = np.cumsum(singulars**2) / np.sum(singulars**2)
-    return max(len([1 for i in variance_ratio[:-1] if i < threshold]), 1)
+    num_components = np.searchsorted(variance_ratio, threshold, side="left") + 1
nice one!
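To make the new counting concrete, here is a hedged, self-contained sketch of the function from the diff. The signature and the assumption that `singulars` holds singular values are taken from context; the example inputs are made up.

```python
import numpy as np

def get_nr_primary_components(singulars, threshold):
    # Cumulative proportion of variance explained by each
    # successive component (squared singular values).
    variance_ratio = np.cumsum(singulars**2) / np.sum(singulars**2)
    # Index of the first component at which the cumulative ratio
    # reaches the threshold, converted to a 1-based count.
    return int(np.searchsorted(variance_ratio, threshold, side="left") + 1)

singulars = np.array([3.0, 2.0, 1.0])
# variance_ratio = [9/14, 13/14, 1.0] ≈ [0.643, 0.929, 1.0]
print(get_nr_primary_components(singulars, 0.5))  # → 1
print(get_nr_primary_components(singulars, 0.9))  # → 2
```

`side="left"` means a cumulative ratio exactly equal to the threshold counts as reaching it.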
# Create Group B: (b+j)*(-1)^j*X_2 pattern - checkerboard correlation
b = 1  # base scaling factor for group B
for j in range(nr_obs_group_b):
    sign = (-1) ** j  # alternates: +1, -1, +1, -1, ...
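As a hedged illustration of why this construction yields a checkerboard correlation pattern: scaled copies of one signal correlate at exactly ±1, with the sign given by the parity of `j`. Here `X_2` and `nr_obs_group_b` are assumed (a random base signal and a small group size), since their definitions are outside the snippet.

```python
import numpy as np

rng = np.random.default_rng(0)
X_2 = rng.normal(size=100)  # assumed base signal for group B
nr_obs_group_b = 4          # assumed small group size
b = 1                       # base scaling factor for group B

responses = []
for j in range(nr_obs_group_b):
    sign = (-1) ** j  # alternates: +1, -1, +1, -1, ...
    responses.append((b + j) * sign * X_2)

# Every pair is perfectly correlated up to sign: +1 within
# same-parity pairs, -1 across parities — a checkerboard.
corr = np.corrcoef(responses)
print(np.round(corr).astype(int))
```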
maybe more efficient: sign = -1 if j % 2 else 1
The modulo version is approximately 1 second faster over 10 million iterations. The number of groups in this test is at most 6.
import time

n = 10_000_000

# Test exponentiation
start = time.perf_counter()
for j in range(n):
    sign = (-1) ** j
time_exp = time.perf_counter() - start

# Test modulo
start = time.perf_counter()
for j in range(n):
    sign = -1 if j % 2 else 1
time_mod = time.perf_counter() - start

print(f"Exponentiation: {time_exp:.4f}s")
print(f"Modulo: {time_mod:.4f}s")
print(f"Winner: {'Exponentiation' if time_exp < time_mod else 'Modulo'}")
Exponentiation: 1.5748s
Modulo: 0.6303s
Winner: Modulo
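For completeness: if alternating signs were ever needed in bulk (they are not here, with at most 6 groups), a vectorized sketch avoids the per-element Python loop entirely. This is an illustrative alternative, not part of the PR.

```python
import numpy as np

n = 8
# Vectorized alternating signs: +1 for even indices, -1 for odd.
signs = np.where(np.arange(n) % 2 == 0, 1, -1)
print(signs)  # → [ 1 -1  1 -1  1 -1  1 -1]
```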
These tests time out:
BREAKING CHANGE: The change in component counting directly affects the number of clusters requested in the `main` auto-scaling function. For the same input data, this version may produce a different number of clusters and therefore different final scaling factors compared to previous versions. The new behavior is considered more accurate.
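A hedged, self-contained illustration of how the old and new counting can disagree for the same input (both expressions are reproduced from the diff; the singular values are made up):

```python
import numpy as np

singulars = np.array([2.0, 2.0, 1.0])
threshold = 0.5
variance_ratio = np.cumsum(singulars**2) / np.sum(singulars**2)
# variance_ratio = [4/9, 8/9, 1.0] ≈ [0.444, 0.889, 1.0]

# Old: count ratios strictly below the threshold (excluding the last),
# floored at 1 — here only 0.444 qualifies.
old = max(len([1 for i in variance_ratio[:-1] if i < threshold]), 1)
# New: first component whose cumulative ratio reaches the threshold.
new = int(np.searchsorted(variance_ratio, threshold, side="left") + 1)
print(old, new)  # → 1 2
```

With the same data, the two versions request different component counts, which is what makes this a breaking change downstream.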
I have updated the snapshots.
Not sure if @larsevj has any more comments, but it's 🚀 for my part
Nothing to add from my side either. Looks good.
Results from running ES on Drogon using the old and new way of counting principal components:

git rebase -i main --exec 'just rapid-tests'