Conversation

dafeda
Contributor

@dafeda dafeda commented Jun 17, 2025

Results from running ES on Drogon using old and new way of counting principal components:

[image: ES results on Drogon, old vs. new component counting]

  • PR title captures the intent of the changes, and is fitting for release notes.
  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.
  • Make sure unit tests pass locally after every commit (git rebase -i main --exec 'just rapid-tests')

When applicable

  • When there are user facing changes: Updated documentation
  • New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
  • Large PR: Prepare changes in small commits for more convenient review
  • Bug fix: Add regression test for the bug
  • Bug fix: Add backport label to latest release (format: 'backport release-branch-name')

@dafeda dafeda self-assigned this Jun 17, 2025
@dafeda dafeda added the release-notes:breaking-change Automatically categorise as breaking change in release notes label Jun 17, 2025
@dafeda dafeda added this to SCOUT Jun 17, 2025
@dafeda dafeda moved this to Ready for Review in SCOUT Jun 17, 2025
@dafeda dafeda changed the title from "Test misfit corr groups" to "Correct PCA component counting logic" Jun 17, 2025

codspeed-hq bot commented Jun 17, 2025

CodSpeed Performance Report

Merging #11134 will not alter performance

Comparing dafeda:test-misfit-corr-groups (9358a5f) with main (c668568)

Summary

✅ 22 untouched benchmarks


# Create Group A: `nr_obs_group_a` perfectly correlated
# responses based on `params_a`
for i in range(nr_obs_group_a):
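As a minimal sketch of what "perfectly correlated" means in this excerpt (reusing the names `nr_obs_group_a` and `params_a`; the ensemble size and seed are hypothetical): responses that are positive scalar multiples of one shared parameter vector have pairwise Pearson correlation of exactly 1.

```python
import numpy as np

rng = np.random.default_rng(0)
nr_obs_group_a = 3
ensemble_size = 100

# hypothetical latent parameter driving all of group A
params_a = rng.standard_normal(ensemble_size)

# each response is a positive scalar multiple of params_a,
# so every pairwise correlation within the group is exactly 1
responses = np.array([(i + 1) * params_a for i in range(nr_obs_group_a)])
corr = np.corrcoef(responses)
print(corr)  # all entries ~1.0
```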
Collaborator


What if these are "perfectly negatively correlated" instead? I suspect one will not necessarily get 2 groups then, but what is reasonable in that case?

Contributor Author


I remember you saying something about this, but don't recall the details.
Why do you suspect different results with negative correlations?

Collaborator

@larsevj larsevj Jun 17, 2025


If you have observation a,b,c with correlations:

$$ \begin{bmatrix} 1 & 0.9 & -0.9 \\ 0.9 & 1 & 0.1 \\ -0.9 & 0.1 & 1 \end{bmatrix} $$

you would with our method get distances ab: 1.01, ac: 2.8, and bc: 2.2.
Whereas if you were to consider positive and negative correlations as equal (take the absolute value) you would get:
ab: 0.812, ac: 0.812, and bc: 1.27.
Now whether we want to consider positive and negative correlations as equal is up for question, I guess?
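These distances can be checked directly with `scipy.spatial.distance.pdist`, which returns the pairwise Euclidean distances between the rows of the matrix in the order ab, ac, bc (a quick verification sketch, not code from the PR):

```python
import numpy as np
from scipy.spatial.distance import pdist

corr = np.array([
    [ 1.0, 0.9, -0.9],
    [ 0.9, 1.0,  0.1],
    [-0.9, 0.1,  1.0],
])

# Euclidean distances between rows, condensed order: ab, ac, bc
print(pdist(corr))          # ≈ [1.01, 2.80, 2.20]

# same distances after taking absolute values of the correlations
print(pdist(np.abs(corr)))  # ≈ [0.812, 0.812, 1.273]
```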

Contributor Author


Ah I see. Nice example.

Here's a trivial example that shows that perfectly negatively correlated responses are treated as being far apart, while perfectly positively correlated responses are treated as being close.

import numpy as np
from scipy.cluster.hierarchy import linkage

corr_matrix = np.array([
    [ 1.0, -1.0],
    [-1.0,  1.0],
])

Z = linkage(corr_matrix, "average", "euclidean")
print("Distance perfect negative correlation: ", Z[0, 2])

corr_matrix = np.array([
    [1.0, 1.0],
    [1.0, 1.0],
])

Z = linkage(corr_matrix, "average", "euclidean")
print("Distance perfect positive correlation: ", Z[0, 2])

Output:

Distance perfect negative correlation:  2.8284271247461903
Distance perfect positive correlation:  0.0

@dafeda dafeda force-pushed the test-misfit-corr-groups branch from 0aaf2c6 to 0982191 Compare June 18, 2025 06:39
@dafeda dafeda force-pushed the test-misfit-corr-groups branch 2 times, most recently from 02e3cb6 to 50f34ad Compare June 18, 2025 13:12
@@ -45,7 +54,10 @@ def get_nr_primary_components(
# sum to get the cumulative proportion of variance explained by each successive
# component.
variance_ratio = np.cumsum(singulars**2) / np.sum(singulars**2)
return max(len([1 for i in variance_ratio[:-1] if i < threshold]), 1)

num_components = np.searchsorted(variance_ratio, threshold, side="left") + 1
Contributor


nice one!
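To illustrate the difference, here is a small sketch comparing the old counting line from the diff with the new `searchsorted` version (the `variance_ratio` values are made up for illustration): with a 0.9 threshold the old logic stops at 2 components even though only 80% of the variance is explained, while the new logic returns the 3 components needed to reach at least 90%.

```python
import numpy as np

def old_count(variance_ratio, threshold):
    # previous logic: count cumulative ratios strictly below the threshold
    return max(len([1 for i in variance_ratio[:-1] if i < threshold]), 1)

def new_count(variance_ratio, threshold):
    # new logic: first index where cumulative variance reaches the threshold
    return int(np.searchsorted(variance_ratio, threshold, side="left")) + 1

# hypothetical cumulative proportions of variance explained
variance_ratio = np.array([0.5, 0.8, 0.95, 1.0])

print(old_count(variance_ratio, 0.9))  # 2 -- only 80% explained
print(new_count(variance_ratio, 0.9))  # 3 -- 95% >= 90% explained
```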

# Create Group B: (b+j)*(-1)^j*X_2 pattern - checkerboard correlation
b = 1  # base scaling factor for group B
for j in range(nr_obs_group_b):
    sign = (-1) ** j  # alternates: +1, -1, +1, -1, ...
Contributor


maybe more efficient: sign = -1 if j % 2 else 1

Contributor Author


The modulo version is approx 1 second faster with 10 million groups. The number of groups in this test is maximum 6.

import time

n = 10_000_000

# Test exponentiation
start = time.perf_counter()
for j in range(n):
    sign = (-1) ** j
time_exp = time.perf_counter() - start

# Test modulo
start = time.perf_counter()
for j in range(n):
    sign = -1 if j % 2 else 1
time_mod = time.perf_counter() - start

print(f"Exponentiation: {time_exp:.4f}s")
print(f"Modulo:         {time_mod:.4f}s")
print(f"Winner: {'Exponentiation' if time_exp < time_mod else 'Modulo'}")

Output:

Exponentiation: 1.5748s
Modulo:         0.6303s
Winner: Modulo

@xjules
Contributor

xjules commented Jul 1, 2025

These tests timeout:

FAILED tests/ert/unit_tests/analysis/test_es_update.py::test_update_report[0-misfit_preprocess0]@snake_oil_case_storage - Failed: Timeout (>360.0s) from pytest-timeout.
FAILED tests/ert/unit_tests/analysis/test_es_update.py::test_update_report[0-misfit_preprocess2]@snake_oil_case_storage - Failed: Timeout (>360.0s) from pytest-timeout.
FAILED tests/ert/unit_tests/analysis/test_es_update.py::test_update_report[0-misfit_preprocess3]@snake_oil_case_storage - AssertionError: value does not match the expected value in snapshot 

BREAKING CHANGE:
The change in component counting directly affects the number of clusters requested in the `main` auto-scaling function.
For the same input data, this version may produce a different number of clusters and therefore different final scaling factors compared to previous versions.
The new behavior is considered more accurate.
@dafeda dafeda force-pushed the test-misfit-corr-groups branch from 50f34ad to 9358a5f Compare July 1, 2025 10:31
@dafeda
Contributor Author

dafeda commented Jul 1, 2025

These tests timeout:

FAILED tests/ert/unit_tests/analysis/test_es_update.py::test_update_report[0-misfit_preprocess0]@snake_oil_case_storage - Failed: Timeout (>360.0s) from pytest-timeout.
FAILED tests/ert/unit_tests/analysis/test_es_update.py::test_update_report[0-misfit_preprocess2]@snake_oil_case_storage - Failed: Timeout (>360.0s) from pytest-timeout.
FAILED tests/ert/unit_tests/analysis/test_es_update.py::test_update_report[0-misfit_preprocess3]@snake_oil_case_storage - AssertionError: value does not match the expected value in snapshot 

I have updated the snapshots.

Contributor

@xjules xjules left a comment


Not sure if @larsevj has any more comments, but it's 🚀 for my part

@github-project-automation github-project-automation bot moved this from Ready for Review to Reviewed in SCOUT Jul 4, 2025
@larsevj
Collaborator

larsevj commented Jul 4, 2025

Not sure if @larsevj has any more comments, but it's 🚀 for my part

Nothing to add from my side either. Looks good.

@dafeda dafeda added the release-notes:bug-fix Automatically categorise as bug fix in release notes label Jul 4, 2025
@dafeda dafeda merged commit dd014dc into equinor:main Jul 4, 2025
35 checks passed
@dafeda dafeda deleted the test-misfit-corr-groups branch July 4, 2025 07:36
@github-project-automation github-project-automation bot moved this from Reviewed to Done in SCOUT Jul 4, 2025