Similarity metric for LHS sampling #56

Open
njain76 opened this issue May 10, 2023 · 1 comment

Comments

@njain76

njain76 commented May 10, 2023

Hi Cedric,

In the IEC 62209-3, a similarity metric is given as follows:

[Image: similarity metric formula from IEC 62209-3]

The LHS binning is required to follow the above metric. Is this metric evaluated anywhere in our software, or can it be visualized, to check whether the sampling is correct?

Regards,
Nitin

@cbujard
Collaborator

cbujard commented May 10, 2023

Hi @njain76 ,

This metric is not evaluated anywhere in our software. The variogram-based approach supersedes this "tentative metric": it represents the uncertainty as a function of distance in a transformed version of the parameter space (which includes all the components used in this formula), and a variogram is actually easier to visualize. You can view such a product of normal pdfs as a single multivariate normal pdf; more generally, a product of normal pdfs is itself, up to a normalizing constant, a normal pdf.
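
The product-of-normals identity can be checked numerically. The sketch below is only an illustration, assuming independent normal factors, one per dimension (i.e. a diagonal covariance); it does not reproduce the IEC 62209-3 formula itself, and the helper names `normal_pdf`, `diag_mvn_pdf` and `product_of_normals` are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Univariate normal density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)


def diag_mvn_pdf(x, mu, sigma):
    """Multivariate normal density with covariance diag(sigma_1^2, ..., sigma_d^2)."""
    d = len(x)
    det_sqrt = math.prod(sigma)  # |Sigma|^(1/2) is the product of the sigma_k
    quad = sum(((xk - mk) / sk) ** 2 for xk, mk, sk in zip(x, mu, sigma))
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (d / 2) * det_sqrt)


def product_of_normals(x, mu, sigma):
    """Product of the per-dimension univariate normal densities."""
    return math.prod(normal_pdf(xk, mk, sk) for xk, mk, sk in zip(x, mu, sigma))
```

For example, with `x = [0.3, -1.2, 2.0]`, `mu = [0.0, 0.0, 1.0]` and `sigma = [1.0, 0.5, 2.0]`, `product_of_normals` and `diag_mvn_pdf` agree to floating-point precision. The diagonal case is just the simplest one to check by hand.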

LHS binning involves only the x-variables (the 8 dimensions mentioned here: x, y, f, d, theta, bw, papr, P). To check that a sample is properly LHS-distributed, project it onto each of the 8 dimensions and check that each projection is even and uniform: plot each column of the CSV separately and check that the values fill n bins, one value per bin, where n is the sample size. Keep in mind that all the values you see have been snapped to the closest meaningful combination of parameters, and this snapping is difficult to reverse-engineer properly.

In addition, our method requires more than that for the testing phase (step 2), as we also need each element to be randomly placed within its bin. So, in the end, an approximate verification is to look at the projections (you already have these in the sample plot in the current version of the software), but since the true LHS values are lost in the snapping process, it is very difficult to check this precisely.

Note that an advantage of the training part of our method (step 1) is that it is largely insensitive to the LHS quality of the sample. Only step 2 requires good local randomness around the measured points; this randomness directly affects the alignment of points in the QQ-plot (hence a poor-quality test sample could cause the testing phase to fail).
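
A rough way to script the projection check is sketched below. It is a minimal sketch, not part of the software: it assumes you know the lower and upper bound of each column and that the values are the raw (unsnapped) LHS draws, which is not the case for the exported CSV. `latin_hypercube`, `bin_counts` and `is_lhs` are hypothetical helpers, and the sampler is a plain textbook LHS, not the software's own generator.

```python
import random


def latin_hypercube(n, d, rng=random):
    """Plain LHS on [0, 1)^d: in every dimension each of the n equal-width
    bins receives exactly one point, placed uniformly at random inside it."""
    columns = []
    for _ in range(d):
        bins = list(range(n))
        rng.shuffle(bins)  # random pairing of bins across dimensions
        columns.append([(b + rng.random()) / n for b in bins])
    return [list(row) for row in zip(*columns)]


def bin_counts(values, n, lo=0.0, hi=1.0):
    """Project one column onto n equal-width bins over [lo, hi] and count points per bin."""
    counts = [0] * n
    for v in values:
        k = int((v - lo) / (hi - lo) * n)
        k = min(max(k, 0), n - 1)  # put v == hi (and out-of-range values) in an edge bin
        counts[k] += 1
    return counts


def is_lhs(sample, bounds):
    """True if, in every dimension, each of the n bins holds exactly one point."""
    n = len(sample)
    for j, (lo, hi) in enumerate(bounds):
        column = [row[j] for row in sample]
        if bin_counts(column, n, lo, hi) != [1] * n:
            return False
    return True
```

Rounding the same sample to a coarse grid (for example `round(v, 1)` with n = 20) is enough to make `is_lhs` return False, which mirrors why snapped values can only be checked approximately. This check also says nothing about the random placement within each bin that step 2 needs.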

I hope this answers your questions.

Best,
Cedric
