KL divergence test for missing data #2

Open
vcabeli opened this issue Jul 23, 2020 · 0 comments
Labels
enhancement New feature or request

Comments

@vcabeli
Collaborator

vcabeli commented Jul 23, 2020

I’ll probably have to fix this one, just taking notes here

We use the KL divergence to test whether the joint distribution of (X,Y) on the samples for which the contributor Z is not NA, P(XY)|Z_notNA, is not too different from the original P(XY).

If it is very different, then the result of I(X;Y|Z) does not really give information about Z as a contributor; see this extreme example:
image

For now the value KL(P(XY)|Z_notNA, P(XY)) is compared to log(N_nonNA), which probably captures the worst cases of selection bias but may not be what we want.
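As a note to self, a minimal sketch of the current check, assuming X and Y are already discretized into integer codes and `z_observed` is a boolean mask marking the samples where Z is not NA. The function names and the contingency-table approach are illustrative assumptions, not the package's actual implementation.

```python
import numpy as np

def joint_counts(x, y, n_x, n_y):
    """Contingency table of two integer-coded discrete variables."""
    counts = np.zeros((n_x, n_y))
    np.add.at(counts, (x, y), 1)
    return counts

def kl_divergence(p_counts, q_counts):
    """KL(P || Q) in nats from two count tables over the same cells."""
    p = p_counts / p_counts.sum()
    q = q_counts / q_counts.sum()
    mask = p > 0  # cells with p == 0 contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def kl_subsample_vs_full(x, y, z_observed, n_x, n_y):
    """Return KL(P(XY)|Z_notNA, P(XY)) and the log(N_nonNA) threshold."""
    full = joint_counts(x, y, n_x, n_y)
    # The subsample is a subset of the full sample, so every cell that is
    # non-empty in the subsample is also non-empty in the full table.
    sub = joint_counts(x[z_observed], y[z_observed], n_x, n_y)
    n_non_na = int(z_observed.sum())
    return kl_divergence(sub, full), float(np.log(n_non_na))

# Toy usage (hypothetical data): Z is observed for a single (X, Y) cell only.
x = np.array([0, 0, 1, 1])
y = np.array([0, 1, 0, 1])
z_observed = np.array([True, False, False, False])
kl, threshold = kl_subsample_vs_full(x, y, z_observed, 2, 2)
print(kl, threshold)
```

Because the subsample is nested in the full sample, the KL is always finite here, which is convenient but also a reminder that the two samples are not independent.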

One obvious flaw is that log(N_nonNA) increases with N_nonNA, so the threshold becomes more permissive as the subsample grows, whereas we expect it to be harder to create a strong selection bias when adding more samples to the subsample.
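A quick way to see the mismatch, sketched under assumptions: the joint (X,Y) is flattened into a single integer cell index (`cells`, a hypothetical encoding such as `x * n_y + y`), and the null is simulated by drawing random subsamples without replacement. The toy data and function names are made up for illustration.

```python
import numpy as np

def kl_from_counts(p_counts, q_counts):
    """KL(P || Q) in nats from two count vectors over the same cells."""
    p = p_counts / p_counts.sum()
    q = q_counts / q_counts.sum()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def mean_null_kl(cells, n_cells, n_sub, n_draws, rng):
    """Average KL(subsample || full) over random subsamples of size n_sub."""
    full = np.bincount(cells, minlength=n_cells)
    kls = []
    for _ in range(n_draws):
        idx = rng.choice(cells.size, size=n_sub, replace=False)
        sub = np.bincount(cells[idx], minlength=n_cells)
        kls.append(kl_from_counts(sub, full))
    return float(np.mean(kls))

rng = np.random.default_rng(0)
cells = rng.integers(0, 9, size=1000)  # toy data: 9 joint (X, Y) cells
for n_sub in (50, 200, 800):
    # log(n_sub) goes up while the typical null KL goes down
    print(n_sub, round(float(np.log(n_sub)), 3), mean_null_kl(cells, 9, n_sub, 200, rng))
```

The two columns move in opposite directions, which is the flaw in question: the threshold grows while the KL values expected from random missingness shrink.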
image

In this image the blue distributions are the empirical distributions of 10K KL divergences under random subsampling (null hypothesis), and the red line is log(N_nonNA) (grey number to the right), shown with its empirical p-value.

The threshold should be defined from a p-value against the null (what can we expect from the null distribution, i.e. if data were really missing at random?), probably relative to either I(X;Y) (or H(X,Y)?): if I(X;Y) is already very low, it may be a good idea to be very strict about the value of KL.
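One possible shape for a p-value based test, as a sketch only: the observed KL is ranked against KL values from random subsamples of the same size. The cell encoding (`cells`, a flattened (X,Y) index), the function name, and the add-one correction are assumptions for illustration; scaling the threshold by I(X;Y) or H(X,Y) is not covered here.

```python
import numpy as np

def kl_from_counts(p_counts, q_counts):
    """KL(P || Q) in nats from two count vectors over the same cells."""
    p = p_counts / p_counts.sum()
    q = q_counts / q_counts.sum()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def kl_missingness_pvalue(cells, z_observed, n_cells, n_draws=1000, seed=0):
    """Empirical p-value of KL(P(XY)|Z_notNA, P(XY)) under random subsampling."""
    rng = np.random.default_rng(seed)
    full = np.bincount(cells, minlength=n_cells)
    n_sub = int(z_observed.sum())
    observed = kl_from_counts(
        np.bincount(cells[z_observed], minlength=n_cells), full
    )
    null = np.empty(n_draws)
    for i in range(n_draws):
        # Null hypothesis: the non-NA samples are a random subset of all samples
        idx = rng.choice(cells.size, size=n_sub, replace=False)
        null[i] = kl_from_counts(np.bincount(cells[idx], minlength=n_cells), full)
    # Add-one correction keeps the empirical p-value strictly positive
    p_value = (1 + np.sum(null >= observed)) / (1 + n_draws)
    return observed, float(p_value)

# Toy usage (hypothetical data): Z is observed only when (X, Y) falls in cell 0.
rng = np.random.default_rng(42)
cells = rng.integers(0, 4, size=400)
observed_kl, p = kl_missingness_pvalue(cells, cells == 0, n_cells=4, n_draws=200)
print(observed_kl, p)
```

Drawing subsamples without replacement from the full sample mirrors the actual situation, where the non-NA samples are nested in the full sample rather than independent of it.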

It may be considered a special case of the two-sample test (though many such tests require the two samples to be independent, which is not the case here since one sample is a subset of the other).

@louise-rb-dupuis louise-rb-dupuis added the enhancement New feature or request label Apr 26, 2023