[DRAFT] knn based mi estimation #207
Conversation
Same experiments where I added 5 dummy dimensions containing pure Gaussian noise. The estimator still works but is considerably slower.
Hi,
Could you briefly explain how this estimator differs from the one suggested by this Crypto '24 paper: https://eprint.iacr.org/2022/1201
I'm not an expert, just curious.
Thanks
Hello @rishubn! Both estimators are related and can be seen as variants of the original KSG family of estimators (https://arxiv.org/pdf/cond-mat/0305641, Alexander Kraskov, Harald Stögbauer, and Peter Grassberger). The KSG estimator was tailored to compute the mutual information I(X;Y) where X and Y are continuous random variables admitting densities in R^{d_x} and R^{d_y} with respect to the Lebesgue measure. The article I looked at considers the setup where X is discrete and Y is continuous with a density in R^{d_y}, which is the setup in side-channel analysis.

The estimator suggested in the crypto paper (https://eprint.iacr.org/2022/1201) is based on the article https://arxiv.org/pdf/1709.06212, referred to as GKOV, which considers a more generic setting where X and Y are mixtures of continuous and discrete random variables. For instance, X is with probability 1/2 a binomial distribution and with probability 1/2 a normal distribution. It does not mean that one of X or Y is discrete and the other is continuous: both of them are a mix of the two cases. Using a distance on both the X space and the Y space, GKOV applies locally either the vanilla KSG estimator or the plug-in estimator for discrete random variables.

I would say that GKOV is overkill in the side-channel setup. In particular, we have to choose distances on the X and Y spaces that are comparable, and in our setting it is not clear which metric would make sense for X (for the Y space any p-norm seems reasonable). In our setting, where X is purely discrete and Y is purely continuous, when there are enough samples there are more than k+1 collisions for each value of X and the distance on X no longer matters. In that case GKOV falls back to the estimator I implemented. Another way to recover the estimator I implemented is to choose a "distance" on X defined by d_X(x,x') = 0 if x = x' and +\infty otherwise. When there are not enough samples we can have fewer than k+1 collisions for a given X, and the metric on X then comes into play.

Maybe we could use GKOV if we have very few samples and know a reasonable metric for the X space. Typically the Hamming distance could make sense (up to a normalization making it comparable to the distance on the Y space). But if the leakage depends not on the bits of X but on the bits of SBox(X), it is hard to justify why the Hamming distance would make sense here. To the best of my knowledge this has not been discussed yet and could be interesting. Furthermore, I tend to think that in this small-sample regime the estimator will not be precise enough anyway.

So overall both estimators amount to the same thing in our setting. GKOV is more generic but overkill here, so I used the Ross estimator, whose presentation is simpler and which is exactly tailored to the scenario where X is discrete and Y is continuous.
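For reference, the Ross formula goes roughly as follows (writing it from memory, so please double-check against the paper): for sample i with discrete value x_i, let d_i be the distance to its k-th nearest neighbour among the N_{x_i} samples sharing that value, and let m_i be the number of samples of the full data set (of size N) lying within distance d_i of sample i. Then

I(X;Y) ≈ ψ(k) + ψ(N) - avg_i ψ(N_{x_i}) - avg_i ψ(m_i),

where ψ is the digamma function and the averages are over the N samples. One sees directly that d_i is only defined when at least k+1 samples share the value x_i, which is the collision requirement above.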
My explanation is not so brief but I hope it is clear =)
Ok, makes sense, thanks for the clear explanation!
And another question: have you experimented with larger word sizes? I saw in your benchmark you tested 8-bit variables.
I can test that, but a limitation of the algorithm is that we need at least k+1 samples per class. Hence if we want to estimate the MI between X and Y where X is an n-bit variable, we need at least (k+1) 2^n samples. With n=16 and k=10 that is already (k+1) 2^16 ≈ 7.2 10^5 samples; with n=32 and k=10 it is prohibitively large ((k+1) 2^32 ≈ 4.7 10^10 samples).
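(For the orders of magnitude, a quick back-of-the-envelope computation of this (k+1) 2^n bound, nothing SCALib-specific:)

```python
import math

k = 10
for n in (8, 16, 32):
    bound = (k + 1) * 2**n  # at least k+1 samples for each of the 2^n classes
    print(f"n={n:2d}: >= {bound:.1e} samples (10^{math.log10(bound):.2f})")
```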

Here is a preliminary benchmark for non-parametric MI estimation using k-NN.
It is based on the article "Mutual Information between Discrete and Continuous Data Sets" by Brian C. Ross, which adapts KSG to the setting where X is discrete and Y is continuous.
I used cKDTree from scipy.spatial, but we can probably find a better library (e.g. hnswlib, faiss, mlpack, flann).
I suggest using an ensemble estimator with multiple values of k and retaining the median, to make the estimator more robust (see the sketch at the end of this description).
I am not sure this is a good idea though; maybe we even want to take the max? At least on the benchmark the ensemble method seems better.
We can see that it works pretty well for MI larger than 10^{-2} bits but struggles when the MI becomes weaker.
I may have some other ideas to investigate.
One thing to look at is how to choose k, or a list of k values, automatically for the user depending on the number of samples and classes.
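To make the above concrete, here is a rough sketch of what the estimator and the ensemble variant could look like. This is not the code in this PR; function names, defaults and the tie/boundary handling follow my own reading of Ross's paper.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma


def mi_knn_ross(x, y, k=10):
    """Ross-style k-NN estimate of I(X;Y) in bits, for discrete x of shape (n,)
    and continuous y of shape (n, d). Requires at least k+1 samples per class."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]
    n = len(x)
    full_tree = cKDTree(y)
    psi_nx = np.empty(n)
    psi_m = np.empty(n)
    for value in np.unique(x):
        idx = np.flatnonzero(x == value)
        if len(idx) <= k:
            raise ValueError("need at least k+1 samples for each value of X")
        class_tree = cKDTree(y[idx])
        # Distance to the k-th nearest neighbour *within the same class*
        # (k+1 because the query point itself comes back at distance 0).
        dist, _ = class_tree.query(y[idx], k=k + 1)
        for j, i in enumerate(idx):
            # m_i: samples of the *full* data set within that distance,
            # excluding the point itself.
            m_i = len(full_tree.query_ball_point(y[i], dist[j, -1])) - 1
            psi_m[i] = digamma(m_i)
            psi_nx[i] = digamma(len(idx))
    # Ross's formula, converted from nats to bits.
    return (digamma(k) + digamma(n) - psi_nx.mean() - psi_m.mean()) / np.log(2)


def mi_knn_ensemble(x, y, ks=(5, 10, 20)):
    """Median over several values of k, as suggested above."""
    return float(np.median([mi_knn_ross(x, y, k) for k in ks]))
```

Usage would simply be `mi_knn_ensemble(x, y)` on the raw samples; swapping cKDTree for an approximate-NN library (hnswlib, faiss, ...) would only touch the two tree queries.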