
Conversation

@JulienBeg (Contributor) commented Oct 21, 2025

Here is a preliminary benchmark for non-parametric MI estimation using k-NN.

It is based on the article "Mutual Information between Discrete and Continuous Data Sets" by Brian C. Ross, which adapts KSG to the setting where X is discrete and Y is continuous.

I used cKDTree from scipy.spatial, but we can probably find a better library (e.g. hnswlib, faiss, mlpack, flann).
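For reference, here is a minimal sketch of how the Ross estimator can be implemented with cKDTree. The function name mi_ross, the argument names, and the boundary convention for counting neighbours are illustrative choices, not the exact code of this PR; it assumes every class has at least k+1 samples.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def mi_ross(x, y, k=10):
    """k-NN MI estimate for discrete x and continuous y (Ross 2014), in bits."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=np.float64)
    if y.ndim == 1:
        y = y[:, None]
    n = len(x)
    full_tree = cKDTree(y)
    radius = np.empty(n)
    n_x = np.empty(n)
    for label in np.unique(x):
        idx = np.flatnonzero(x == label)
        n_x[idx] = len(idx)
        # Distance to the k-th neighbour among samples of the same class
        # (k + 1 because the query point itself is returned at distance 0);
        # assumes every class has at least k + 1 samples.
        dist, _ = cKDTree(y[idx]).query(y[idx], k=k + 1)
        radius[idx] = dist[:, -1]
    # m_i: number of samples of any class strictly inside that radius,
    # excluding the point itself (the exact boundary convention may differ
    # slightly from the paper).
    m = np.array([
        len(full_tree.query_ball_point(y[i], radius[i] * (1 - 1e-12))) - 1
        for i in range(n)
    ])
    mi_nats = (digamma(n) - np.mean(digamma(n_x))
               + digamma(k) - np.mean(digamma(np.maximum(m, 1))))
    return max(mi_nats, 0.0) / np.log(2)  # report in bits, clamp at 0
```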

I suggest using an ensemble estimator with multiple values of k and retaining the median to make the estimator more robust.
I am not sure this is a good idea though; maybe we even want to take the max? At least on the benchmark the ensemble method seems better. A sketch of the ensemble idea is below.
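The ensemble would roughly look like this, reusing the hypothetical mi_ross helper sketched above (the list of k values is just an example):

```python
import numpy as np

def mi_ross_ensemble(x, y, ks=(3, 5, 10, 20)):
    # Run the single-k estimator for several k and keep the median.
    estimates = [mi_ross(x, y, k=k) for k in ks]
    return float(np.median(estimates))
```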

We can see that it works pretty well for MI larger than 10^-2 bits but struggles when the MI becomes weaker.

I may have some other ideas to investigate.

One thing to look at is how to choose k, or a list of k values, automatically for the user depending on the number of samples and classes.

[benchmark figure]

cla-bot added the cla-signed label Oct 21, 2025
@JulienBeg (Contributor Author)

[benchmark figure with 5 dummy dimensions]

@JulienBeg (Contributor Author)

Same experiments, but with 5 dummy dimensions of pure Gaussian noise added. The estimator still works but is considerably slower.

@rishubn (Collaborator) commented Oct 22, 2025 via email

@JulienBeg (Contributor Author) commented Oct 22, 2025

Hello @rishubn !

Both estimators are related and can be seen as variants of the original KSG estimator family (https://arxiv.org/pdf/cond-mat/0305641, Alexander Kraskov, Harald Stögbauer, and Peter Grassberger).

The KSG estimator was tailored to compute the mutual information I(X;Y) where X and Y are continuous random variables admitting densities in R^{d_x} and R^{d_y} with respect to the Lebesgue measure.
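For reference, the first KSG estimator reads roughly (in nats):

I(X;Y) ≈ ψ(k) + ψ(N) - < ψ(n_x + 1) + ψ(n_y + 1) >

where ψ is the digamma function, N the number of samples, and n_x(i), n_y(i) count the samples whose x (resp. y) coordinate lies within the distance from sample i to its k-th nearest neighbour in the joint space (max-norm).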

The article I looked at considers the setup where X is discrete and Y is continuous with a density in R^{d_y}, which is the setup in side-channel analysis.
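For comparison, Ross's adaptation reads roughly (in nats):

I(X;Y) ≈ ψ(N) - < ψ(N_x) > + ψ(k) - < ψ(m) >

where N_x(i) is the number of samples sharing the discrete value x_i, and m(i) counts the samples of any class lying within the distance from sample i to its k-th nearest neighbour among the samples with the same discrete value.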

The estimator suggested in the crypto paper (https://eprint.iacr.org/2022/1201) is based on the article https://arxiv.org/pdf/1709.06212, referred to as GKOV, which considers a more generic setting where X and Y are mixtures of continuous and discrete random variables. For instance, X is with probability 1/2 a binomial distribution and with probability 1/2 a normal distribution. It does not mean that one of X or Y is discrete and the other continuous: both of them are a mix of the two cases. Using a distance on both the X and the Y space, they manage to apply locally either the vanilla KSG estimator or the plug-in estimator for discrete random variables.

I would say that GKOV is overkill in the side-channel setup.

In particular, we have to choose distances on the X and Y spaces which are comparable. In our setting it is not clear which metric would make sense for X (for the Y space any p-norm seems reasonable).

In our setting, where X is purely discrete and Y is purely continuous, when there are enough samples there are more than k+1 collisions for each value of X and the distance on X does not matter anymore. In this case GKOV falls back to the estimator I implemented.

Another way to recover the estimator I implemented is to choose a "distance" on X defined by d_X(x,x') = 0 if x=x' and +\infty otherwise.

When there are not enough samples, we can have fewer than k+1 collisions for a given value of X, and the metric on X then comes into play. Maybe we can use GKOV if we have very few samples and know a reasonable metric for the X space. Typically the Hamming distance could make sense (up to a normalization to make it comparable to the distance on the Y space). But if the leakage depends not on the bits of X but on the bits of SBox(X), it is hard to justify why the Hamming distance would make sense here. To the best of my knowledge this has not been discussed yet and could be interesting. Furthermore, I tend to think that in this small-sample regime the estimator will not be precise enough anyway.

So overall both amount to the same thing in our setting. GKOV is more generic but it is overkill here, so I used Ross's estimator, whose presentation is simpler and which is tailored exactly to the scenario where X is discrete and Y is continuous.

@JulienBeg (Contributor Author) commented Oct 22, 2025

My explanation is not so brief but I hope it is clear =)

@rishubn (Collaborator) commented Oct 22, 2025

Ok, makes sense, thanks for the clear explanation!

@rishubn (Collaborator) commented Oct 22, 2025

And another question, have you experimented with larger word sizes? I saw in your benchmark you tested 8-bits

@JulienBeg (Contributor Author)

> And another question, have you experimented with larger word sizes? I saw in your benchmark you tested 8-bits

I can test that, but a limitation of the algorithm is that we need at least k+1 samples per class. Hence, to estimate the MI between X and Y where X is an n-bit variable, we need at least (k+1) 2^n samples. With n = 16 and k = 10 that is already 10^{5.81...} samples; with n = 32 and k = 10 it is prohibitively large (10^{10.63...} samples).
