What does epsilon do in the code? It seems to have caused an overflow #2

yao9261 · 2021-11-09T05:20:19Z

Hi, I am a graduate student and I am very interested in your thesis. I have some difficulties when trying to run the code.

clustering.py line 162-165:

    for i in range(n_clusters):
        pop_clusters[i, 0] = i + 1
        for client in np.where(clusters == i + 1)[0]:
            pop_clusters[i, 1] += int(weights[client] * epsilon * n_sampled)

While debugging, I found that some pop_clusters[i, 1] values became negative after this calculation, and I suspect an overflow.
I also don't understand the role of "epsilon" here. Could you help me understand it?

yao9261 · 2021-11-09T05:31:44Z

Labs-Federated-Learning-clustered_sampling\py_func\clustering.py:165: RuntimeWarning: overflow encountered in long_scalars
    pop_clusters[i, 1] += int(weights[client] * epsilon * n_sampled)

It did overflow.

YannFra · 2021-11-24T12:29:11Z

Hi yao9261, thank you for your interest in this work.

Overflow issue. clustering.py was written for the experimental scenarios discussed in this paper. Did you hit this issue with one of these scenarios or a different one? Please give me more details about the inputs of get_clusters_with_alg_2 that lead to your error (linkage_matrix, n_sampled, and weights). You are right that pop_clusters[i, 1] is supposed to be non-negative.
Also, tests/test_clustering.py contains some tests for get_clusters_with_alg_2. They all pass.

epsilon. You are discussing the implementation of Algorithm 2. In our work, we consider that the input is {n_i}, while in get_clusters_with_alg_2 the input is the clients' importance {p_i}, i.e. weights. epsilon is used to convert a client importance into an integer.
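
To illustrate that conversion, here is a minimal sketch. It is not the repository's code: the weights, n_sampled, and epsilon values below are hypothetical and chosen only so the arithmetic is exact. It reuses the expression from the loop quoted at the top of the thread to turn each importance p_i into an integer count.

```python
# Hypothetical client importances p_i (they sum to 1); not taken from the paper.
weights = [0.5, 0.25, 0.25]
n_sampled = 2      # assumed number of clients sampled per round
epsilon = 10**5    # assumed scaling factor

# Same expression as in the quoted loop: scale an importance to an integer.
populations = [int(w * epsilon * n_sampled) for w in weights]
total = sum(populations)

print(populations, total)
```

With these inputs the integer counts add up to epsilon * n_sampled, so a larger epsilon gives a finer integer resolution but also larger numbers to accumulate.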

yao9261 · 2021-11-24T13:23:51Z

To begin with, thank you for your reply! That means a lot to me.
@YannFra

More details:
My running environment matches requirements.txt.
I ran FL.py from PyCharm with these parameters:
dataset = "MNIST_shard"
sampling = "clustered_2"
sim_type = "cosine"
seed = 0
n_SGD = 10
lr = 0.01
decay = 1.0
p = 0.2
force = "True"
During debugging, some pop_clusters[i, 1] became a negative number.

Maybe a smaller epsilon can solve the problem?

But it's okay, now I understand how it works here. I think my error may be caused by my running environment, Win10 and PyCharm.

I am trying to study how to improve FL with Non-IID data through clustering. Your article really inspired me a lot. Thank you very much for your reply again!

YannFra · 2021-11-25T07:32:51Z

I ran FL.py with the parameters you gave and the training went through.
Could you please share the error message you get?
Is the server able to perform a couple of optimization rounds before you get your error message, or is get_clusters_with_alg_2 unable to get clusters from the beginning of FL?

We have not been able to isolate your problem yet but epsilon should not be related to it.
Please let me know if there are any new developments.

Thank you for your positive feedback on this work. Feel free to contact me by e-mail (included in the paper) if you want to discuss the theoretical aspect of this work.

yao9261 · 2021-11-25T08:03:57Z

Overflow will not cause an error. It is just a warning.

Warning message:
Labs-Federated-Learning-clustered_sampling\py_func\clustering.py:165: RuntimeWarning: overflow encountered in long_scalars
    pop_clusters[i, 1] += int(weights[client] * epsilon * n_sampled)

Training can still complete after the overflow occurs, but the sorting and selection of clusters become meaningless. In that case, clustered sampling no longer accelerates convergence.

When epsilon = 10^10, overflow occurs from the first round.
When epsilon = 10^5, overflow will not occur and accuracy increases faster.

Python 3's built-in int should not overflow, so I am surprised that this does.
But from the debugging results, it is indeed an overflow that caused the cluster sorting to fail and convergence to slow down.
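
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the warning mentions long_scalars, which suggests the sum is accumulated in a NumPy integer scalar rather than in a Python int. Assuming pop_clusters is a NumPy integer array with the default dtype, that dtype is 32-bit on Windows for NumPy versions before 2.0, so a running total above 2^31 - 1 wraps around to a negative value. The snippet below emulates that 32-bit wraparound in pure Python so it behaves the same on any platform; the weights, n_sampled, and the helper names are hypothetical.

```python
def wrap_int32(value):
    """Reduce an integer to the signed 32-bit range, as fixed-width storage would."""
    return (value + 2**31) % 2**32 - 2**31


def cluster_population(weights, epsilon, n_sampled):
    """Accumulate the per-client counts of one cluster in an emulated int32."""
    total = 0
    for w in weights:
        total = wrap_int32(total + int(w * epsilon * n_sampled))
    return total


# Hypothetical cluster: four clients, each with importance 1/128.
weights = [1 / 128] * 4
n_sampled = 10  # assumed

big = cluster_population(weights, 10**10, n_sampled)
small = cluster_population(weights, 10**5, n_sampled)

print(big)    # negative: the true sum 3_125_000_000 exceeds 2**31 - 1
print(small)  # 31248: stays well inside the 32-bit range
```

If this is indeed the cause, it would be consistent with epsilon = 10^5 working and epsilon = 10^10 failing, and with the same code running without warnings on platforms where the default NumPy integer is 64-bit. Creating the array with an explicit 64-bit dtype would be one way to test this hypothesis.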
