
Question about reproducing results in the paper (LSTM on Shakespeare dataset) #8

twilightdema opened this issue Oct 22, 2020 · 0 comments

Hello,
Thank you for the great work. I am studying federated learning in NLP. I tried to reproduce the results in the paper (mainly the LSTM on the Shakespeare dataset), but my results are far off from what they should be. Please help me check what I missed in my experiments.

(1) The Shakespeare data preprocessing is described as follows in the paper:

[Screenshot: the paper's description of the Shakespeare data preprocessing]

So I use the following command to preprocess the data:

./preprocess.sh -s niid --sf 1.0 -k 0 -t sample -tf 0.8 -k 10000
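
For reference, here is my reading of those preprocess.sh flags (based on the LEAF documentation, so please correct me if I am misinterpreting any of them):

# Assumed meaning of the flags in the command above:
#   -s niid    sample the data in a non-IID manner (keep the per-user partition)
#   --sf 1.0   fraction of the full dataset to sample (here: all of it)
#   -k         minimum number of samples per user to keep
#   -t sample  split train/test within each user's samples (rather than by user)
#   -tf 0.8    fraction of each user's samples assigned to the training split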

(2) It is indicated in the paper that the experiments were done with a 1-layer LSTM.
[Screenshot: the paper's description of the 1-layer LSTM model]

Anyway, reading the code, I believe this is equivalent to setting:

NUM_LAYERS=3

This is because it will have one input layer, one output layer, and one hidden LSTM layer (where the permutation invariance problem is addressed by FedMA).
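
For concreteness, here is a minimal sketch (my own, in PyTorch) of what I believe the 1-layer LSTM character model looks like; the vocabulary size, embedding dimension, and hidden size below are assumptions for illustration, not values read from the repo:

import torch.nn as nn

class CharLSTM(nn.Module):
    # Hypothetical 1-layer LSTM for next-character prediction on Shakespeare.
    # vocab_size / embed_dim / hidden_dim are illustrative assumptions.
    def __init__(self, vocab_size=80, embed_dim=8, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)                        # "input" layer
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=1, batch_first=True)  # hidden LSTM layer
        self.decoder = nn.Linear(hidden_dim, vocab_size)                            # "output" layer

    def forward(self, x):
        emb = self.embedding(x)             # (batch, seq_len, embed_dim)
        out, _ = self.lstm(emb)             # (batch, seq_len, hidden_dim)
        return self.decoder(out[:, -1, :])  # predict the next character from the last time step

Counting the embedding, the LSTM, and the output layer this way gives the three layers that NUM_LAYERS=3 presumably refers to, with the LSTM hidden layer being the one where FedMA's permutation matching matters.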

(3) It is noted in the paper that FedAvg and FedProx were trained with 33 communication rounds, while FedMA was trained with 11 communication rounds (because each round of FedMA requires 3 communication rounds, corresponding to the number of layers in the LSTM model). I actually used 30 for FedAvg and FedProx and 10 for FedMA, like this (a short round-accounting sketch follows the commands below):

For FedAvg
python language_main.py --mode=fedavg --comm-round=30

For FedProx
python language_main.py --mode=fedprox --comm-round=30

For FedMA
python language_main.py --mode=fedma --comm-round=10
(I do not think --comm-round has any effect for FedMA anyway, because the code performs a single round of FedMA.)
Then I performed the rest of the FedMA communication rounds by running
python lstm_fedma_with_comm.py
(lstm_fedma_with_comm.py has 10 communication rounds hard-coded.)
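
As a sanity check on the round accounting mentioned in (3), under my assumption about how the paper counts rounds:

# Assumption: one FedMA round = one layer-wise matching pass per model layer.
fedma_rounds = 10              # rounds hard-coded in lstm_fedma_with_comm.py
layers = 3                     # embedding, LSTM, output layer
print(fedma_rounds * layers)   # 30, comparable to the 30 rounds I used for FedAvg/FedProx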

(4) The results do not seem aligned with what is indicated in the paper. Not only did FedProx get lower test accuracy than FedAvg, FedMA also got lower accuracy than FedAvg.

For FedAvg
[Screenshot: FedAvg training results]

For FedProx
[Screenshot: FedProx training results]

For FedMA
Result from the first step (language_main.py)
[Screenshot: FedMA results from language_main.py]

Result from the second step (lstm_fedma_with_comm.py)
[Screenshot: FedMA results from lstm_fedma_with_comm.py]

Results from the paper
[Screenshot: results reported in the paper]

  • Actually, my FedAvg got substantially higher accuracy than in the paper: it reaches 0.5 test accuracy, while none of the three approaches reaches that level in the paper.
    ** I did not tune E (the number of local training epochs) and used the default value (5), but the results still do not align with what the paper reports for E=5 anyway.
    [Screenshot: results from the paper for E=5]

Thank you in advance for your help.
