
Question about reproducing results in the paper (LSTM on Shakespeare dataset) #8

twilightdema opened this issue Oct 22, 2020 · 0 comments

Hello,
Thank you for the great work. I am studying federated learning in NLP. I tried to reproduce the results in the paper (mainly the LSTM on the Shakespeare dataset), but my results are far off from what they should be. Please help me check what I missed in my experiments.

(1) The Shakespeare data preprocessing is described as follows in the paper:

[Screenshot: the paper's description of the Shakespeare data preprocessing]

So I use the following command to preprocess the data:

./preprocess.sh -s niid --sf 1.0 -k 0 -t sample -tf 0.8 -k 10000
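
For reference, here is my reading of those preprocess.sh flags (based on the LEAF documentation, so please correct me if I am misinterpreting any of them):

# Assumed meaning of the flags in the command above:
#   -s niid    sample the data in a non-IID manner (keep the per-user partition)
#   --sf 1.0   fraction of the full dataset to sample (here: all of it)
#   -k         minimum number of samples per user to keep
#   -t sample  split train/test within each user's samples (rather than by user)
#   -tf 0.8    fraction of each user's samples assigned to the training split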

(2) It is indicated in the paper that the experiments were done with a 1-layer LSTM.
[Screenshot: the paper's description of the 1-layer LSTM model]

Anyway, reading the code, I believe this is equivalent to setting:

NUM_LAYERS=3

This is because it will have one input layer, one output layer, and one hidden LSTM layer (where the permutation invariance problem is addressed by FedMA).
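
For concreteness, here is a minimal sketch (my own, in PyTorch) of what I believe the 1-layer LSTM character model looks like; the vocabulary size, embedding dimension, and hidden size below are assumptions for illustration, not values read from the repo:

import torch.nn as nn

class CharLSTM(nn.Module):
    # Hypothetical 1-layer LSTM for next-character prediction on Shakespeare.
    # vocab_size / embed_dim / hidden_dim are illustrative assumptions.
    def __init__(self, vocab_size=80, embed_dim=8, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)                        # "input" layer
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=1, batch_first=True)  # hidden LSTM layer
        self.decoder = nn.Linear(hidden_dim, vocab_size)                            # "output" layer

    def forward(self, x):
        emb = self.embedding(x)             # (batch, seq_len, embed_dim)
        out, _ = self.lstm(emb)             # (batch, seq_len, hidden_dim)
        return self.decoder(out[:, -1, :])  # predict the next character from the last time step

Counting the embedding, the LSTM, and the output layer this way gives the three layers that NUM_LAYERS=3 presumably refers to, with the LSTM hidden layer being the one where FedMA's permutation matching matters.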

(3) It is noted in the paper that FedAvg and FedProx were trained with 33 communication rounds, while FedMA was trained with 11 communication rounds (because each round of FedMA requires 3 communication rounds, corresponding to the number of layers in the LSTM model). I actually used 30 for FedAvg and FedProx and 10 for FedMA, like this (a short round-accounting sketch follows the commands below):

For FedAvg
python language_main.py --mode=fedavg --comm-round=30

For FedProx
python language_main.py --mode=fedprox --comm-round=30

For FedMA
python language_main.py --mode=fedma --comm-round=10
(I do not think --comm-round has any effect for FedMA anyway, because the code performs a single round of FedMA.)
Then I performed the rest of the FedMA communication rounds by running
python lstm_fedma_with_comm.py
(lstm_fedma_with_comm.py has 10 communication rounds hard-coded.)
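
As a sanity check on the round accounting mentioned in (3), under my assumption about how the paper counts rounds:

# Assumption: one FedMA round = one layer-wise matching pass per model layer.
fedma_rounds = 10              # rounds hard-coded in lstm_fedma_with_comm.py
layers = 3                     # embedding, LSTM, output layer
print(fedma_rounds * layers)   # 30, comparable to the 30 rounds I used for FedAvg/FedProx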

(4) The results do not seem aligned with what is indicated in the paper. Not only did FedProx get lower test accuracy than FedAvg, FedMA also got lower accuracy than FedAvg.

For FedAvg
[Screenshot: FedAvg training results]

For FedProx
[Screenshot: FedProx training results]

For FedMA
Result from the first step (language_main.py)
[Screenshot: FedMA results from language_main.py]

Result from the second step (lstm_fedma_with_comm.py)
[Screenshot: FedMA results from lstm_fedma_with_comm.py]

Results from the paper
[Screenshot: results reported in the paper]

  • Actually, my FedAvg got substantially higher accuracy than in the paper: it reaches 0.5 test accuracy, while none of the three approaches reaches that level in the paper.
    ** I did not tune E (the number of local training epochs) and used the default value (5), but the results still do not align with what the paper reports for E=5 anyway.
    [Screenshot: results from the paper for E=5]

Thank you in advance for your help.
