
Commit 2ef8515

[revision] fix grammar, and prepare to release.
1 parent 57c32af commit 2ef8515

7 files changed: +71 −65 lines

README.md

Lines changed: 14 additions & 13 deletions
@@ -5,9 +5,9 @@
[![GH Actions Tests](https://github.com/SMILELab-FL/FedLab/actions/workflows/CI.yml/badge.svg)](https://github.com/SMILELab-FL/FedLab/actions) [![Documentation Status](https://readthedocs.org/projects/fedlab/badge/?version=master)](https://fedlab.readthedocs.io/en/master/?badge=master) [![License](https://img.shields.io/github/license/SMILELab-FL/FedLab)](https://opensource.org/licenses/Apache-2.0) [![codecov](https://codecov.io/gh/SMILELab-FL/FedLab/branch/master/graph/badge.svg?token=4HHB5JCSC6)](https://codecov.io/gh/SMILELab-FL/FedLab) [![arXiv](https://img.shields.io/badge/arXiv-2107.11621-red.svg)](https://arxiv.org/abs/2107.11621) [![Pyversions](https://img.shields.io/pypi/pyversions/fedlab.svg?style=flat-square)](https://pypi.python.org/pypi/fedlab)

- Federated learning (FL), proposed by Google at the very beginning, is recently a burgeoning research area of machine learning, which aims to protect individual data privacy in distributed machine learning process, especially in finance, smart healthcare and edge computing. Different from traditional data-centered distributed machine learning, participants in FL setting utilize localized data to train local model, then leverages specific strategies with other participants to acquire the final model collaboratively, avoiding direct data sharing behavior.
+ Federated learning (FL), first proposed by Google, is a burgeoning research area of machine learning that aims to protect individual data privacy in distributed machine learning processes, especially in finance, smart healthcare, and edge computing. Unlike traditional data-centered distributed machine learning, participants in the FL setting utilize localized data to train local models, then leverage specific strategies with other participants to acquire the final model collaboratively, avoiding direct data-sharing behavior.

- To relieve the burden of researchers in implementing FL algorithms and emancipate FL scientists from repetitive implementation of basic FL setting, we introduce highly customizable framework __FedLab__ in this work. __FedLab__ provides the necessary modules for FL simulation, including ***communication***, ***compression***, ***model optimization***, ***data partition*** and other ***functional modules***. Users can build FL simulation environment with custom modules like playing with LEGO bricks. For better understanding and easy usage, FL algorithm benchmark implemented in __FedLab__ are also presented.
+ To relieve the burden of researchers in implementing FL algorithms and emancipate FL scientists from the repetitive implementation of basic FL settings, we introduce a highly customizable framework, __FedLab__, in this work. __FedLab__ provides the necessary modules for FL simulation, including ***communication***, ***compression***, ***model optimization***, ***data partition***, and other ***functional modules***. Users can build an FL simulation environment with custom modules like playing with LEGO bricks. For better understanding and easy usage, the FL baseline algorithms implemented via __FedLab__ are also presented.

## Quick start
@@ -29,7 +29,7 @@ $ pip install fedlab

### Learning materials

- We provide tutorials in jupyter notebook format for FedLab beginners in FedLab\tutorials. These tutorials include data partition, customize algorithms, and pipeline demos. For the FedLab or FL beginners, we recommend this [notebook](tutorials/pipeline_tutorial.ipynb). Furthermore, we provide reproductions of federated algorithms via FedLab, which are stored in fedlab.contirb.algorithm. We think they are good examples for users to further explore FedLab.
+ We provide tutorials in Jupyter notebook format for FedLab beginners under FedLab/tutorials. These tutorials include data partition, customized algorithms, and pipeline demos. For FedLab or FL beginners, we recommend this [notebook](tutorials/pipeline_tutorial.ipynb). Furthermore, we provide reproductions of federated algorithms via FedLab, stored in fedlab.contrib.algorithm. They are good examples for users to further explore FedLab.

[Website documentation](https://fedlab.readthedocs.io/en/master/) is available:

@@ -42,7 +42,7 @@ We provide tutorials in jupyter notebook format for FedLab beginners in FedLab\t

### Run Examples

- - Run our quick start examples of different scenarios with partitioned MNIST dataset.
+ - Run our quick start examples of different scenarios with a partitioned MNIST dataset.

```
# example of standalone
@@ -51,7 +51,7 @@ $ python standalone.py --total_client 100 --com_round 3 --sample_ratio 0.1 --bat
```

## Architecture
- Files architecture of FedLab. These content may be helpful for users to understand our repo.
+ The file architecture of FedLab. These contents may be helpful for users to understand our repo.

```
├── fedlab
@@ -60,7 +60,7 @@ Files architecture of FedLab. These content may be helpful for users to understa
│ ├── models
│ └── utils
├── datasets
- │ └──...
+ │ └── ...
├── examples
│ ├── asynchronous-cross-process-mnist
│ ├── cross-process-mnist
@@ -71,7 +71,8 @@ Files architecture of FedLab. These content may be helpful for users to understa
└── tutorials
├── communication_tutorial.ipynb
├── customize_tutorial.ipynb
- └── pipeline_tutorial.ipynb
+ ├── pipeline_tutorial.ipynb
+ └── ...
```

## Baselines
@@ -96,7 +97,7 @@ We provide the reproduction of baseline federated algorthms for users in this re
| ... | | | | |

## Datasets & Data Partition

- Sophisticated in real world, FL need to handle various kind of data distribution scenarios, including iid and non-iid scenarios. Though there already exists some datasets and partition schemes for published data benchmark, it still can be very messy and hard for researchers to partition datasets according to their specific research problems, and maintain partition results during simulation. __FedLab__ provides [`fedlab.utils.dataset.partition.DataPartitioner`](https://fedlab.readthedocs.io/en/master/autoapi/fedlab/utils/dataset/partition/index.html#fedlab.utils.dataset.partition.DataPartitioner) that allows you to use pre-partitioned datasets as well as your own data. `DataPartitioner` stores sample indices for each client given a data partition scheme. Also, FedLab provides some extra datasets that are used in current FL researches while not provided by official PyTorch `torchvision.datasets` yet.
+ FL in the real world is sophisticated: it needs to handle various kinds of data distribution scenarios, including IID and non-IID scenarios. Though some datasets and partition schemes for published data benchmarks already exist, it can still be messy and hard for researchers to partition datasets according to their specific research problems and to maintain partition results during simulation. __FedLab__ provides [`fedlab.utils.dataset.partition.DataPartitioner`](https://fedlab.readthedocs.io/en/master/autoapi/fedlab/utils/dataset/partition/index.html#fedlab.utils.dataset.partition.DataPartitioner), which allows you to use pre-partitioned datasets as well as your own data. `DataPartitioner` stores sample indices for each client given a data partition scheme. FedLab also provides some extra datasets that are used in current FL research but not yet provided by the official PyTorch `torchvision.datasets`.
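> Editor's note: for orientation, here is a minimal sketch of how a `DataPartitioner` subclass is used; it is an illustration, not part of this commit. `CIFAR10Partitioner` and its keyword arguments appear verbatim in the `partitioned_cifar10.py` diff below, while the `partition="dirichlet"` option value is an assumption based on the FedLab documentation.

```python
import torchvision

from fedlab.utils.dataset.partition import CIFAR10Partitioner

# Plain CIFAR-10; the partitioner only needs the label list.
trainset = torchvision.datasets.CIFAR10(root="./data", train=True, download=True)

# Label Dirichlet partition over 100 clients ("dirichlet" is an assumed option name).
partitioner = CIFAR10Partitioner(trainset.targets,
                                 100,
                                 balance=None,
                                 partition="dirichlet",
                                 dir_alpha=0.5,
                                 seed=2022)

# client_dict maps client id -> sample indices; this is the partition scheme
# that PartitionedCIFAR10 consumes in the diff below.
print(len(partitioner.client_dict[0]))
```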
### Data Partition

@@ -276,11 +277,11 @@ Non-iid partition used in [[1]](#1). Data example for 4 clients could be shown a

## Performance & Insights

- We provide the performance report of several reproduced federated learning algorithms to illustrate the correctness of FedLab in simulation. Furthermore, we describe several insights FedLab could provide for federated learning research. Without loss of generality, this section's experiments are conducted on partitioned mnist datasets. The conclusions and observations in this section should still be valid in other data sets and scenarios.
+ We provide the performance report of several reproduced federated learning algorithms to illustrate the correctness of FedLab in simulation. Furthermore, we describe several insights FedLab could provide for federated learning research. Without loss of generality, this section's experiments are conducted on partitioned MNIST datasets. The conclusions and observations in this section should still be valid on other datasets and in other scenarios.

### Federated Optimization on Non-IID Data

- We choose $\alpha = [0.1, 0.3, 0.5, 0.7]$ in label Dirichlet partitioned mnist with 100 clients. We run 200 rounds of FedAvg with 5 local batchs with full batch, learning rate 0.1 and sample ratio 0.1 (10 clients for each FL round). The test accuracy over communication round is shown below. The results reveal the most vital challenge in federated learning.
+ We choose $\alpha = [0.1, 0.3, 0.5, 0.7]$ in label Dirichlet partitioned MNIST with 100 clients. We run 200 rounds of FedAvg with 5 local epochs of full-batch SGD, a learning rate of 0.1, and a sample ratio of 0.1 (10 clients for each FL round). The test accuracy over communication rounds is shown below. The results reveal the most vital challenge in federated learning: the more non-IID the data (smaller $\alpha$), the harder the optimization.

<p align="center"><img src="./examples/imgs/non_iid_impacts_on_fedavg.jpg" height="300"></p>

@@ -337,9 +338,9 @@ We use the same partitioned MNIST dataset in FedAvg[[4]](#4) to evaluate the cor

### Computation Efficiency

- Time cost in 100 rounds (50 clients are sampled per round) under different acceleration settings. 1M-10P stands for the simulation runs on 1 machine with 4 GPUs and 10 processes. 2M-10P stands for the simulation runs on 2 machine with 4 GPUs and 10 processes (5 processes on each machine).
+ Time cost of 100 rounds (50 clients sampled per round) under different acceleration settings. 1M-10P stands for a simulation run on 1 machine with 4 GPUs and 10 processes. 2M-10P stands for a simulation run on 2 machines with 4 GPUs and 10 processes (5 processes on each machine).

- Hardware: Intel(R) Xeon(R) Gold 6240L CPU @ 2.60GHz + Tesla V100 * 4.
+ Hardware platform: Intel(R) Xeon(R) Gold 6240L CPU @ 2.60GHz + Tesla V100 * 4.

| Standalone | Cross-process 1M-10P | Cross-process 2M-10P |
| ---------- | ------------------------- | --------------------------- |
@@ -349,7 +350,7 @@ Hardware: Intel(R) Xeon(R) Gold 6240L CPU @ 2.60GHz + Tesla V100 * 4.

### Communication Efficiency

- We provide a few performance baselines in communication-efficient federated learning including QSGD and top-k. In the experiment setting, we choose $\alpha = 0.5$ in label Dirichlet partitioned mnist with 100 clients. We run 200 rounds with sample ratio 0.1 (10 clients for each FL round) of FedAvg, where each client performs 5 local epoches of SGD with full batch and learning rate 0.1. We report the top-1 test accuracy and its communication volume during the training.
+ We provide a few performance baselines in communication-efficient federated learning, including QSGD and top-k. In the experiment setting, we choose $\alpha = 0.5$ in label Dirichlet partitioned MNIST with 100 clients. We run 200 rounds of FedAvg with a sample ratio of 0.1 (10 clients for each FL round), where each client performs 5 local epochs of SGD with a full batch and a learning rate of 0.1. We report the top-1 test accuracy and its communication volume during training.

| Setting | Baseline | QSGD-4bit | QSGD-8bit | QSGD-16bit | top-5% | Top-10% | Top-20% |
| -------------------- | -------- | --------- | --------- | ---------- | ------ | ------- | ------- |

fedlab/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -12,4 +12,4 @@
# See the License for the specific language governing permissions and
# limitations under the License.

- __version__ = "1.3.0_alpha"
+ __version__ = "1.3.0"

fedlab/contrib/algorithm/__init__.py

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+
+
+ from .basic_client import SGDClientTrainer, SGDSerialClientTrainer
+ from .basic_server import SyncServerHandler, AsyncServerHandler
+
+ from .ditto import DittoSerialClientTrainer, DittoServerHandler
+ from .fedavg import FedAvgSerialClientTrainer, FedAvgServerHandler
+ from .feddyn import FedDynSerialClientTrainer, FedDynServerHandler
+ from .fednova import FedNovaSerialClientTrainer, FedNovaServerHandler
+ from .fedprox import FedProxSerialClientTrainer, FedProxClientTrainer, FedProxServerHandler
+ from .ifca import IFCASerialClientTrainer, IFCAServerHander
+ from .powerofchoice import PowerofchoiceSerialClientTrainer, PowerofchoicePipeline, Powerofchoice
+ from .qfedavg import qFedAvgClientTrainer, qFedAvgServerHandler
+ from .scaffold import ScaffoldSerialClientTrainer, ScaffoldServerHandler
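> Editor's note: these package-level exports make the contributed algorithms importable in one line. Below is a hedged sketch of a serial (standalone) FedAvg run built from them. The constructor and `setup_*` signatures, the `StandalonePipeline` class name, and the `PathologicalMNIST` arguments are assumptions based on FedLab's examples, not guaranteed by this commit.

```python
import torch

from fedlab.contrib.algorithm import FedAvgSerialClientTrainer, SyncServerHandler
from fedlab.contrib.dataset import PathologicalMNIST     # exported per the dataset diff below
from fedlab.core.standalone import StandalonePipeline    # assumed class name

model = torch.nn.Linear(784, 10)  # toy model for illustration

# Serial trainer simulates all clients in one process (assumed signatures).
trainer = FedAvgSerialClientTrainer(model, num_clients=100, cuda=False)
dataset = PathologicalMNIST(root="./data", path="./data/mnist_fed", num_clients=100)
dataset.preprocess()                                     # assumed API
trainer.setup_dataset(dataset)                           # assumed API
trainer.setup_optim(epochs=5, batch_size=64, lr=0.1)     # assumed API

# Synchronous FedAvg aggregation on the server side (assumed signature).
handler = SyncServerHandler(model, global_round=3, sample_ratio=0.1)

StandalonePipeline(handler, trainer).main()              # main() appears in standalone.py below
```

This mirrors the README quick-start command `python standalone.py --total_client 100 --com_round 3 --sample_ratio 0.1 ...` shown in the diff above.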

fedlab/contrib/compressor/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -11,3 +11,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
+
+ from .quantization import QSGDCompressor
+ from .topk import TopkCompressor
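> Editor's note: with these exports, both compressors are importable from `fedlab.contrib.compressor`, matching the QSGD and top-k baselines in the README's Communication Efficiency section. A hedged round-trip sketch follows; everything beyond the import path (constructor keywords, `compress`/`decompress` signatures and return values) is an assumption based on the FedLab source, not guaranteed by this commit.

```python
import torch

from fedlab.contrib.compressor import QSGDCompressor, TopkCompressor

grad = torch.randn(10_000)  # stand-in for a flattened gradient tensor

# Top-k sparsification: keep the largest 5% of entries (assumed API).
topk = TopkCompressor(compress_ratio=0.05)
values, indices = topk.compress(grad)
sparse_restored = topk.decompress(values, indices, grad.shape)   # assumed signature

# QSGD quantization to 8 bits (assumed API and return triple).
qsgd = QSGDCompressor(n_bit=8)
norm, signs, quantized = qsgd.compress(grad)
dense_restored = qsgd.decompress([norm, signs, quantized])       # assumed signature
```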

fedlab/contrib/dataset/__init__.py

Lines changed: 5 additions & 4 deletions
@@ -13,11 +13,12 @@
# limitations under the License.

from .basic_dataset import FedDataset, BaseDataset, Subset
+ from .fcube import FCUBE
+ from .covtype import Covtype
+ from .rcv1 import RCV1
+
from .pathological_mnist import PathologicalMNIST
from .rotated_mnist import RotatedMNIST
from .rotated_cifar10 import RotatedCIFAR10
- from .partitioned_cifar import PartitionCIFAR
from .partitioned_mnist import PartitionedMNIST
- from .fcube import FCUBE
- from .covtype import Covtype
- from .rcv1 import RCV1
+ from .partitioned_cifar10 import PartitionedCIFAR10

fedlab/contrib/dataset/partitioned_cifar.py renamed to fedlab/contrib/dataset/partitioned_cifar10.py

Lines changed: 33 additions & 46 deletions
@@ -17,13 +17,12 @@
import torch
from torch.utils.data import DataLoader
import torchvision
- from torchvision import transforms

- from .basic_dataset import FedDataset, CIFARSubset
- from ...utils.dataset.partition import CIFAR10Partitioner, CIFAR100Partitioner, MNISTPartitioner
+ from .basic_dataset import FedDataset, BaseDataset
+ from ...utils.dataset.partition import CIFAR10Partitioner


- class PartitionCIFAR(FedDataset):
+ class PartitionedCIFAR10(FedDataset):
    """:class:`FedDataset` with partitioning preprocess. For detailed partitioning, please
    check `Federated Dataset and DataPartitioner <https://fedlab.readthedocs.io/en/master/tutorials/dataset_partition.html>`_.
@@ -99,48 +98,36 @@ def preprocess(self,
        os.mkdir(os.path.join(self.path, "var"))
        os.mkdir(os.path.join(self.path, "test"))
        # train dataset partitioning
-       if self.dataname == 'cifar10':
-           trainset = torchvision.datasets.CIFAR10(root=self.root,
-                                                   train=True,
-                                                   download=self.download)
-           partitioner = CIFAR10Partitioner(trainset.targets,
-                                            self.num_clients,
-                                            balance=balance,
-                                            partition=partition,
-                                            unbalance_sgm=unbalance_sgm,
-                                            num_shards=num_shards,
-                                            dir_alpha=dir_alpha,
-                                            verbose=verbose,
-                                            seed=seed)
-       elif self.dataname == 'cifar100':
-           trainset = torchvision.datasets.CIFAR100(root=self.root,
-                                                    train=True,
-                                                    download=self.download)
-           partitioner = CIFAR100Partitioner(trainset.targets,
-                                             self.num_clients,
-                                             balance=balance,
-                                             partition=partition,
-                                             unbalance_sgm=unbalance_sgm,
-                                             num_shards=num_shards,
-                                             dir_alpha=dir_alpha,
-                                             verbose=verbose,
-                                             seed=seed)
-       else:
-           raise ValueError(
-               f"'dataname'={self.dataname} currently is not supported. Only 'cifar10', and 'cifar100' are supported."
-           )
-
-       subsets = {
-           cid: CIFARSubset(trainset,
-                            partitioner.client_dict[cid],
-                            transform=self.transform,
-                            target_transform=self.targt_transform)
-           for cid in range(self.num_clients)
-       }
-       for cid in subsets:
+       trainset = torchvision.datasets.CIFAR10(root=self.root,
+                                               train=True,
+                                               transform=self.transform,
+                                               download=self.download)
+       partitioner = CIFAR10Partitioner(trainset.targets,
+                                        self.num_clients,
+                                        balance=balance,
+                                        partition=partition,
+                                        unbalance_sgm=unbalance_sgm,
+                                        num_shards=num_shards,
+                                        dir_alpha=dir_alpha,
+                                        verbose=verbose,
+                                        seed=seed)
+
+       self.data_indices = partitioner.client_dict
+
+       samples, labels = [], []
+       for x, y in trainset:
+           samples.append(x)
+           labels.append(y)
+       for id, indices in self.data_indices.items():
+           data, label = [], []
+           for idx in indices:
+               x, y = samples[idx], labels[idx]
+               data.append(x)
+               label.append(y)
+           dataset = BaseDataset(data, label)
            torch.save(
-               subsets[cid],
-               os.path.join(self.path, "train", "data{}.pkl".format(cid)))
+               dataset,
+               os.path.join(self.path, "train", "data{}.pkl".format(id)))

    def get_dataset(self, cid, type="train"):
        """Load subdataset for client with client ID ``cid`` from local file.

@@ -166,5 +153,5 @@ def get_dataloader(self, cid, batch_size=None, type="train"):
        """
        dataset = self.get_dataset(cid, type)
        batch_size = len(dataset) if batch_size is None else batch_size
-       data_loader = DataLoader(dataset, batch_size=batch_size)
+       data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
        return data_loader
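> Editor's note: a hedged usage sketch for the renamed class. The `preprocess()` keywords and `get_dataloader()` signature appear in the diff above; the constructor arguments are assumptions inferred from the attributes the class reads (`root`, `path`, `num_clients`, `transform`, `download`), and `"dirichlet"` is an assumed option value.

```python
from torchvision import transforms

from fedlab.contrib.dataset import PartitionedCIFAR10

fed_cifar10 = PartitionedCIFAR10(root="./data",              # assumed constructor args
                                 path="./data/cifar10_fed",
                                 num_clients=100,
                                 download=True,
                                 transform=transforms.ToTensor())

# Keyword names taken from the preprocess() hunk above.
fed_cifar10.preprocess(balance=None,
                       partition="dirichlet",
                       dir_alpha=0.3,
                       seed=2022)

# get_dataloader(cid, batch_size, type) as shown above; note loaders now shuffle.
train_loader = fed_cifar10.get_dataloader(cid=0, batch_size=64, type="train")
```

Design-wise, this commit materializes each client's subset as plain `(data, label)` lists via `BaseDataset`, dropping the CIFAR-100 branch and the `CIFARSubset` dependency at the cost of holding the transformed samples in memory during preprocessing.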

fedlab/core/standalone.py

Lines changed: 1 addition & 1 deletion
@@ -48,4 +48,4 @@ def main(self):
        # self.handler.evaluate()

    def evaluate(self):
-       print("Implement your evaluation here.")
+       print("This is an example implementation. Please read the source code at fedlab.core.standalone.")
