FederatedScope-LLM (FS-LLM) is a comprehensive package for federated fine-tuning of large language models, which provides:
- A complete end-to-end benchmarking pipeline, automating the processes of dataset preprocessing, federated fine-tuning execution or simulation, and performance evaluation on federated LLM fine-tuning with different capability demonstration purposes;
- Comprehensive and off-the-shelf federated fine-tuning algorithm implementations and versatile programming interfaces for future extension to enhance the capabilities of LLMs in FL scenarios with low communication and computation costs, even without accessing the full model (e.g., closed-source LLMs);
- Several accelerating operators and resource-efficient operators for fine-tuning LLMs with limited resources and the flexible pluggable sub-routines for interdisciplinary study (e.g., LLMs in personalized FL).
For more details, please refer to our paper: FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model.
We provide a hands-on tutorial here for your quick start.
FederatedScope
├── federatedscope
│   ├── core                     # Federated learning backend modules
│   ├── llm                      # Federated fine-tuning LLMs modules
│   │   ├── baseline             # Scripts for LLMs
│   │   ├── dataloader           # Federated fine-tuning dataloader
│   │   ├── dataset              # Federated fine-tuning dataset
│   │   ├── eval                 # Evaluation for fine-tuned LLMs
│   │   ├── misc                 # Miscellaneous
│   │   ├── model                # LLMs and Adapter
│   │   ├── trainer              # Fine-tuning with accelerating operators
│   │   ├── ...
│   ├── main.py                  # Running interface
│   ├── ... ...
├── tests                        # Unittest modules for continuous integration
├── LICENSE
└── setup.py
Let's start with fine-tuning GPT-2 on Alpaca to familiarize you with FS-LLM.
The installation of FS-LLM is similar to minimal FS (see here for details), except that it requires PyTorch>=1.13.0 (we recommend version 2.0.X) because of the PEFT dependency:
# Create virtual environments with conda
conda create -n fs-llm python=3.9
conda activate fs-llm
# Install Pytorch>=1.13.0 (e.g., Pytorch==2.0.0)
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia
# Install FS-LLM with editable mode
pip install -e .[llm]
You have now successfully installed FS-LLM.
Next, we can fine-tune GPT-2 on Alpaca with FedAvg.
python federatedscope/main.py --cfg federatedscope/llm/baseline/testcase.yaml
For more details about customized configurations, see Advanced.
You can easily run FS-LLM with a customized yaml file. Here we only introduce the configurations related to FS-LLM; for other configurations, please refer to Configurations. For more examples, please refer to federatedscope/llm/baseline.
# For this configuration, you might need a GPU with at least 32GB of video memory to run.
# Whether to use GPU
use_gpu: True
# Deciding which GPU to use
device: 0
# Early stop steps, set `0` to disable
early_stop:
  patience: 0
# Federated learning related options
federate:
  # `standalone` or `distributed`
  mode: standalone
  # Number of communication rounds
  total_round_num: 500
  # Saving path for ckpt
  save_to: "llama_rosetta_9_fed.ckpt"
  # Number of clients the dataset is split into
  client_num: 9
  # Enable for saving memory, all workers share the same model instance
  share_local_model: True
# Dataset related options
data:
  # Root directory where the data is stored
  root: data/
  # Dataset name
  type: 'rosetta_alpaca@llm'
  # Train/val/test splits
  splits: [0.89,0.1,0.01]
  # Use meta information to split `rosetta_alpaca`
  splitter: 'meta'
# LLM related options
llm:
  # Max token length for model input (training)
  tok_len: 650
  # ChatBot related options
  chat:
    # Max token length for model input (inference)
    max_len: 1000
    # Max number of history texts
    max_history_len: 10
  # Path for storing the model cache, default in `~/.cache/`
  cache:
    model: ''
  # PEFT related options
  adapter:
    # Set true to enable PEFT fine-tuning
    use: True
    # Args for PEFT fine-tuning
    args: [ { 'adapter_package': 'peft', 'adapter_method': 'lora', 'r': 8, 'lora_alpha': 32, 'lora_dropout': 0.1 } ]
# DataLoader related options
dataloader:
  # Batch size for iter loader
  batch_size: 1
# Model related options
model:
  # Model type (format: {MODEL_REPO}@huggingface_llm)
  type: 'decapoda-research/llama-7b-hf@huggingface_llm'
# Train related options
train:
  # Number of local update steps
  local_update_steps: 30
  # `batch` or `epoch` for local_update_steps
  batch_or_epoch: batch
  # Optimizer related options
  optimizer:
    # Learning rate
    lr: 0.003
    # Weight decay
    weight_decay: 0.0
  # Set true to enable `model.half()`
  is_enable_half: True
# Trainer related options
trainer:
  # Trainer type
  type: llmtrainer
# Evaluation related options
eval:
  # Frequency of evaluation
  freq: 50
  # Evaluation metrics
  metrics: ['loss']
  # Set key to track best model
  best_res_update_round_wise_key: val_loss
In general, we use instruction SFT following the Alpaca team. In standalone mode, each dataset can be split across several clients with a specific splitter (i.e., lda, meta, iid) and federate.client_num.
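To make the splitter idea concrete, here is a minimal sketch of what an IID split and a meta split could look like. It is not the FS-LLM implementation: the function names `iid_split` and `meta_split` are hypothetical, and the sketch assumes the meta split groups samples by their `category` field.

```python
import random
from collections import defaultdict


def iid_split(samples, client_num, seed=0):
    """Shuffle the samples and deal them out round-robin to the clients."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::client_num] for i in range(client_num)]


def meta_split(samples, key="category"):
    """Group samples by a meta field; each distinct value forms one client."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key]].append(sample)
    return [groups[name] for name in sorted(groups)]


# Toy data: 12 samples spread over 3 hypothetical categories.
samples = [
    {"instruction": f"task {i}", "input": "", "output": "", "category": cat}
    for i, cat in enumerate(["python", "go", "java"] * 4)
]

iid_clients = iid_split(samples, client_num=3)
meta_clients = meta_split(samples)

print([len(c) for c in iid_clients])
print([c[0]["category"] for c in meta_clients])
```

With an IID split every client sees a mix of categories, while the meta split gives each client exactly one category, which is the kind of heterogeneity the 9-client `rosetta_alpaca` setup relies on.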
data.type | Source | Note |
---|---|---|
alpaca@llm | Link | IIDSplitter |
alpaca_cleaned@llm | Link | IIDSplitter |
dolly-15k@llm | Link | LDASplitter or MetaSplitter split to 8 clients. |
gsm8k@llm | Link | IIDSplitter |
rosetta_alpaca@llm | Link | LDASplitter or MetaSplitter split to 9 clients. |
code_search_net@llm | Link | LDASplitter or MetaSplitter split to 6 clients. |
data.type | Note |
---|---|
YOUR_DATA_NAME.json@llm | Format: [{'instruction': ..., 'input': ..., 'output':...}], default keys: instruction, input, output, category |
YOUR_DATA_NAME.jsonl@llm | Format of each line: {'instruction': ..., 'input': ..., 'output':...}, default keys: instruction, input, output, category |
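As an illustration of the two custom formats, the following sketch writes the same records as a `.json` file (one list) and as a `.jsonl` file (one object per line). The file names `my_data.json` and `my_data.jsonl` and the record contents are made up for the example; only the four keys come from the table above.

```python
import json
import os
import tempfile

# Hypothetical records using the default keys.
records = [
    {"instruction": "Translate to French.", "input": "Good morning",
     "output": "Bonjour", "category": "translation"},
    {"instruction": "Add the numbers.", "input": "2 and 3",
     "output": "5", "category": "math"},
]


def write_json(path, rows):
    """Write all rows as a single JSON list."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)


def write_jsonl(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "my_data.json")
    jsonl_path = os.path.join(tmp, "my_data.jsonl")
    write_json(json_path, records)
    write_jsonl(jsonl_path, records)

    # Read both files back to confirm they hold the same records.
    with open(json_path, encoding="utf-8") as f:
        from_json = json.load(f)
    with open(jsonl_path, encoding="utf-8") as f:
        from_jsonl = [json.loads(line) for line in f]

print(from_json == from_jsonl)
```

Under this assumption, such a file would then be referenced as `my_data.json@llm` or `my_data.jsonl@llm` in `data.type`.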
We evaluate the domain capabilities of fine-tuned models with easy-to-use evaluation tools.
FederatedScope
├── federatedscope
│   ├── llm
│   │   ├── eval
│   │   │   ├── eval_for_code
│   │   │   ├── eval_for_gsm8k
│   │   │   ├── eval_for_helm
│   │   │   ├── eval_for_mmlu
...
How to use:
For example, to evaluate the model fine-tuned with python federatedscope/main.py --cfg sft_gsm8k.yaml, you can run python federatedscope/llm/eval/eval_for_gsm8k/eval.py --cfg sft_gsm8k.yaml in the eval_for_gsm8k directory. For other usages, please refer to the README.md file in each subdirectory.
With the help of parameter-efficient fine-tuning methods, federated fine-tuning of a large model requires communicating only a very small percentage of the model parameters (the adapters), which allows clients to efficiently adapt pre-trained language models to various downstream applications. We adopt PEFT for fine-tuning LLMs, and more methods are coming soon!
Methods | Source | Example for llm.adapter.args |
---|---|---|
LoRA | Link | [ { 'adapter_package': 'peft', 'adapter_method': 'lora', 'r': 8, 'lora_alpha': 32, 'lora_dropout': 0.1 } ] |
Prefix Tuning | Link, Link | [{'adapter_package': 'peft', 'adapter_method': 'prefix', 'prefix_projection': False, 'num_virtual_tokens': 20}] |
P-Tuning | Link | [{'adapter_package': 'peft', 'adapter_method': 'p-tuning', 'encoder_reparameterization_type': 'MLP', 'encoder_dropout': 0.1, 'num_virtual_tokens': 20}] |
Prompt Tuning | Link | [{'adapter_package': 'peft', 'adapter_method': 'prompt', 'prompt_tuning_init': 'RANDOM', 'num_virtual_tokens': 20}] |
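To give a sense of how small the communicated adapter can be, here is a back-of-the-envelope calculation for LoRA with `r = 8` on a single square weight matrix. The hidden size of 4096 is an assumption chosen for illustration, and the helper `lora_trainable_params` is hypothetical; the count follows from LoRA's two low-rank factors (`r x d_in` and `d_out x r`).

```python
def lora_trainable_params(d_in, d_out, r):
    """Parameters LoRA adds to one d_out x d_in weight: A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


hidden = 4096                     # assumed hidden size, for illustration only
full = hidden * hidden            # parameters in the frozen weight matrix
lora = lora_trainable_params(hidden, hidden, r=8)

print(f"full: {full}, lora: {lora}, ratio: {lora / full:.2%}")
```

For this one matrix the adapter holds 65,536 parameters against roughly 16.8 million in the frozen weight, so only about 0.39% of it would need to be exchanged per round.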
We support federated fine-tuning not only for open-source LLMs but also for closed-source LLMs. In this scenario, clients can fine-tune LLMs without full access to the model, and both the models and the data are treated as private.
Methods | Source | How to enable | Note |
---|---|---|---|
Offsite-Tuning | Link | llm.offsite_tuning.use=True | - |
For example, the following methods are supported:
Methods | Source | How to use | Note |
---|---|---|---|
Drop layers | Link | llm.offsite_tuning.emu_l=2 llm.offsite_tuning.emu_r=30 llm.offsite_tuning.kwargs={"drop_ratio":0.2} | The server fixes the first two layers and the layers after the 30th layer as the adapter, and uniformly drops 20% of the remaining layers; the result is denoted as the emulator |
Model distill | Link | llm.offsite_tuning.emu_align.use=True llm.offsite_tuning.emu_l=2 llm.offsite_tuning.emu_r=30 | The server fixes the first two layers and the layers after the 30th layer as the adapter, regards the remaining layers as the teacher model, and distills a student model as the emulator |
More methods will be supported ASAP.
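The layer partition described for the drop-layers method can be sketched as follows. This is only an illustration under stated assumptions, not the FS-LLM code: the function `split_layers` is hypothetical, layers are 0-indexed, `emu_r` is treated as the first index of the trailing adapter layers, the model is assumed to have 32 layers, and "uniformly drops" is interpreted as keeping evenly spaced layers.

```python
def split_layers(num_layers, emu_l, emu_r, drop_ratio):
    """Return (adapter_layers, emulator_layers) as lists of layer indices."""
    # Layers before emu_l and from emu_r onward form the adapter.
    adapter = list(range(emu_l)) + list(range(emu_r, num_layers))
    middle = list(range(emu_l, emu_r))
    if not middle:
        return adapter, []
    # Keep (1 - drop_ratio) of the middle layers, evenly spaced.
    keep = max(1, round(len(middle) * (1 - drop_ratio)))
    if keep == 1:
        return adapter, [middle[0]]
    step_span = len(middle) - 1
    emulator = [middle[i * step_span // (keep - 1)] for i in range(keep)]
    return adapter, emulator


adapter, emulator = split_layers(num_layers=32, emu_l=2, emu_r=30, drop_ratio=0.2)
print("adapter:", adapter)
print("emulator size:", len(emulator))
```

Under these assumptions, 4 layers form the adapter and 22 of the 28 middle layers survive as the emulator.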
To evaluate fine-tuned closed-source LLMs, one should decide whether to evaluate the original model with fine-tuned adapters or the emulator with fine-tuned adapters.
Methods | Source | How to use | Note |
---|---|---|---|
Evaluation of fine-tuned closed-source LLMs | Link | cfg.llm.offsite_tuning.eval_type='full' (or 'emu') | 'full' means evaluating the original model with fine-tuned adapters; 'emu' means evaluating the emulator with fine-tuned adapters |
To make the federated fine-tuning efficient, we adopt a series of acceleration operators.
Methods | Source | How to use | Note |
---|---|---|---|
torch.nn.DataParallel | Link | cfg.train.data_para_dids=[0,1] | It splits the input across the specified devices by chunking in the batch dimension. |
DeepSpeed | Link | cfg.llm.accelation.use=True | Use nvcc -V to make sure CUDA is installed. When it is set to True, we can full-parameter fine-tune a llama-7b on a machine with 4 V100-32G GPUs. |
FP16 | Link | train.is_enable_half=True | Converts float types to half precision to reduce memory usage. |
Share local model | - | federate.share_local_model=True | The clients share the base model, which greatly reduces CPU memory consumption. |
Move to cpu | - | llm.adapter.mv_to_cpu=True | Moves the adapter to CPU after training, which saves memory but costs more time. |
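A rough estimate shows why half precision matters for memory. The sketch below only counts the model weights (no gradients, optimizer states, or activations) and assumes a round 7 billion parameters for a 7B-class model; the helper `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024 ** 3


num_params = 7_000_000_000  # assumed parameter count for a 7B-class model

fp32 = weight_memory_gib(num_params, 4)  # float32: 4 bytes per parameter
fp16 = weight_memory_gib(num_params, 2)  # float16: 2 bytes per parameter

print(f"fp32 weights: {fp32:.1f} GiB, fp16 weights: {fp16:.1f} GiB")
```

Halving the bytes per parameter halves the weight footprint, which is often the difference between fitting and not fitting on a single 32GB device.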
WARNING: Skip the batch due to the loss is NaN, it may be caused by exceeding the precision or invalid labels.
- Possible reason 1: llm.tok_len limits the input length, which can leave the label empty, so that sample is skipped automatically. Setting a larger llm.tok_len can avoid this.
- Possible reason 2: With train.is_enable_half enabled, numerical overflow may occur. This usually happens when optimizer.type is set to Adam, since the default eps is 1e-8 but fp16 requires at least 1e-5.
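One way to see why the default `eps` is a problem in half precision is to round both values to fp16. The sketch below uses only the standard library (the `struct` module's `e` format is IEEE 754 half precision); the helper `to_fp16` is hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to IEEE 754 half precision and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


default_eps = to_fp16(1e-8)   # Adam's default eps
larger_eps = to_fp16(1e-5)    # the minimum suggested above

print("1e-8 in fp16:", default_eps)
print("1e-5 in fp16:", larger_eps)
```

In fp16 the default `eps` rounds to zero, so it no longer protects the division in the Adam update, whereas `1e-5` stays representable.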
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
- This is a problem with transformers; you can fix it in your local file. Replace LLaMATokenizer with LlamaTokenizer in PATH_TO_DATA_ROOT/MODEL_REPO/snapshots/..../tokenizer_config.json
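The replacement can also be scripted. The following is a minimal sketch, assuming the class name is stored under the `tokenizer_class` key of `tokenizer_config.json`; the helper `fix_tokenizer_class` is hypothetical, and the demo runs on a throwaway file rather than a real model snapshot.

```python
import json
import os
import tempfile


def fix_tokenizer_class(config_path):
    """Rename LLaMATokenizer to LlamaTokenizer in place; return True if changed."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    if config.get("tokenizer_class") != "LLaMATokenizer":
        return False
    config["tokenizer_class"] = "LlamaTokenizer"
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return True


# Demo on a temporary file that mimics the problematic config.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "tokenizer_config.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump({"tokenizer_class": "LLaMATokenizer"}, f)

    changed = fix_tokenizer_class(demo_path)
    with open(demo_path, encoding="utf-8") as f:
        fixed = json.load(f)

print(changed, fixed["tokenizer_class"])
```

To apply it for real, pass the path to the `tokenizer_config.json` inside your own model snapshot directory.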
OutOfMemoryError: CUDA out of memory.
- Torch's garbage collection may not release memory promptly, which can result in OOM. Please set cfg.eval.count_flops to False.
If you find FedBiOT useful for your research or development, please cite the following papers:
@inproceedings{wu2024fedbiot,
author = {Wu, Feijie and Li, Zitao and Li, Yaliang and Ding, Bolin and Gao, Jing},
title = {FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model},
year = {2024},
booktitle = {Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
pages = {3345--3355}
}
@inproceedings{kuang2024federatedscope,
author = {Kuang, Weirui and Qian, Bingchen and Li, Zitao and Chen, Daoyuan and Gao, Dawei and Pan, Xuchen and Xie, Yuexiang and Li, Yaliang and Ding, Bolin and Zhou, Jingren},
title = {FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning},
year = {2024},
booktitle = {Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
pages = {5260--5271}
}