eryl/aida-fl-workshop
AIDA Technical Days Federated Learning Example

This repository contains code to train an XLM-RoBERTa model using masked language modelling and LoRA fine-tuning. The code is based on the NLP-NER example and incorporates code from the Hugging Face run_mlm.py script.

As example datasets, the code uses works of Jane Austen and Shakespeare.

Installation

The suggested installation creates a virtual environment which you will use to run your federated learning client. All code run in the federated environment must be pre-installed on the system; the federation will not allow arbitrary code to be executed on the nodes.

For convenience, the code for the experiment can be installed as a "development" (editable) package. This means that any changes to the code are automatically reflected in the installed package (e.g. after a git pull, you don't have to remember to reinstall it).

Start by cloning this repo:

$ git clone git@github.com:eryl/aimplant.git
$ cd aimplant

Now you can install either using pip (you must have Python 3.10 installed system-wide) or uv (which manages the Python version for you). uv is the recommended method.

With pip

If you're using pip, you need Python 3.10 installed on the system (later versions of Python might cause issues with package dependencies). If you have trouble installing Python 3.10 system-wide, use the uv install method below instead.

Use pip to install the dependencies:

$ python3.10 -m venv .venv
$ source .venv/bin/activate
(federatedhealth)$ python -m pip install -U pip  # upgrade pip
(federatedhealth)$ python -m pip install -e .   # This installs this code

With Astral uv (recommended)

Astral uv is a fast and capable Python packaging tool. It conveniently installs full Python environments for you, including different versions of Python. Install it by following its official installation guide.

Once uv is installed and added to your path you can run the following in the project directory:

$ uv sync  # creates .venv using the correct python environment
$ source .venv/bin/activate
(federatedhealth)$ uv pip install -e .

XLM-RoBERTa

You will also need the model from Hugging Face.

Download the model using the Hugging Face CLI (it should have been installed with the environment):

$ hf download FacebookAI/xlm-roberta-base --local-dir models/xlm-roberta-base

Dataset

The datasets are expected to be regular UTF-8-encoded text files containing the training examples. In the aiMPLANT demonstrator, the files are organized with one line per patient, with the clinical notes for each patient concatenated in order of note date. Do note that training sequences do not cross newlines, so the context window for MLM will not include text spanning multiple lines.
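As an illustration of this layout, the sketch below assembles one line per patient from date-stamped notes. The function name, the tuple format of the input, and the whitespace handling are all assumptions for the example, not part of the repository:

```python
# Sketch: build a one-line-per-patient corpus file (illustrative only;
# the input format and helper name are assumptions, not from the repo).
from collections import defaultdict

def build_corpus_lines(notes):
    """notes: iterable of (patient_id, date, text) tuples.

    Returns one line per patient, notes concatenated in date order.
    """
    per_patient = defaultdict(list)
    for patient_id, date, text in notes:
        # Flatten embedded newlines so each patient stays on one line.
        per_patient[patient_id].append((date, text.replace("\n", " ")))
    lines = []
    for patient_id in sorted(per_patient):
        entries = sorted(per_patient[patient_id])  # chronological order
        lines.append(" ".join(text for _, text in entries))
    return lines

notes = [
    ("p1", "2024-01-02", "Follow-up visit."),
    ("p1", "2024-01-01", "Initial consult."),
    ("p2", "2024-02-10", "Routine check."),
]
for line in build_corpus_lines(notes):
    print(line)
```

Each returned line would then be written to the training, dev, or test text file.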

Configuration

Configuration is based on a JSON configuration file. The default file can be found in src/federatedhealth/default_config.json. The experiment will look for this file in $HOME/.federatedhealth/config.json and, if it is not found, copy the default config there. You can also do this manually by running:

$ cp src/federatedhealth/default_config.json $HOME/.federatedhealth/config.json

The config file ($HOME/.federatedhealth/config.json) will look something like this:

{
    "model_path": "/path/to/xlmroberta-dir",
    "data_config": {
        "training_data": "/path/to/training_data.txt",
        "dev_data": "/path/to/dev_data.txt",
        "test_data": "/path/to/test_data.txt"
    },
    "training_args": {
        "mlm_probability": 0.1,
        "optimization_batch_size": 32,
        "per_device_train_batch_size": 4,
        "per_device_eval_batch_size": 4,
        "learning_rate": 1e-4,
        "weight_decay": 1e-3,
        "max_train_steps": null,
        "num_train_epochs": 10,
        "lr_scheduler_type": "linear",
        "num_warmup_steps": 0,
        "checkpointing_steps": null,
        "aggregation_epochs": 1
    },
    "lora_config": {
        "task_type": "TOKEN_CLS", 
        "inference_mode": false, 
        "r": 8, 
        "lora_alpha": 8, 
        "lora_dropout": 0.1,
        "bias": "all"
    }
}

You need to change these values:

  • "model_path": Point this to the directory you downloaded the XLM-RoBERTa model to
  • "training_data": This should be the full path to your training text file
  • "dev_data": This should be the full path to your development text file
  • "test_data": This should be the full path to your test text file
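Once these values are set, it can be useful to verify that every path actually exists before launching a training run. The helper below is a sketch of such a check; the function name is ours, but the JSON keys match the config shown above:

```python
# Sketch: sanity-check the config file before training (the check itself
# is an assumption/convenience, not part of the repository).
import json
import os

def check_config(path):
    """Return a list of problems found in the config at `path` (empty if OK)."""
    with open(path) as f:
        config = json.load(f)
    problems = []
    if not os.path.isdir(config["model_path"]):
        problems.append(f"model_path not found: {config['model_path']}")
    for key in ("training_data", "dev_data", "test_data"):
        if not os.path.isfile(config["data_config"][key]):
            problems.append(f"{key} not found: {config['data_config'][key]}")
    return problems
```

Run it against $HOME/.federatedhealth/config.json and fix any reported paths before training.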

Configure local batch size

Due to differences in compute capacity, you might want to override the device batch size (the number of samples gradients are computed on at a time). You can change these configuration values:

  • "training_args.per_device_train_batch_size"
  • "training_args.per_device_eval_batch_size"
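The config also carries an "optimization_batch_size" (32 by default), which suggests that a smaller per-device batch size is compensated for with gradient accumulation. Whether the training code accumulates this way is an assumption on our part; the arithmetic itself is standard:

```python
# Sketch: gradient accumulation keeps the effective (optimization) batch
# size constant when the per-device batch size is reduced. Whether the
# repo's training loop does exactly this is an assumption.
def accumulation_steps(optimization_batch_size, per_device_train_batch_size):
    if optimization_batch_size % per_device_train_batch_size != 0:
        raise ValueError(
            "per-device batch size should evenly divide the optimization batch size"
        )
    return optimization_batch_size // per_device_train_batch_size

print(accumulation_steps(32, 4))  # with the defaults above: 8 accumulation steps
```

So halving per_device_train_batch_size on a memory-constrained node doubles the number of accumulation steps but leaves the effective batch size unchanged.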

Local training with sample data

First make sure you have the config installed:

$ cp src/federatedhealth/default_config.json $HOME/.federatedhealth/config.json

Assuming the model is in models/xlm-roberta-base (as downloaded above), we add this to the config file:

$ sed -i "s#\(\"model_path\": *\"\)[^\"]*\"#\1$PWD/models/xlm-roberta-base\"#" $HOME/.federatedhealth/config.json

And we can set the training data paths in the same way:

$ sed -i "s#\(\"training_data\": *\"\)[^\"]*\"#\1$PWD/fedhealth_mlm_data/site-1_train.txt\"#" $HOME/.federatedhealth/config.json
$ sed -i "s#\(\"dev_data\": *\"\)[^\"]*\"#\1$PWD/fedhealth_mlm_data/site-1_dev.txt\"#" $HOME/.federatedhealth/config.json
$ sed -i "s#\(\"test_data\": *\"\)[^\"]*\"#\1$PWD/fedhealth_mlm_data/site-1_test.txt\"#" $HOME/.federatedhealth/config.json
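If you prefer not to wrangle sed, the same edits can be made with a short Python helper. The function name set_paths is ours; the JSON keys match the config file shown earlier:

```python
# Sketch: a Python alternative to the sed one-liners above (convenience
# helper, not part of the repository).
import json
import os

def set_paths(config_path, model_path, train, dev, test):
    """Rewrite the model and dataset paths in the federatedhealth config."""
    with open(config_path) as f:
        config = json.load(f)
    config["model_path"] = model_path
    config["data_config"]["training_data"] = train
    config["data_config"]["dev_data"] = dev
    config["data_config"]["test_data"] = test
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
```

For the sample data this would be called with os.getcwd()-relative paths matching the sed commands above.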

This makes sure the config file has valid entries for model and datasets. You can now run the local training:

(federatedhealth)$ python local_train.py

Federated Simulation

To simulate the federated task, run the nvflare simulator with:

(federatedhealth)$ nvflare simulator federatedhealth_mlm_job -w /tmp/nvflare/workspaces/xlmroberta-mlm -n 2 -t 1 -gpu 0 

This starts the FLARE job in the folder federatedhealth_mlm_job with two clients (-n 2) running in one thread (-t 1) on the GPU with id 0 (-gpu 0). With a single thread, the simulator runs the tasks for the two clients in sequence, allowing them to share a single GPU.

Federated training

Before you start, you should have downloaded the client starter pack from the project's NVIDIA FLARE dashboard.

About

Repository for the AIDA Data Hub Federated Learning workshop
