eryl/aida-fl-workshop
AIDA Technical Days Federated Learning Example

This repository contains code to train an XLM-RoBERTa model using masked language modelling and LoRA fine-tuning. The code is based on the NLP-NER example and incorporates code from the Hugging Face run_mlm.py script.

As example datasets, the code uses works of Jane Austen and Shakespeare.

Installation

The suggested installation creates a virtual environment which you will use to run your federated learning client. All code run in the federated environment must be pre-installed on the system; the federation will not allow arbitrary code to be executed on the nodes.

For convenience, the code for the experiment can be installed as a "development" (editable) package. This means that any changes to the code are automatically reflected in the installed package (e.g. after a git pull, you don't have to remember to reinstall it).

Start by cloning this repo:

$ git clone git@github.com:eryl/aimplant.git
$ cd aimplant

Now you can install either using pip (you must have Python 3.10 installed system-wide) or uv (which manages the Python version for you). uv is the recommended method.

With pip

If you're using pip, you need Python 3.10 installed on the system (later versions of Python might cause issues with package dependencies). If you have trouble installing Python 3.10 system-wide, use the uv install method below instead.

Use pip to install the dependencies:

$ python3.10 -m venv .venv
$ source .venv/bin/activate
(federatedhealth)$ python -m pip install -U pip  # upgrade pip
(federatedhealth)$ python -m pip install -e .   # This installs this code

With Astral uv (recommended)

Astral uv is a fast and capable Python packaging tool. It conveniently installs full Python environments for you, including different versions of Python. Install it by following its official installation guide.

Once uv is installed and added to your path you can run the following in the project directory:

$ uv sync  # creates .venv using the correct python environment
$ source .venv/bin/activate
(federatedhealth)$ uv pip install -e .

XLM-RoBERTa

You will also need the model from Hugging Face.

Download the model using the Hugging Face CLI (it should have been installed with the environment):

$ hf download FacebookAI/xlm-roberta-base --local-dir models/xlm-roberta-base

Dataset

The datasets are expected to be regular UTF-8-encoded text files containing the training examples. In the aiMPLANT demonstrator, the files are organized with one line per patient, with the clinical notes for each patient concatenated in order of note date. Do note that training sequences do not cross newlines, so the context window for MLM will not include text spanning multiple lines.
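As an illustration of this layout, the sketch below assembles one line per patient from date-stamped notes. The function name, the tuple format of the input, and the whitespace handling are all assumptions for the example, not part of the repository:

```python
# Sketch: build a one-line-per-patient corpus file (illustrative only;
# the input format and helper name are assumptions, not from the repo).
from collections import defaultdict

def build_corpus_lines(notes):
    """notes: iterable of (patient_id, date, text) tuples.

    Returns one line per patient, notes concatenated in date order.
    """
    per_patient = defaultdict(list)
    for patient_id, date, text in notes:
        # Flatten embedded newlines so each patient stays on one line.
        per_patient[patient_id].append((date, text.replace("\n", " ")))
    lines = []
    for patient_id in sorted(per_patient):
        entries = sorted(per_patient[patient_id])  # chronological order
        lines.append(" ".join(text for _, text in entries))
    return lines

notes = [
    ("p1", "2024-01-02", "Follow-up visit."),
    ("p1", "2024-01-01", "Initial consult."),
    ("p2", "2024-02-10", "Routine check."),
]
for line in build_corpus_lines(notes):
    print(line)
```

Each returned line would then be written to the training, dev, or test text file.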

Configuration

Configuration is based on a JSON configuration file. The default file can be found in src/federatedhealth/default_config.json. The experiment will look for this file in $HOME/.federatedhealth/config.json and, if it is not found, copy the default config there. You can also do this manually by running:

$ cp src/federatedhealth/default_config.json $HOME/.federatedhealth/config.json

The config file ($HOME/.federatedhealth/config.json) will look something like this:

{
    "model_path": "/path/to/xlmroberta-dir",
    "data_config": {
        "training_data": "/path/to/training_data.txt",
        "dev_data": "/path/to/dev_data.txt",
        "test_data": "/path/to/test_data.txt"
    },
    "training_args": {
        "mlm_probability": 0.1,
        "optimization_batch_size": 32,
        "per_device_train_batch_size": 4,
        "per_device_eval_batch_size": 4,
        "learning_rate": 1e-4,
        "weight_decay": 1e-3,
        "max_train_steps": null,
        "num_train_epochs": 10,
        "lr_scheduler_type": "linear",
        "num_warmup_steps": 0,
        "checkpointing_steps": null,
        "aggregation_epochs": 1
    },
    "lora_config": {
        "task_type": "TOKEN_CLS", 
        "inference_mode": false, 
        "r": 8, 
        "lora_alpha": 8, 
        "lora_dropout": 0.1,
        "bias": "all"
    }
}

You need to change these values:

  • "model_path": Point this to the directory you downloaded the XLM-RoBERTa model to
  • "training_data": This should be the full path to your training text file
  • "dev_data": This should be the full path to your development text file
  • "test_data": This should be the full path to your test text file
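Once these values are set, it can be useful to verify that every path actually exists before launching a training run. The helper below is a sketch of such a check; the function name is ours, but the JSON keys match the config shown above:

```python
# Sketch: sanity-check the config file before training (the check itself
# is an assumption/convenience, not part of the repository).
import json
import os

def check_config(path):
    """Return a list of problems found in the config at `path` (empty if OK)."""
    with open(path) as f:
        config = json.load(f)
    problems = []
    if not os.path.isdir(config["model_path"]):
        problems.append(f"model_path not found: {config['model_path']}")
    for key in ("training_data", "dev_data", "test_data"):
        if not os.path.isfile(config["data_config"][key]):
            problems.append(f"{key} not found: {config['data_config'][key]}")
    return problems
```

Run it against $HOME/.federatedhealth/config.json and fix any reported paths before training.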

Configure local batch size

Due to differences in compute capacity, you might want to override the device batch size (the number of samples gradients are computed on at a time). You can change these configuration values:

  • "training_args.per_device_train_batch_size"
  • "training_args.per_device_eval_batch_size"
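The config also carries an "optimization_batch_size" (32 by default), which suggests that a smaller per-device batch size is compensated for with gradient accumulation. Whether the training code accumulates this way is an assumption on our part; the arithmetic itself is standard:

```python
# Sketch: gradient accumulation keeps the effective (optimization) batch
# size constant when the per-device batch size is reduced. Whether the
# repo's training loop does exactly this is an assumption.
def accumulation_steps(optimization_batch_size, per_device_train_batch_size):
    if optimization_batch_size % per_device_train_batch_size != 0:
        raise ValueError(
            "per-device batch size should evenly divide the optimization batch size"
        )
    return optimization_batch_size // per_device_train_batch_size

print(accumulation_steps(32, 4))  # with the defaults above: 8 accumulation steps
```

So halving per_device_train_batch_size on a memory-constrained node doubles the number of accumulation steps but leaves the effective batch size unchanged.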

Local training with sample data

First make sure you have the config installed:

$ cp src/federatedhealth/default_config.json $HOME/.federatedhealth/config.json

Assuming the model is in models/xlm-roberta-base (as downloaded above), we add this to the config file:

$ sed -i "s#\(\"model_path\": *\"\)[^\"]*\"#\1$PWD/models/xlm-roberta-base\"#" $HOME/.federatedhealth/config.json

And we can set the training data paths in the same way:

$ sed -i "s#\(\"training_data\": *\"\)[^\"]*\"#\1$PWD/fedhealth_mlm_data/site-1_train.txt\"#" $HOME/.federatedhealth/config.json
$ sed -i "s#\(\"dev_data\": *\"\)[^\"]*\"#\1$PWD/fedhealth_mlm_data/site-1_dev.txt\"#" $HOME/.federatedhealth/config.json
$ sed -i "s#\(\"test_data\": *\"\)[^\"]*\"#\1$PWD/fedhealth_mlm_data/site-1_test.txt\"#" $HOME/.federatedhealth/config.json
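If you prefer not to wrangle sed, the same edits can be made with a short Python helper. The function name set_paths is ours; the JSON keys match the config file shown earlier:

```python
# Sketch: a Python alternative to the sed one-liners above (convenience
# helper, not part of the repository).
import json
import os

def set_paths(config_path, model_path, train, dev, test):
    """Rewrite the model and dataset paths in the federatedhealth config."""
    with open(config_path) as f:
        config = json.load(f)
    config["model_path"] = model_path
    config["data_config"]["training_data"] = train
    config["data_config"]["dev_data"] = dev
    config["data_config"]["test_data"] = test
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
```

For the sample data this would be called with os.getcwd()-relative paths matching the sed commands above.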

This makes sure the config file has valid entries for model and datasets. You can now run the local training:

(federatedhealth)$ python local_train.py

Federated Simulation

To simulate the federated task, run the nvflare simulator with:

(federatedhealth)$ nvflare simulator federatedhealth_mlm_job -w /tmp/nvflare/workspaces/xlmroberta-mlm -n 2 -t 1 -gpu 0 

This starts the FLARE job in the folder federatedhealth_mlm_job with two clients (-n 2) running in one thread (-t 1) on the GPU with id 0 (-gpu 0). With a single thread, the simulator runs the tasks for the two clients in sequence, allowing them to share a single GPU.

Federated training

Before you start, you should have downloaded the client starter pack from the project's NVIDIA FLARE dashboard.

About

Repository for the AIDA Data Hub Federated Learning workshop
