
Windows Easy Finetune LLM

"This is a simple Finetune LLM tutorial using Windows + Torchtune"


Software Configuration

OS : Windows 11

CUDA : 12.4

cuDNN : v8.9.7

python : 3.11.0

pytorch : 2.6.0+cu124

triton-windows : 3.2.0.post12

torchtune : 0.5.0+cu124

LLM : LLaMA-2-7B (Used in this tutorial)

Finetune method : QLoRA (Used in this tutorial)

Hardware Configuration

GPU : NVIDIA GeForce RTX 3060 12GB (important: 12GB of VRAM is the minimum)

RAM : 16GB (more is better)

1. Download and Install

python 3.11.0 + CUDA 12.4 + cuDNN v8.9.7

python 3.11.0

https://www.python.org/downloads/release/python-3110/

A similar installation tutorial for CUDA and cuDNN:

https://medium.com/@alenzenx/安裝-cuda12-6-與-cudnn-8-9-7-34f95ef8ce7f

2. Install MSVC for Triton-Windows

Add a Windows system variable (the value below is the default installation path):

Variable name : CUDA_PATH

Variable value : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4
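
To confirm the variable is visible to newly started processes, here is a minimal check from Python (run it in a fresh shell after setting the variable):

import os

# Should print the CUDA 12.4 installation path set above;
# None means the variable is not visible to this process yet.
print(os.environ.get("CUDA_PATH"))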

Download Visual Studio Installer

https://visualstudio.microsoft.com/downloads/

Open Visual Studio Installer

Install Visual Studio Build Tools 2022 version : 17.13.2

Install MSVC (select it within Visual Studio Build Tools 2022)

Click "Modify"
Click "Individual components"
Select
  1. MSVC v143 - VS 2022 C++ x64/x86 build tools(Latest)

  2. Windows 11 SDK(10.0.22621.0)

  3. C++ CMake tools for Windows

  4. MSBuild support for LLVM(clang-cl) toolset

Add the following path to the Path entry under Windows user variables (the path below is the default):

C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.43.34808\bin\Hostx64\x64

3. Create Virtual Environment

(Change "C:\Users\User\Desktop\ourllm" to your project path)

python -m venv C:\Users\User\Desktop\ourllm
C:\Users\User\Desktop\ourllm\Scripts\Activate.ps1
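
To confirm the virtual environment is active before installing anything, a minimal check: sys.prefix should point at the venv directory.

import sys

# When a venv is active, sys.prefix points at the venv directory
# instead of the global Python installation.
print(sys.prefix)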

4. Install requirements

pip install -r requirements.txt

Verify CUDA GPU execution

python GPUtest.py
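
The repo's GPUtest.py is not reproduced here; below is a minimal sketch of what such a check typically does, assuming it only needs to confirm that PyTorch can reach the GPU:

import torch

# Confirm PyTorch was built with CUDA support and can see the GPU.
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    # Run one matmul on the GPU to make sure kernels actually execute.
    x = torch.rand(1024, 1024, device="cuda")
    print("matmul OK:", (x @ x).shape)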

Verify Triton execution

python test_triton.py
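
test_triton.py is likewise repo-specific; below is a minimal sketch of a Triton smoke test, assuming a simple vector-add kernel is enough to prove that triton-windows can compile and launch on the GPU:

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

n = 1024
x = torch.rand(n, device="cuda")
y = torch.rand(n, device="cuda")
out = torch.empty_like(x)
add_kernel[(triton.cdiv(n, 256),)](x, y, out, n, BLOCK_SIZE=256)
assert torch.allclose(out, x + y), "Triton kernel output mismatch"
print("Triton OK")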

5. Download raw LLaMA-2-7B

pip install llama-stack
llama model list --show-all
llama download --source meta --model-id Llama-2-7b

LLaMA-2 verification key (a signed URL starting with https)

You must apply for it at the following website:

https://www.llama.com/llama-downloads/

Click "Previous language & safety models"
Choose "Llama 2"

(You will receive a signed URL starting with https, similar to the one below; paste it when the llama download command prompts for it.)

https://download.llamameta.net/*?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaXF1ZV9oYXNoIjoiMWh0d3JyeWVxOXE1cWpjMTQ5aDQ2OWx5IiwiUmVzb3VyY2UiOiJodHRwczpcL1wvZG93bmxvYWQubGxhbWFtZXRhLm5ldFwvKiIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0MTk1ODM1MH19fV19&Signature=nCSq%7ECseY3cvvI5w7THDAAXAvaiqP81ibq5nLCztW1efQmL-f67TvxGrblYUGV5Kg7URAsDxJNp5NFdOVoyOX5E5fpFm1Dzi2xAfsrunyGVnud-uliH8HdHoEwT9Pmin5qSt4slG9v2n4hSw7t-htP4dd5yh69rpf7GJWH02QKc66Axf4%7EoQ1AhFc0cLpSpS3MUMDp7D1m2jEjT98J4Ee3Hj1eH%7EtU0mGytyncEb-W1bNEZt8TdTIDwE8pY2S9sXpzGkbQrHv5A4QvR0fqEcvio47uvVjYqSH7ExCHJP5WeYEuT6lXNFgfn59oe0coyliIseAXLQet7X7Jbh2m64Tw__&Key-Pair-Id=K15QRJLYKIFSLZ&Download-Request-ID=587287740993231

6. Convert the raw LLaMA-2 model into HF format (HF = Hugging Face)

The convert_llama_weights_to_hf.py script comes from the Hugging Face transformers repository.

pip install protobuf sentencepiece
python convert_llama_weights_to_hf.py --input_dir "Llama-2-7b" --model_size 7B --output_dir "Llama-2-7b-hf" --llama_version 2

7. Fine-tune LLaMA-2 using Torchtune

List the built-in Torchtune recipes and configs:

tune ls

Copy Torchtune's default QLoRA single-GPU training config:

tune cp llama2/7B_qlora_single_device custom_config.yaml

Edit custom_config.yaml

Change output_dir path

output_dir: qlora_output

Change tokenizer path

tokenizer:
  _component_: torchtune.models.llama2.llama2_tokenizer
  path: Llama-2-7b-hf/tokenizer.model
  max_seq_len: null

Change the checkpointer path, and save only the QLoRA adapter weights

checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: Llama-2-7b-hf
  checkpoint_files: [
      model-00001-of-00003.safetensors,
      model-00002-of-00003.safetensors,
      model-00003-of-00003.safetensors
  ]
  adapter_checkpoint: null
  recipe_checkpoint: null
  output_dir: ${output_dir}
  model_type: LLAMA2
resume_from_checkpoint: False
save_adapter_weights_only: True
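
The checkpoint_files listed above must match the shard names actually produced by Step 6; a quick hypothetical check:

import os

# List the weight shards in the converted model directory; these names
# must match checkpoint_files in custom_config.yaml.
for name in sorted(os.listdir("Llama-2-7b-hf")):
    if name.endswith(".safetensors"):
        print(name)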

Change the floating-point format from bf16 to fp32 (needed on GeForce GPUs)

dtype: fp32

Change batch size

dataset:
  _component_: torchtune.datasets.alpaca_cleaned_dataset
  packed: False  # True increases speed
seed: null
shuffle: True
batch_size: 4

Verify custom_config.yaml

tune validate custom_config.yaml

If you want to run the full fine-tune instead of a dummy test, skip ahead to Step 9.

8. Create Dummy Test

Edit custom_config.yaml

Point the dataset at the dummy test files

dataset:
  _component_: my_dummy_dataset.MyDummyDataset
  data_file: "./dummy_alpaca.json"
  packed: False  # True increases speed
seed: null
shuffle: True
batch_size: 4

Verify custom_config.yaml

tune validate custom_config.yaml

Create dummy_alpaca.json

[
    {
    "instruction": "Dummy Instruction",
    "input": "Dummy Input",
    "output": "Dummy Output"
    }
]

Create my_dummy_dataset.py

import json
from torch.utils.data import Dataset

class MyDummyDataset(Dataset):
    def __init__(self, tokenizer=None, data_file=None, packed=False):
        """
        tokenizer: 由 Torchtune 以位置引數 (positional arg) 傳入
        data_file: YAML 中指定的關鍵字引數
        """
        super().__init__()
        self.tokenizer = tokenizer
        self.packed = packed  # <- 關鍵字參數接收 "packed"

        # 如果 data_file 有指定,就載入資料
        if data_file is not None:
            with open(data_file, "r", encoding="utf-8") as f:
                self.data = json.load(f)
        else:
            # 沒有提供檔案就給個空 list
            self.data = []

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        example = self.data[idx]
        return {
            "instruction": example.get("instruction", ""),
            "input": example.get("input", ""),
            "output": example.get("output", ""),
        }
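
A quick local sanity check of the dataset class (hypothetical usage; during training, Torchtune instantiates it from the YAML config instead):

from my_dummy_dataset import MyDummyDataset

# Load the dummy data directly to confirm the file and class line up.
ds = MyDummyDataset(data_file="./dummy_alpaca.json")
print(len(ds))   # 1
print(ds[0])     # {'instruction': 'Dummy Instruction', ...}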

9. Single GPU Finetune LLM (Train)

tune run lora_finetune_single_device --config custom_config.yaml

10. Inference (QLoRA version)

python test.py
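
test.py is repo-specific and not reproduced here; below is a minimal sketch of QLoRA-adapter inference using transformers + peft (peft may need to be installed separately), assuming the adapter was saved in PEFT-compatible format. The adapter path "qlora_output/epoch_0" is an assumption; check your actual output_dir.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the converted base model from Step 6 and attach the trained adapter.
base = AutoModelForCausalLM.from_pretrained(
    "Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "qlora_output/epoch_0")  # hypothetical path
tokenizer = AutoTokenizer.from_pretrained("Llama-2-7b-hf")

prompt = "### Instruction:\nExplain QLoRA in one sentence.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))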
