
Adding orpo training #1210

Open
wants to merge 15 commits into main
Conversation

Goekdeniz-Guelmez
Contributor

Training:

python -m mlx_lm.lora \
    --model mlx-community/Josiefied-Qwen2.5-0.5B-Instruct-abliterated-v1-4bit \
    --train \
    --data /Users/gokdenizgulmez/Desktop/dpo_test_data \
    --iters 10 \
    --batch-size 1 \
    --num-layers 1 \
    --val-batches 2 \
    --steps-per-report 1 \
    --adapter-path /Users/gokdenizgulmez/Desktop/test-dpo \
    --max-seq-length 1024 \
    --grad-checkpoint \
    --training-mode orpo \
    --fine-tune-type lora \
    --dpo-loss-type sigmoid \
    --beta 0.1 \
    --steps-per-eval 50
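
The preference data passed via --data presumably follows the usual pairwise DPO/ORPO layout, with one JSON object per line in train.jsonl and valid.jsonl. A minimal sketch of one record is shown below; the field names prompt, chosen, and rejected follow the common convention and are an assumption, not something confirmed by this PR:

{"prompt": "What is the capital of France?", "chosen": "The capital of France is Paris.", "rejected": "France has no capital city."}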

Output:

Loading pretrained model
Fetching 9 files: 100%|███████████████████████████████████████| 9/9 [00:00<00:00, 113701.01it/s]
Loading datasets
Training in orpo mode
Trainable parameters: 0.109% (0.541M/494.033M)
Starting ORPO training..., iters: 10
Iter 1: Val loss 3.107, Val chosen reward 2.000, Val rejected reward 0.000, Val took 0.518s
Iter 1: Train loss 2.197, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.375, Tokens/sec 743.712, Trained Tokens 541.0, Peak mem 1.284 GB
Iter 2: Train loss 8.681, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.348, Tokens/sec 773.850, Trained Tokens 1115.0, Peak mem 1.347 GB
Iter 3: Train loss 0.378, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.501, Tokens/sec 797.219, Trained Tokens 1646.0, Peak mem 1.347 GB
Iter 4: Train loss 0.006, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.539, Tokens/sec 807.946, Trained Tokens 2171.0, Peak mem 1.347 GB
Iter 5: Train loss 0.005, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.516, Tokens/sec 796.148, Trained Tokens 2696.0, Peak mem 1.347 GB
Iter 6: Train loss 0.442, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.384, Tokens/sec 748.986, Trained Tokens 3237.0, Peak mem 1.347 GB
Iter 7: Train loss 0.145, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.514, Tokens/sec 804.046, Trained Tokens 3768.0, Peak mem 1.347 GB
Iter 8: Train loss 5.233, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.362, Tokens/sec 781.894, Trained Tokens 4342.0, Peak mem 1.347 GB
Iter 9: Train loss 4.444, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.321, Tokens/sec 758.359, Trained Tokens 4916.0, Peak mem 1.347 GB
Iter 10: Val loss 2.200, Val chosen reward 2.000, Val rejected reward 0.000, Val took 0.467s
Iter 10: Train loss 0.002, Chosen reward 1.000, Rejected reward 0.000, Learning Rate 1.000e-05, It/sec 1.521, Tokens/sec 798.309, Trained Tokens 5441.0, Peak mem 1.347 GB
Saved final weights to /Users/gokdenizgulmez/Desktop/test-dpo/adapters.safetensors.
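
For context, the logged loss, chosen reward, and rejected reward come from the ORPO objective (Hong et al., 2024), which adds an odds-ratio penalty to the usual supervised loss on the chosen completion. The sketch below is a rough reconstruction under stated assumptions, not the code added in this PR: the function name orpo_loss, the length-normalized per-token log-probability inputs, and the use of --beta as the weight on the odds-ratio term are all assumptions.

import mlx.core as mx

def orpo_loss(chosen_logps, rejected_logps, beta=0.1):
    # chosen_logps / rejected_logps: average per-token log P(y|x), shape (batch,)
    # log-odds: log(p / (1 - p)), computed directly from the log-probabilities
    log_odds_chosen = chosen_logps - mx.log1p(-mx.exp(chosen_logps))
    log_odds_rejected = rejected_logps - mx.log1p(-mx.exp(rejected_logps))
    # odds-ratio term: -log sigmoid(difference of log-odds)
    or_term = -mx.log(mx.sigmoid(log_odds_chosen - log_odds_rejected))
    # SFT term: negative log-likelihood of the chosen completion
    sft_term = -chosen_logps
    return mx.mean(sft_term + beta * or_term)

# example: the chosen completion is more likely than the rejected one
print(orpo_loss(mx.array([-0.3]), mx.array([-1.2]), beta=0.1))

With this formulation the loss stays close to the plain SFT loss when the model already separates chosen from rejected, and grows when the rejected completion becomes relatively more likely, which matches the behavior visible in the log above.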
