
Same DPO data, same training configuration and hyperparameters: the model responds normally after training with the safe-rlhf framework, but produces repetitive output after training with llama-factory #6458

Open
Xuanwu-Gong opened this issue Dec 27, 2024 · 7 comments
Labels
pending This problem is yet to be addressed

Comments


Xuanwu-Gong commented Dec 27, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-5.10.101-1.el8.ssai.x86_64-x86_64-with-glibc2.31
  • Python version: 3.11.10
  • PyTorch version: 2.5.1+cu124 (GPU)
  • Transformers version: 4.46.1
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA A100-SXM4-80GB
  • vLLM version: 0.6.5

Reproduction

These are the parameter configurations used for DPO fine-tuning in the two frameworks:

[Screenshot: DPO fine-tuning parameter configurations for both frameworks]

I also ran token statistics over the data and can confirm that no example exceeds the truncation length.
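For reference, a minimal sketch of how such a length check could look with the Hugging Face tokenizer; the dataset path, field names, model name, and the 8196 limit are illustrative placeholders, not the exact ones used in this issue:

```python
# Hypothetical sketch: check that no prompt + response pair in the DPO data
# exceeds the training cutoff length. Paths and field names are placeholders.
import json
from transformers import AutoTokenizer

CUTOFF_LEN = 8196  # assumed to match the training cutoff
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

with open("dpo_data.json", encoding="utf-8") as f:
    data = json.load(f)

too_long = 0
for sample in data:
    for answer in (sample["chosen"], sample["rejected"]):
        n_tokens = len(tokenizer.encode(sample["prompt"] + answer))
        if n_tokens > CUTOFF_LEN:
            too_long += 1
print(f"{too_long} sequences exceed the cutoff of {CUTOFF_LEN} tokens")
```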

Expected behavior

No response

Others

No response

github-actions bot added the pending label on Dec 27, 2024
hiyouga (Owner) commented Dec 27, 2024

What is the reproduction command?

Xuanwu-Gong (Author) commented Dec 27, 2024

```bash
#!/bin/bash
task_name=xxxx
pretrained_model_path=xxxx
dataset_dir=xxxx
model_output_dir=xxxx
dataset=xxxx
max_len=8196
bsz=1
epoch=2
accum=1

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export NCCL_IB_GID_INDEX=3
export NCCL_IB_HCA=^mlx5_0

# Collapse any duplicated slashes in the script's own path
PATH_ORI=${0%/*}
WORK_PATH=$(echo ${PATH_ORI} | sed -r 's/\/{2,}/\//')
WORKDIR=$(echo ${PATH_ORI} | sed -r 's/\/{2,}/\//')
cd ${WORK_PATH}

WORKDIR=/LLaMA-Factory

MASTER_PORT=12344
MASTER_IP=""
if [ "${RANK}" == "0" ]; then
    # Rank 0 resolves the master node's IP from its hostname
    while [[ "$MASTER_IP" == "" ]]; do
        MASTER_IP=$(ping ${MASTER_ADDR} -c 3 | sed '1{s/[^(]*(//;s/).*//;q}')
        sleep 1
    done
else
    sleep 60
    MASTER_IP=$(getent hosts ${MASTER_ADDR} | awk '{print $1}')
fi
export MASTER_NAME=$MASTER_ADDR
echo WORLD_SIZE=${WORLD_SIZE}
echo RANK=${RANK}

if [ ${WORLD_SIZE} -gt 1 ]; then
    submit="python -m deepspeed.launcher.launch --node_rank=${RANK} --world_info=${WORLD_INFO} --master_addr=${MASTER_IP} --master_port=${MASTER_PORT} "
else
    submit="deepspeed --num_gpus 8 --master_port=9901 "
fi

set -o pipefail

$submit $WORKDIR/src/train.py \
    --deepspeed $WORKDIR/examples/deepspeed/ds_z3_offload_config.json \
    --stage dpo \
    --model_name_or_path "${pretrained_model_path}" \
    --do_train \
    --dataset "${dataset}" \
    --template qwen \
    --finetuning_type full \
    --output_dir "${model_output_dir}/${task_name}" \
    --overwrite_output_dir True \
    --overwrite_cache \
    --per_device_train_batch_size ${bsz} \
    --gradient_accumulation_steps ${accum} \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-6 \
    --warmup_ratio 0.03 \
    --num_train_epochs ${epoch} \
    --weight_decay 0.05 \
    --adam_beta2 0.95 \
    --cutoff_len ${max_len} \
    --dataset_dir "${dataset_dir}" \
    --plot_loss \
    --preprocessing_num_workers 16 \
    --bf16 \
    --seed 42 \
    --flash_attn fa2
```

Could it be that the DPO loss computation differs between LLaMA-Factory and safe-rlhf?
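For context, both frameworks should in principle optimize the same standard DPO objective from Rafailov et al. (2023). A minimal reference sketch of that loss, not the actual code of either framework:

```python
# Reference sketch of the standard DPO loss (Rafailov et al., 2023).
# This is the textbook formula, not safe-rlhf's or LLaMA-Factory's code.
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             reference_chosen_logps, reference_rejected_logps, beta=0.1):
    """All inputs are summed log-probs of the response tokens, shape (batch,)."""
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = reference_chosen_logps - reference_rejected_logps
    # L = -log sigmoid(beta * ((logp_c - logp_r) - (ref_logp_c - ref_logp_r)))
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()
```

In practice, discrepancies between implementations usually come from how the per-token log-probs are masked and aggregated (sum vs. mean over response tokens), not from this formula itself.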

hiyouga (Owner) commented Dec 27, 2024

Are you using the base model or the instruct model?

Xuanwu-Gong (Author) replied:

> Are you using the base model or the instruct model?

We used Qwen's instruct model. It looks like `--template qwen` should guarantee that the training inputs are formatted consistently with the instruct model's fine-tuning process.
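A quick way to sanity-check that assumption is to render a sample through the tokenizer's built-in chat template and compare it with what the framework's `qwen` template actually feeds the model. A minimal sketch; the model name is illustrative:

```python
# Sketch: print the prompt produced by the Qwen instruct chat template so it
# can be compared with what the training framework actually feeds the model.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")  # illustrative
messages = [{"role": "user", "content": "Hello"}]
rendered = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(repr(rendered))  # expect <|im_start|>/<|im_end|> markers for Qwen
print(tokenizer.eos_token, tokenizer.eos_token_id)
```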

hiyouga (Owner) commented Dec 27, 2024

Have you compared the loss and logp curves of the two runs?

Xuanwu-Gong (Author) replied:
> Have you compared the loss and logp curves of the two runs?

[Screenshots: loss and reward-accuracy curves from both frameworks]
The overall loss and reward-accuracy curves are very close 😭. The green curves are safe-rlhf and the yellow curves are LLaMA-Factory.

hiyouga (Owner) commented Dec 30, 2024

@Xuanwu-Gong Could you share the inference script/command? Did you run inference with llamafactory, and is the eos token set correctly?
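Repetitive output after fine-tuning is often a stop-token problem, so one way such a check could look on the trained checkpoint (the path is a placeholder):

```python
# Sketch: verify the saved checkpoint's stop token. The path is a placeholder.
from transformers import AutoTokenizer, GenerationConfig

ckpt = "path/to/trained/checkpoint"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
gen_config = GenerationConfig.from_pretrained(ckpt)

print("tokenizer eos:", tokenizer.eos_token, tokenizer.eos_token_id)
print("generation eos_token_id:", gen_config.eos_token_id)
# For Qwen instruct models the stop token should correspond to <|im_end|>;
# if generation never sees the right eos id, the model keeps going and repeats.
```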
