I tried formatting multi-turn dialogue data as shown below and ran it through the DPO code with LoRA. After merging, inference became noticeably slower and the model produced repeated output. In the code I only changed `"prompt": ["Question: " + question + "\n\nAnswer: " for question in examples["question"]]` to `"prompt": examples["question"]`. Do I also need to add an end-of-sequence token after each dialogue turn, the same as in multi-turn SFT?
{"question": "\n\nHuman:你好\n\nAssistant:你好\n\nHuman:你好\n\nAssistant:", "response_chosen": "您好", "response_rejected": "您好,有什么可以帮您的吗"}
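For reference, the one-line change described above would look roughly like this in the dataset mapping function. This is a hedged sketch: the function name `return_prompt_and_responses` and the `chosen`/`rejected` output keys follow common DPO training scripts and are assumptions, not necessarily this repository's exact code.

```python
def return_prompt_and_responses(examples):
    """Map raw JSONL fields to the prompt/chosen/rejected format DPO expects."""
    return {
        # Original single-turn template:
        # "prompt": ["Question: " + q + "\n\nAnswer: " for q in examples["question"]],
        # Changed: pass the multi-turn dialogue string through unchanged.
        "prompt": examples["question"],
        "chosen": examples["response_chosen"],
        "rejected": examples["response_rejected"],
    }
```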
The parameters used are:

```
CUDA_VISIBLE_DEVICES=4,5,6 python dpo_training.py \
    --model_type baichuan \
    --model_name_or_path <base model after SFT> \
    --train_file_dir ./reward \
    --validation_file_dir ./reward \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --do_train \
    --do_eval \
    --use_peft True \
    --max_train_samples -1 \
    --max_eval_samples -1 \
    --max_steps 100 \
    --eval_steps 20 \
    --save_steps 50 \
    --max_source_length 1024 \
    --max_target_length 256 \
    --output_dir outputs-dpo-v1 \
    --target_modules all \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --torch_dtype float16 \
    --fp16 True \
    --device_map auto \
    --report_to tensorboard \
    --remove_unused_columns False \
    --gradient_checkpointing True \
    --cache_dir ./cache \
    --gradient_accumulation_steps 4
```
You can add the end-of-sequence token manually.
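Adding the token manually could be done as a preprocessing step, for example by marking the end of each completed assistant reply in the `question` field. This is a minimal sketch assuming the `\n\nHuman:`/`\n\nAssistant:` template from the data above and a `</s>` EOS token (substitute your tokenizer's actual `eos_token`):

```python
def add_eos_per_turn(question: str, eos_token: str = "</s>") -> str:
    """Append an EOS token after every completed assistant reply.

    A reply is "completed" when it is followed by the next Human marker;
    the trailing open "Assistant:" prompt is left untouched.
    """
    head, *rest = question.split("\n\nHuman:")
    out = head
    for seg in rest:
        # Any non-empty text accumulated so far ends with a finished
        # assistant reply, so terminate it with EOS before the next turn.
        if out:
            out += eos_token
        out += "\n\nHuman:" + seg
    return out
```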
@shibing624 After DPO training, inference speed dropped quite a bit. What could be the cause?
I haven't noticed any particularly obvious difference.