Question about multi-turn dialogue data for DPO #293

Open
chloefresh opened this issue Dec 26, 2023 · 3 comments
Labels
question Further information is requested

Comments

@chloefresh

I formatted multi-turn dialogue data as shown below and ran LoRA training with the DPO code. After merging the adapter, inference became slower and the model outputs repeated content.
In the code I only changed `"prompt": ["Question: " + question + "\n\nAnswer: " for question in examples["question"]]` to `"prompt": examples["question"]`. Do I also need to append an end-of-turn token after each completed turn, the same way multi-turn SFT does?

{"question": "\n\nHuman:你好\n\nAssistant:你好\n\nHuman:你好\n\nAssistant:", "response_chosen": "您好", "response_rejected": "您好,有什么可以帮您的吗"}

The parameters used were:
CUDA_VISIBLE_DEVICES=4,5,6 python dpo_training.py \
--model_type baichuan \
--model_name_or_path <path to the SFT base model> \
--train_file_dir ./reward \
--validation_file_dir ./reward \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--do_train \
--do_eval \
--use_peft True \
--max_train_samples -1 \
--max_eval_samples -1 \
--max_steps 100 \
--eval_steps 20 \
--save_steps 50 \
--max_source_length 1024 \
--max_target_length 256 \
--output_dir outputs-dpo-v1 \
--target_modules all \
--lora_rank 8 \
--lora_alpha 16 \
--lora_dropout 0.05 \
--torch_dtype float16 \
--fp16 True \
--device_map auto \
--report_to tensorboard \
--remove_unused_columns False \
--gradient_checkpointing True \
--cache_dir ./cache \
--gradient_accumulation_steps 4

@chloefresh chloefresh added the question Further information is requested label Dec 26, 2023
@shibing624
Owner

You can add the end-of-sequence token manually.
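
A hedged sketch of one way to do this for the response side as well, so the model learns to emit the end token and stop at inference time. The model path is a placeholder and `append_eos` is a hypothetical helper, not code from this repo.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/sft-base-model", trust_remote_code=True)

def append_eos(example):
    # field names follow the data format shown above; append the tokenizer's eos token
    # so the chosen/rejected targets end the same way the multi-turn SFT targets did
    example["response_chosen"] = example["response_chosen"] + tokenizer.eos_token
    example["response_rejected"] = example["response_rejected"] + tokenizer.eos_token
    return example

# usage with a datasets.Dataset:
# dataset = dataset.map(append_eos)
```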

@chloefresh
Author

@shibing624 Inference has become noticeably slower after DPO training. What could be the cause?

@shibing624
Owner

I haven't noticed a particularly obvious difference.
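
For what it's worth, slower inference together with repeated output is often a sign that generation never produces the end token and always runs to the maximum length. A minimal sketch of constraining generation explicitly; the merged-model path and the parameter values are assumptions, not settings from this repo:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "outputs-dpo-v1-merged"  # assumption: path of the merged DPO model
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("\n\nHuman:你好\n\nAssistant:", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    eos_token_id=tokenizer.eos_token_id,  # stop as soon as the end token is produced
    repetition_penalty=1.1,               # mildly discourage repeated spans
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```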
