I tried formatting multi-turn dialogue data as shown below and ran it through the DPO code with LoRA. After merging, inference became noticeably slower and the model produced repeated output. In the code I only changed `"prompt": ["Question: " + question + "\n\nAnswer: " for question in examples["question"]]` to `"prompt": examples["question"]`. Do I also need to add an end-of-sequence token after each dialogue turn, the same as in multi-turn SFT?
{"question": "\n\nHuman:你好\n\nAssistant:你好\n\nHuman:你好\n\nAssistant:", "response_chosen": "您好", "response_rejected": "您好,有什么可以帮您的吗"}
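For reference, the one-line change described above would look roughly like this in the dataset mapping function. This is a hedged sketch: the function name `return_prompt_and_responses` and the `chosen`/`rejected` output keys follow common DPO training scripts and are assumptions, not necessarily this repository's exact code.

```python
def return_prompt_and_responses(examples):
    """Map raw JSONL fields to the prompt/chosen/rejected format DPO expects."""
    return {
        # Original single-turn template:
        # "prompt": ["Question: " + q + "\n\nAnswer: " for q in examples["question"]],
        # Changed: pass the multi-turn dialogue string through unchanged.
        "prompt": examples["question"],
        "chosen": examples["response_chosen"],
        "rejected": examples["response_rejected"],
    }
```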
The parameters used are:

```
CUDA_VISIBLE_DEVICES=4,5,6 python dpo_training.py \
    --model_type baichuan \
    --model_name_or_path <base model after SFT> \
    --train_file_dir ./reward \
    --validation_file_dir ./reward \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --do_train \
    --do_eval \
    --use_peft True \
    --max_train_samples -1 \
    --max_eval_samples -1 \
    --max_steps 100 \
    --eval_steps 20 \
    --save_steps 50 \
    --max_source_length 1024 \
    --max_target_length 256 \
    --output_dir outputs-dpo-v1 \
    --target_modules all \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --torch_dtype float16 \
    --fp16 True \
    --device_map auto \
    --report_to tensorboard \
    --remove_unused_columns False \
    --gradient_checkpointing True \
    --cache_dir ./cache \
    --gradient_accumulation_steps 4
```
You can add the end-of-sequence token manually.
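Adding the token manually could be done as a preprocessing step, for example by marking the end of each completed assistant reply in the `question` field. This is a minimal sketch assuming the `\n\nHuman:`/`\n\nAssistant:` template from the data above and a `</s>` EOS token (substitute your tokenizer's actual `eos_token`):

```python
def add_eos_per_turn(question: str, eos_token: str = "</s>") -> str:
    """Append an EOS token after every completed assistant reply.

    A reply is "completed" when it is followed by the next Human marker;
    the trailing open "Assistant:" prompt is left untouched.
    """
    head, *rest = question.split("\n\nHuman:")
    out = head
    for seg in rest:
        # Any non-empty text accumulated so far ends with a finished
        # assistant reply, so terminate it with EOS before the next turn.
        if out:
            out += eos_token
        out += "\n\nHuman:" + seg
    return out
```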
@shibing624 After DPO training, inference speed dropped quite a bit. What could be the cause?
I haven't noticed any particularly obvious difference.