Tasks
An officially supported task in the examples folder
My own task or dataset (give details below)
Reproduction
I noticed that the DPO trainer uses the processing_class to tokenize inputs for both model and ref_model. Is there a way to allow a ref_model from a different base class that does not share the same tokenizer config as model? For example, using a Llama-3.1-8b model as the reference while aligning a Llama-3.2-3b model; training with this configuration currently leads to a constant loss=1.0.
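For reference, a minimal sketch of the configuration that hits this, assuming the current TRL DPOTrainer API. The exact checkpoint names and the preference dataset are placeholders chosen to match the sizes mentioned above:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Policy model to align and a reference model of a different size/family.
# Checkpoint names are assumptions matching the example above.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
ref_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Only one processing_class can be passed, so the policy model's tokenizer
# is also used to prepare inputs for ref_model.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# Any (prompt, chosen, rejected) preference dataset; this one is a placeholder.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=DPOConfig(output_dir="dpo-llama-3.2-3b"),
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()  # with this mismatched ref_model, the loss reportedly stays at 1.0
```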
Expected behavior
The trainer should accept two processing classes so that ref_model and model can come from different classes with different tokenizer configs.
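One possible shape for this, purely illustrative (ref_processing_class is not an existing DPOTrainer argument; the name is invented here to show the intent):

```python
# Hypothetical sketch only: ref_processing_class is NOT part of the current TRL API.
policy_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
ref_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=DPOConfig(output_dir="dpo-llama-3.2-3b"),
    train_dataset=train_dataset,
    processing_class=policy_tokenizer,   # tokenizes inputs for `model`
    ref_processing_class=ref_tokenizer,  # proposed: separate tokenizer for `ref_model`
)
```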