I am currently training a model using DPO, and I'm adapting the dataset dynamically during training. My current approach looks like this:
trainer = DPOTrainer(
    model,
    None,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=args.beta,
    max_prompt_length=1024,
    max_length=1536,
)

for i in range(repetitions):
    train_result = trainer.train()
    # Adapt the dataset based on the training result
    dataset = get_adapted_dataset(train_result)
    with PartialState().local_main_process_first():
        # Tokenize the updated dataset
        print("Updating the training dataset")
        trainer.train_dataset = dataset.map(trainer.tokenize_row, num_proc=None)
Is this the correct way to adapt the dataset during training, or is there a more appropriate approach for this scenario?
Using an iterable dataset might be better suited. If the way you update the dataset depends on the training results, you'll probably also need to set up a callback.
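A minimal sketch of the callback idea: instead of restarting trainer.train() in a loop, a callback swaps in an adapted dataset at the end of each epoch. In real code the class would subclass transformers.TrainerCallback; the base class is omitted here so the sketch stays self-contained, and get_adapted_dataset is a hypothetical user-supplied function like the one in the question.

```python
class AdaptDatasetCallback:
    """Swap in an adapted train_dataset at the end of every epoch.

    In practice, subclass transformers.TrainerCallback and pass an
    instance to the trainer via `callbacks=[...]`.
    """

    def __init__(self, trainer, get_adapted_dataset):
        self.trainer = trainer
        self.get_adapted_dataset = get_adapted_dataset

    def on_epoch_end(self, args, state, control, **kwargs):
        # Build the new dataset from whatever the run has logged so far.
        new_dataset = self.get_adapted_dataset(state.log_history)
        # Re-tokenize with the trainer's own row tokenizer, mirroring
        # the manual loop in the question.
        self.trainer.train_dataset = new_dataset.map(
            self.trainer.tokenize_row, num_proc=None
        )
        return control
```

With this in place, a single trainer.train() call over all epochs replaces the outer for loop, and distributed synchronization (the PartialState context in the question) can be handled inside the callback if needed.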