
Low performance in evaluation with SFT distilled model #487

Open

Jiehon opened this issue Mar 7, 2025 · 1 comment

Comments


Jiehon commented Mar 7, 2025

I'm new to LLMs. I tried to reproduce the SFT distilled model with the open-r1 tool, but evaluation shows low performance:

[Image: evaluation results]

training config:

```yaml
model_name_or_path: Qwen/Qwen2.5-1.5B-Instruct
model_revision: main
torch_dtype: bfloat16
attn_implementation: flash_attention_2

# Data training arguments
dataset_name: open-r1/OpenR1-Math-220k
dataset_num_proc: 48

# SFT trainer config
bf16: true
do_eval: false
eval_strategy: 'no'
gradient_accumulation_steps: 1
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
hub_model_id: Qwen2.5-1.5B-Open-R1-Distill
hub_strategy: every_save
learning_rate: 5.0e-05
log_level: info
logging_steps: 5
logging_strategy: steps
lr_scheduler_type: cosine_with_min_lr
lr_scheduler_kwargs:
  min_lr_rate: 0.1
packing: true
max_length: 16384
max_steps: -1
num_train_epochs: 2
output_dir: data/Qwen2.5-1.5B-Open-R1-Distill
overwrite_output_dir: true
per_device_eval_batch_size: 16
per_device_train_batch_size: 16
push_to_hub: false
save_strategy: "steps"
save_steps: 100
save_total_limit: 1
seed: 42
use_liger: true
warmup_ratio: 0.05
```
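
For context, here is a hedged sketch of how a config like this is typically launched with open-r1; the accelerate config and recipe paths below are assumptions and may differ across repo versions, so check the repo README for the exact command:

```bash
# Sketch of an open-r1 SFT launch (paths and recipe names are assumptions;
# verify against the repo's README and recipes/ directory).
accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    src/open_r1/sft.py \
    --config recipes/Qwen2.5-1.5B-Instruct/sft/config_demo.yaml
```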

I tried to reproduce it again with the open-r1/OpenR1-Math-220k data, but got the same result. So I used this distilled model instead: https://huggingface.co/lewtun/Qwen2.5-1.5B-Open-R1-Distill. But I still get the same result.

[Image: evaluation results]
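
For context, a hedged sketch of an open-r1-style lighteval run with the vLLM backend; the task spec, model args, and output path are assumptions based on the README pattern, not necessarily what was used here:

```bash
# Sketch of an open-r1-style lighteval run (model args and task are assumptions;
# adjust to your own setup and installed lighteval version).
MODEL=lewtun/Qwen2.5-1.5B-Open-R1-Distill
MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,max_model_length=32768,gpu_memory_utilization=0.8"
OUTPUT_DIR=data/evals/$MODEL

lighteval vllm "$MODEL_ARGS" "custom|aime24|0|0" \
    --custom-tasks src/open_r1/evaluate.py \
    --use-chat-template \
    --output-dir "$OUTPUT_DIR"
```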


lucy9527 commented Mar 7, 2025

> I'm new to LLMs. I tried to reproduce the SFT distilled model with the open-r1 tool, but evaluation shows low performance: […]

How can I print the "zj" and "gold" outputs during eval?
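
(One possible approach, assuming the installed lighteval version supports the `--save-details` flag: re-run the eval with it and inspect the per-sample files written under the output directory, which contain the model predictions and gold answers.)

```bash
# Assumption: lighteval's --save-details flag is available; it writes
# per-sample records (prompt, prediction, gold answer) as parquet files
# under the output directory for offline inspection.
MODEL=lewtun/Qwen2.5-1.5B-Open-R1-Distill
MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,max_model_length=32768,gpu_memory_utilization=0.8"

lighteval vllm "$MODEL_ARGS" "custom|aime24|0|0" \
    --custom-tasks src/open_r1/evaluate.py \
    --use-chat-template \
    --save-details \
    --output-dir data/evals/$MODEL
```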
