
Low performance in evaluation with SFT distilled model #487

Open

Jiehon opened this issue Mar 7, 2025 · 1 comment

Comments


Jiehon commented Mar 7, 2025

I'm new to LLMs. I tried to reproduce the SFT distilled model with the open-r1 tool, but evaluation shows low performance:

[Image: evaluation results]

training config:

```yaml
model_name_or_path: Qwen/Qwen2.5-1.5B-Instruct
model_revision: main
torch_dtype: bfloat16
attn_implementation: flash_attention_2

# Data training arguments
dataset_name: open-r1/OpenR1-Math-220k
dataset_num_proc: 48

# SFT trainer config
bf16: true
do_eval: false
eval_strategy: 'no'
gradient_accumulation_steps: 1
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
hub_model_id: Qwen2.5-1.5B-Open-R1-Distill
hub_strategy: every_save
learning_rate: 5.0e-05
log_level: info
logging_steps: 5
logging_strategy: steps
lr_scheduler_type: cosine_with_min_lr
lr_scheduler_kwargs:
  min_lr_rate: 0.1
packing: true
max_length: 16384
max_steps: -1
num_train_epochs: 2
output_dir: data/Qwen2.5-1.5B-Open-R1-Distill
overwrite_output_dir: true
per_device_eval_batch_size: 16
per_device_train_batch_size: 16
push_to_hub: false
save_strategy: "steps"
save_steps: 100
save_total_limit: 1
seed: 42
use_liger: true
warmup_ratio: 0.05
```
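
For context, here is a hedged sketch of how a config like this is typically launched with open-r1; the accelerate config and recipe paths below are assumptions and may differ across repo versions, so check the repo README for the exact command:

```bash
# Sketch of an open-r1 SFT launch (paths and recipe names are assumptions;
# verify against the repo's README and recipes/ directory).
accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    src/open_r1/sft.py \
    --config recipes/Qwen2.5-1.5B-Instruct/sft/config_demo.yaml
```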

I tried to reproduce it again with the open-r1/OpenR1-Math-220k data, but got the same result. So I used this distilled model instead: https://huggingface.co/lewtun/Qwen2.5-1.5B-Open-R1-Distill. But I still get the same result.

[Image: evaluation results]
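
For context, a hedged sketch of an open-r1-style lighteval run with the vLLM backend; the task spec, model args, and output path are assumptions based on the README pattern, not necessarily what was used here:

```bash
# Sketch of an open-r1-style lighteval run (model args and task are assumptions;
# adjust to your own setup and installed lighteval version).
MODEL=lewtun/Qwen2.5-1.5B-Open-R1-Distill
MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,max_model_length=32768,gpu_memory_utilization=0.8"
OUTPUT_DIR=data/evals/$MODEL

lighteval vllm "$MODEL_ARGS" "custom|aime24|0|0" \
    --custom-tasks src/open_r1/evaluate.py \
    --use-chat-template \
    --output-dir "$OUTPUT_DIR"
```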


lucy9527 commented Mar 7, 2025

> I'm new to LLMs. I tried to reproduce the SFT distilled model with the open-r1 tool, but evaluation shows low performance: […]

How can I print the "zj" and "gold" outputs during eval?
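
(One possible approach, assuming the installed lighteval version supports the `--save-details` flag: re-run the eval with it and inspect the per-sample files written under the output directory, which contain the model predictions and gold answers.)

```bash
# Assumption: lighteval's --save-details flag is available; it writes
# per-sample records (prompt, prediction, gold answer) as parquet files
# under the output directory for offline inspection.
MODEL=lewtun/Qwen2.5-1.5B-Open-R1-Distill
MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,max_model_length=32768,gpu_memory_utilization=0.8"

lighteval vllm "$MODEL_ARGS" "custom|aime24|0|0" \
    --custom-tasks src/open_r1/evaluate.py \
    --use-chat-template \
    --save-details \
    --output-dir data/evals/$MODEL
```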
