fix minillm trainer #4743

t1101675 · 2025-12-23T13:06:32Z

What does this PR do?

Fix the convergence issues of MiniLLMTrainer. It can now smoothly optimize the reverse KL divergence between the teacher and the student.
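For readers unfamiliar with the objective, here is a minimal sketch of the reverse KL divergence, KL(student || teacher), for a single next-token distribution. It is an illustration only and not the code in MiniLLMTrainer; the function names `log_softmax`, `reverse_kl`, and `forward_kl` are hypothetical, and plain Python lists stand in for logit tensors.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher): the expectation is taken under the student."""
    s = log_softmax(student_logits)
    t = log_softmax(teacher_logits)
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(s, t))


def forward_kl(student_logits, teacher_logits):
    """KL(teacher || student): the expectation is taken under the teacher."""
    s = log_softmax(student_logits)
    t = log_softmax(teacher_logits)
    return sum(math.exp(lt) * (lt - ls) for ls, lt in zip(s, t))


# Hypothetical logits: a uniform student against a peaked teacher.
student = [0.0, 0.0]
teacher = [2.0, 0.0]
rkl = reverse_kl(student, teacher)
fkl = forward_kl(student, teacher)
```

The two directions generally give different values for the same pair of distributions, which is why the direction of the divergence matters when judging whether training converges.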

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

t1101675 · 2025-12-23T13:12:02Z

@qgallouedec I have reduced #4731 to only fixing the convergence of the MiniLLM trainer.

qgallouedec · 2025-12-23T16:24:23Z

trl/trainer/grpo_trainer.py

            if self.args.gradient_accumulation_steps % generate_every != 0 or (
                self.use_vllm and self.vllm_importance_sampling_correction
-            ):
+            ) or self.args.always_track_old_logps:


I'm not sure I understand why you need this. When self.args.gradient_accumulation_steps % generate_every == 0, old_per_token_logps == per_token_logps, so why not just use per_token_logps.detach()?

Nice comments! I have removed always_track_old_logps.
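The reviewer's point can be checked numerically. The sketch below is an illustration under stated assumptions, not code from grpo_trainer.py: it uses plain Python and finite differences instead of PyTorch autograd, a constant `old_logp` plays the role of `per_token_logps.detach()`, and the helper names `log_prob`, `ratio`, and `finite_diff` are hypothetical. When the old log-probability equals the current one, the importance ratio is 1 while its gradient still matches the gradient of the log-probability.

```python
import math


def log_prob(logits, action):
    """Log-probability of `action` under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[action] - log_z


def ratio(logits, action, old_logp):
    """Importance ratio exp(logp - old_logp), with old_logp held constant."""
    return math.exp(log_prob(logits, action) - old_logp)


def finite_diff(f, logits, i, eps=1e-6):
    """Central finite-difference derivative of f with respect to logits[i]."""
    up = list(logits)
    up[i] += eps
    down = list(logits)
    down[i] -= eps
    return (f(up) - f(down)) / (2 * eps)


# Hypothetical policy logits and sampled action.
logits = [0.2, -1.0, 0.5]
action = 2

# Stand-in for per_token_logps.detach(): same value, treated as a constant.
old_logp = log_prob(logits, action)

ratio_value = ratio(logits, action, old_logp)
grad_ratio = [
    finite_diff(lambda l: ratio(l, action, old_logp), logits, i)
    for i in range(len(logits))
]
grad_logp = [
    finite_diff(lambda l: log_prob(l, action), logits, i)
    for i in range(len(logits))
]
```

Under these assumptions, `ratio_value` is 1 and `grad_ratio` agrees with `grad_logp` up to finite-difference error, so separately tracking the old log-probabilities adds nothing in this case.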

fix minillm trainer

cc4ff37

t1101675 mentioned this pull request Dec 23, 2025

Fix MiniLLM Training #4731

Open

qgallouedec reviewed Dec 23, 2025

t1101675 added 3 commits December 24, 2025 02:40

remove always track logps

d25b496

fix student logps in get_rev_kl

bcbc68f

remove unnecessary args

76def60
