make dpo compatible with qwen3vl #4773

flutist · 2026-01-04T06:04:41Z

What does this PR do?

Fixes # (issue)
Make DPO compatible with Qwen3-VL.

Before submitting

[yes] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[yes] Did you read the contributor guideline, Pull Request section?
[no] Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
[yes] Did you make sure to update the documentation with your changes?
[no] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

qgallouedec · 2026-01-05T16:41:54Z

Thanks, can you add a test case using trl-internal-testing/tiny-Qwen3VLForConditionalGeneration in TestDPOVisionTrainer to ensure it works properly?

flutist · 2026-01-06T06:27:53Z

> Thanks, can you add a test case using trl-internal-testing/tiny-Qwen3VLForConditionalGeneration in TestDPOVisionTrainer to ensure it works properly?

Of course, let me implement one.

flutist · 2026-01-08T06:03:23Z

> Thanks, can you add a test case using trl-internal-testing/tiny-Qwen3VLForConditionalGeneration in TestDPOVisionTrainer to ensure it works properly?

I have finished the work, and all the tests pass.

flutist · 2026-01-09T02:37:49Z

@qgallouedec could you review this pull request?

flutist · 2026-01-09T10:38:11Z

One more thing: these three default values are all quite small. During actual training, it's easy to overlook setting these values, which can lead to errors. Do they need to be increased? @qgallouedec

qgallouedec · 2026-01-09T15:37:58Z

> these three default values are all quite small. During actual training, it's easy to overlook setting these values, which can lead to errors. Do they need to be increased? @qgallouedec

max_prompt_length and max_completion_length should be removed as part of #3906
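One way a small length limit can cause errors with a vision-language model is by cutting image placeholder tokens out of the prompt, so the token count no longer matches the image features. The sketch below is a toy illustration of that idea only, not TRL's actual code: the placeholder id, the left-truncating policy, and the value 128 are all assumptions chosen for the example, and the thread does not say which failure the author actually hit.

```python
# Hypothetical placeholder id; real processors define their own image token.
IMAGE_TOKEN_ID = -1


def truncate_prompt(input_ids, max_prompt_length):
    """Keep only the last max_prompt_length tokens (an assumed truncation policy)."""
    return input_ids[-max_prompt_length:]


def count_image_tokens(input_ids):
    """Count the image placeholder tokens present in the sequence."""
    return sum(1 for tok in input_ids if tok == IMAGE_TOKEN_ID)


# A toy prompt: 300 image placeholders followed by 20 text tokens.
prompt_ids = [IMAGE_TOKEN_ID] * 300 + list(range(20))
expected_image_tokens = count_image_tokens(prompt_ids)

# 128 is an illustrative limit, not necessarily a TRL default.
truncated = truncate_prompt(prompt_ids, max_prompt_length=128)
remaining_image_tokens = count_image_tokens(truncated)

# The image encoder would still produce features for all 300 placeholders,
# so a smaller remaining count signals a mismatch.
mismatch = remaining_image_tokens != expected_image_tokens
print(expected_image_tokens, remaining_image_tokens, mismatch)
```

In this toy case the truncated prompt keeps the 20 text tokens and only 108 of the 300 placeholders, which is the kind of mismatch a larger limit (or no prompt truncation) would avoid.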

trl/trainer/dpo_trainer.py

flutist · 2026-01-10T01:39:53Z

> > these three default values are all quite small. During actual training, it's easy to overlook setting these values, which can lead to errors. Do they need to be increased? @qgallouedec
>
> max_prompt_length and max_completion_length should be removed as part of #3906

Got it, thanks.

flutist · 2026-01-12T08:48:17Z

@qgallouedec
Is there any code that needs further modification?
I would appreciate your feedback.
Thanks very much.

…ble_qwen3vl

make dpo compatible with qwen3vl

bd47c10

flutist added 4 commits January 6, 2026 19:39

make dpo compatible with qwen3vl add test

eed1fdf

make dpo compatible with qwen3vl add test

c459064

make dpo compatible with qwen3vl

e3ab2fd

make dpo compatible with qwen3vl

3263ab7

flutist added 2 commits January 8, 2026 18:25

Merge branch 'main' into compatible_qwen3vl

9b5b0cc

Merge branch 'main' into compatible_qwen3vl

91eb058

Merge branch 'main' into compatible_qwen3vl

c9d91c2

qgallouedec reviewed Jan 9, 2026

trl/trainer/dpo_trainer.py (outdated)

Merge branch 'main' into compatible_qwen3vl

6c882bc

flutist and others added 10 commits January 12, 2026 20:51

Merge branch 'main' into compatible_qwen3vl

b6207fb

Merge branch 'main' into compatible_qwen3vl

2349eeb

Merge branch 'main' into compatible_qwen3vl

0e02eca

Merge branch 'main' into compatible_qwen3vl

e752a0d

make dpo compatible with qwen3vl

fcf3554

Merge remote-tracking branch 'origin/compatible_qwen3vl' into compatible_qwen3vl

d9d5126

Merge branch 'main' into compatible_qwen3vl

057b59e

make dpo compatible with qwen3vl

df1966e

Merge remote-tracking branch 'origin/compatible_qwen3vl' into compatible_qwen3vl

6238225

Merge branch 'main' into compatible_qwen3vl

c8999b0
