Skip to content

Conversation

@flutist
Copy link

@flutist flutist commented Jan 4, 2026

What does this PR do?

Fixes # (issue)
make dpo compatible with qwen3vl

Before submitting

  • [yes ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [ yes] Did you read the contributor guideline,
    Pull Request section?
  • [no ] Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • [yes ] Did you make sure to update the documentation with your changes?
  • [ no] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@qgallouedec
Copy link
Member

Thanks, can you add a test case using trl-internal-testing/tiny-Qwen3VLForConditionalGeneration in TestDPOVisionTrainer to ensure it works properly?

@flutist
Copy link
Author

flutist commented Jan 6, 2026

Thanks, can you add a test case using trl-internal-testing/tiny-Qwen3VLForConditionalGeneration in TestDPOVisionTrainer to ensure it works properly?

Of course, let me implement one.

@flutist
Copy link
Author

flutist commented Jan 8, 2026

Thanks, can you add a test case using trl-internal-testing/tiny-Qwen3VLForConditionalGeneration in TestDPOVisionTrainer to ensure it works properly?

I have finished the work, and all the test is passed.
image

@flutist
Copy link
Author

flutist commented Jan 9, 2026

@qgallouedec could you help to do a code review for this request?

@flutist
Copy link
Author

flutist commented Jan 9, 2026

image One more thing: these three default values ​​are all quite small. During actual training, it's easy to overlook setting these values, which can lead to errors. Do they need to be increased? @qgallouedec

@qgallouedec
Copy link
Member

these three default values ​​are all quite small. During actual training, it's easy to overlook setting these values, which can lead to errors. Do they need to be increased? @qgallouedec

max_prompt_length and max_completion_length should be removed as part of #3906

@flutist
Copy link
Author

flutist commented Jan 10, 2026

these three default values ​​are all quite small. During actual training, it's easy to overlook setting these values, which can lead to errors. Do they need to be increased? @qgallouedec

max_prompt_length and max_completion_length should be removed as part of #3906

got that, thanks

@flutist
Copy link
Author

flutist commented Jan 12, 2026

@qgallouedec
Are there any codes that need further modification?
I need your feedback.
Thanks much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants