@behroozazarkhalili
Collaborator

Summary

Adds the XPO (Exploratory Preference Optimization) paper entry to the paper index under the Online Direct Preference Optimization section.

Relates to #4407

Note on hyperparameters: The XPO paper (arXiv 2405.21046) defines α > 0 (optimism coefficient) and β > 0 (KL regularization coefficient) in Algorithm 1 but does not give numerical values for them. The experimental details are not publicly accessible, and the authors did not release a standalone codebase. The configuration in the entry therefore uses the TRL defaults (alpha=1e-5, beta=0.1), and the entry states this explicitly.
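To show where the two coefficients enter, here is a minimal per-example sketch in plain Python. It assumes the objective as we read Algorithm 1: a DPO-style logistic term whose log-ratio margin is scaled by β, plus an optimism term α · log π_θ(ỹ | x) on a response ỹ sampled from the reference policy. The paper sums these terms over the data collected so far; the function names here are hypothetical, and this is not TRL's `XPOTrainer` code.

```python
import math


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)) for both signs of z.
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def xpo_loss_sketch(
    logp_chosen: float,        # log pi_theta(y+ | x)
    ref_logp_chosen: float,    # log pi_ref(y+ | x)
    logp_rejected: float,      # log pi_theta(y- | x)
    ref_logp_rejected: float,  # log pi_ref(y- | x)
    logp_ref_sample: float,    # log pi_theta(y~ | x), y~ assumed sampled from pi_ref
    alpha: float = 1e-5,       # optimism coefficient (TRL default used in the entry)
    beta: float = 0.1,         # KL regularization coefficient (TRL default)
) -> float:
    # DPO-style term: beta times the difference of policy/reference log-ratios.
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    dpo_term = -log_sigmoid(margin)
    # Optimism term: minimizing alpha * log pi_theta(y~ | x) lowers the likelihood
    # of reference-policy samples, which is what drives exploration.
    optimism_term = alpha * logp_ref_sample
    return dpo_term + optimism_term
```

With α = 0 the sketch reduces to a plain DPO loss, which is one way to see why a small default such as 1e-5 only nudges the Online DPO objective.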

- Add the Exploratory Preference Optimization entry under the Online DPO section
- Note that the paper defines α > 0 and β > 0 but does not specify numerical values
- Config uses TRL defaults (alpha=1e-5, beta=0.1)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sergiopaniego mentioned this pull request Feb 11, 2026
Member

@sergiopaniego left a comment

thanks!

@sergiopaniego merged commit e46005c into main Feb 11, 2026
3 checks passed
@sergiopaniego deleted the docs/add-paper-2405.21046-xpo branch February 11, 2026 10:22

Labels

None yet

Projects

None yet

3 participants