BUG: Require sample weights to sum to less than 1 when replace = True #61582

Draft: microslaw wants to merge 3 commits into main from add-df-sample-weight-constraints

@microslaw microslaw commented Jun 6, 2025

@microslaw microslaw force-pushed the add-df-sample-weight-constraints branch 2 times, most recently from 1281002 to 1d1f742 Compare June 6, 2025 11:15
@mroeschke mroeschke requested a review from rhshadrach June 6, 2025 16:38
Member

@rhshadrach rhshadrach left a comment

Looking good, but the condition needs to be adjusted. This will also need a note in the whatsnew for 3.0 under the Other section.

Comment on lines 153 to 159
if weight_sum > 1 and replace:
    raise ValueError(
        "Invalid weights: If `replace`=True weights must sum to less than 1"
    )
Member

This is not the correct condition. See the formula in the OP of the linked issue.
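
For readers without the linked issue open, the condition that the suggested docstring change in this thread spells out is `(n * max(weights) / sum(weights)) > 1` when `replace` is False. A minimal sketch of such a check might look like the following. The helper name `check_sample_weights` and the error message are hypothetical, and this is not the code in the PR or in pandas.

```python
import numpy as np


def check_sample_weights(weights, n, replace):
    """Hypothetical helper (not part of pandas): reject weights that
    cannot be honoured when sampling n items without replacement."""
    weights = np.asarray(weights, dtype=float)
    if replace:
        # With replacement, any item may be drawn repeatedly, so the
        # condition below does not apply.
        return
    # Condition quoted from the suggested docstring change in this thread.
    if n * weights.max() / weights.sum() > 1:
        raise ValueError(
            "Invalid weights: n * max(weights) / sum(weights) must not "
            "exceed 1 when replace=False"
        )


# Passes: 2 * 0.5 / 1.0 == 1.0, which is not greater than 1.
check_sample_weights([0.5, 0.25, 0.25], n=2, replace=False)
```

Note that the check fires for `replace=False`, which is the opposite branch from the `weight_sum > 1 and replace` condition quoted above.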

Comment on lines 5818 to 5819
When replace = True will not allow weights that add up to less
than 1, to avoid biased results.
Contributor

Suggested change
- When replace = True will not allow weights that add up to less
- than 1, to avoid biased results.
+ When replace = False will not allow ``(n * max(weights) / sum(weights)) > 1``,
+ to avoid biased results.
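
One way to read "biased results" here: if `n * max(weights) / sum(weights)` exceeds 1, the heaviest item would need an inclusion probability above 1 for inclusion to stay proportional to its weight, which is impossible without replacement. The sketch below computes exact inclusion probabilities for `n = 2`, assuming items are drawn one at a time and the remaining weights are renormalised after the first draw. That sampling model and the function name `inclusion_probs_n2` are assumptions for illustration; the thread does not say how pandas draws the sample internally.

```python
def inclusion_probs_n2(p):
    """Exact chance that each item lands in a size-2 sample drawn
    sequentially without replacement (assumed model, see lead-in).
    `p` must be normalised weights that sum to 1."""
    probs = []
    for i, p_i in enumerate(p):
        # Item i is picked first with probability p_i ...
        first = p_i
        # ... or some other item j is picked first and i is picked second
        # from the remaining weights, renormalised by (1 - p_j).
        second = sum(
            p_j * p_i / (1 - p_j) for j, p_j in enumerate(p) if j != i
        )
        probs.append(first + second)
    return probs


weights = [0.8, 0.1, 0.1]  # n * max(weights) / sum(weights) = 1.6 > 1
probs = inclusion_probs_n2(weights)
print(probs)
```

Under these assumptions the first item is included with probability about 0.98 and each of the others with about 0.51, a ratio of roughly 1.9 rather than the 8 that the weights ask for.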

@microslaw microslaw force-pushed the add-df-sample-weight-constraints branch 2 times, most recently from 46f7059 to 2bc3618 Compare June 6, 2025 21:17
@microslaw microslaw force-pushed the add-df-sample-weight-constraints branch from 2bc3618 to e5f9a07 Compare June 6, 2025 21:53
@microslaw microslaw marked this pull request as draft June 6, 2025 21:58

Successfully merging this pull request may close these issues.

BUG: DataFrame.sample weights not required to sum to less than 1
3 participants