[fix](be) Return NaN for avg_weighted when sum of weights is zero#64333
[fix](be) Return NaN for avg_weighted when sum of weights is zero#64333jacktengg wants to merge 1 commit into
Conversation
Issue Number: close #xxx Problem Summary: When the sum of weights passed to avg_weighted is zero but the weighted data sum is non-zero, the function computed data_sum / weight_sum and returned +/-Infinity (e.g. avg_weighted(f1, f2) over rows (1, 1), (2, -1) returned -Infinity). The expected mathematical result of a division by zero weight is undefined, so it should return NaN. This fixes AggregateFunctionAvgWeightedData::get() to return quiet_NaN() when weight_sum is exactly zero, instead of producing Infinity.
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
|
run buildall |
There was a problem hiding this comment.
Review result: no blocking code issues found.
Critical checkpoint conclusions:
- Goal and proof: the PR changes avg_weighted finalization so a non-empty aggregate state with accumulated weight_sum == 0 returns quiet NaN instead of +/-Infinity. The added regression case covers weights 1 and -1 and expects NaN.
- Scope: the implementation is small and focused: one include, one final-result branch, and one regression case.
- Concurrency and lifecycle: not applicable. This only reads aggregate state during result insertion; no shared state, locks, static initialization dependency, or lifecycle ownership is introduced.
- Config and compatibility: no config, storage format, function signature, serialized aggregate state, or FE/BE protocol change. Existing serialized state remains data_sum + weight_sum.
- Parallel paths: I checked the registration/combinator path. avg_weighted and foreach wrappers use this same nested implementation, so I did not find another result path that still returns infinity for the same zero-weight state.
- Conditions and error handling: the new weight_sum == 0 condition has a concrete semantic trigger and does not suppress errors or ignore Status values.
- Test coverage and result correctness: the added regression follows the existing suite style, drops the table before use, uses a hardcoded table name, and the expected NaN spelling matches existing floating-point output conventions. git diff --check is clean.
- Observability and performance: no new observability is needed for this finalization-only behavior. The extra branch is outside accumulation/merge loops and should be negligible.
- Transactions, persistence, data writes, and FE-BE variables: not applicable.
- User focus: no additional user-provided focus points were present.
I did not run a local Doris regression cluster. Current CI has pending BE/performance checks; the macOS BE UT failure I inspected stops before build with "ERROR: The JAVA version is 25, it must be JDK-17", which appears unrelated to this change.
Minor process note: the PR body still says Release note: None even though this is a user-visible bug fix. I am not blocking on that, but it may be worth aligning before merge.
TPC-H: Total hot run time: 29181 ms |
TPC-DS: Total hot run time: 169314 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
Issue Number: close #xxx
Problem Summary: When the sum of weights passed to avg_weighted is zero but the weighted data sum is non-zero, the function computed data_sum / weight_sum and returned +/-Infinity (e.g. avg_weighted(f1, f2) over rows (1, 1), (2, -1) returned -Infinity). The expected mathematical result of a division by zero weight is undefined, so it should return NaN.
This fixes AggregateFunctionAvgWeightedData::get() to return quiet_NaN() when weight_sum is exactly zero, instead of producing Infinity.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)