Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the return type of Enumerable#sum for union elements #15308

Closed
wants to merge 0 commits into from

Conversation

rvprasad
Copy link

@rvprasad rvprasad commented Dec 22, 2024

Fixes #15269

Using the first type of a union type as the type of the result of Enumerable#sum() call can cause runtime failures, e.g. [1, 10000000000_u64].sum will result in an OverflowError. A safer alternative is to flag/disallows the use of union types with Enumerable#sum() and suggest the use of Enumerable#sum(initial) with an initial value of the expected type of the sum call.

@Blacksmoke16 Blacksmoke16 linked an issue Dec 22, 2024 that may be closed by this pull request
@BlobCodes
Copy link
Contributor

While checking this PR, I noticed that the Enumerable#sum method generally seems to have issues.

The method has an "optimized path" for a sum of strings, but it wouldn't even compile without that optimization because it would otherwise try to call .zero on String.


The docs should be updated to mention that the method won't work with specific types.

Also, I don't think the best way of removing #sum(Union) is just to raise an error in a minor release without any warnings beforehand.
There's the .warning macro function instead, for example.

@rvprasad
Copy link
Author

David, thanks for the input!

I agree the doc should be updated. How about updating the document about .additive_identity as follows (being consistent with other bits of the document)?

This method calls .additive_identity on the yielded type to determine the type of the sum call. Hence, it can fail to compile if .additive_identity fails to determine a safe type, e.g., in case of union types. In such cases, use sum(initial) with an initial value of the expected type of the sum call.

As for warning vs raise, based on the simplicity of the fix (i.e., sum() to sum(initial)) and assuming there are very few uses of this feature in the wild, I doubt this change will cause a lot of toil/churn. That said, I am fine with gradually introducing this change by using warning instead of raise.

@straight-shoota
Copy link
Member

Let's discuss further improving the documentation of Enumerable#sum in a separate issue.

With regards to warning vs. error, I think we usually prefer the latter because warnings tend to be overlooked anyways.

The justification for such a breaking change would that the current behaviour is quite unexpected and can lead to weird subtle bugs. E.g. the behaviour may change depending on the addition or removal of types from the union, even when the actual values are identical:

([12345678_u32] of UInt8 | UInt32).sum  # => 12345678
([12345678_u32] of UInt16 | UInt32).sum # OverflowError: Arithmethc overflow

So I believe it could be acceptable to release this change in a minor release. It might even still go into 1.15, though we should take some time to check if there are any significant consequences in the ecosystem that we may not be aware of.

@straight-shoota straight-shoota added kind:bug A bug in the code. Does not apply to documentation, specs, etc. breaking-change topic:stdlib:collection labels Dec 23, 2024
Comment on lines 2296 to 2298
raise("Enumerable#sum() does support Union types. Instead, " +
"use Enumerable#sum(initial) with an initial value of " +
"the expected type of the sum call.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Reflect#first is used for Enumerable#product as well as #sum (this is mentioned in the issue). So the error message should reflect that. And we should have a spec for #product.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Done :)

it { [1, 3_u64].sum(0_i32).should eq(4_u32) }
it { [1, 3].sum(0_u64).should eq(4_u64) }
it { [1, 10000000000_u64].sum(0_u64).should eq(10000000001) }
it "raises if union types are summed" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specs that take a long time (e.g. run the compiler) should be marked as slow:

Suggested change
it "raises if union types are summed" do
it "raises if union types are summed", tags: %w[slow] do

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@oprypin oprypin changed the title Fixes #15269 Fix the return type of Enumerable#sum for union elements Dec 24, 2024
@rvprasad rvprasad closed this Dec 26, 2024
@rvprasad
Copy link
Author

I messed up my local repo while syncing it with recent upstream changes. When I tried to clean it up, I incidentally closed this PR :-/

I have addressed all of the comments in this PR thread and created a new PR; #15314. Please review it.

Apologies for any inconvenience.

@Blacksmoke16
Copy link
Member

For future reference, try to avoid force pushing as per https://github.com/crystal-lang/crystal/blob/master/CONTRIBUTING.md#making-good-pull-requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:collection
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Return type of Enumerable#sum and #product for union elements
4 participants