Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define missing and undef constructors according to their semantics #40

Open
knuesel opened this issue Oct 10, 2020 · 2 comments
Open

Define missing and undef constructors according to their semantics #40

knuesel opened this issue Oct 10, 2020 · 2 comments

Comments

@knuesel
Copy link

knuesel commented Oct 10, 2020

SentinelArrays.jl uses undef constructors to initialize the array to "missing" values. I propose to do the following instead:

  1. Use missing constructors to initialize arrays to "missing" (sentinel) values.
  2. Use undef constructors to skip initialization (or more precisely, do whatever the undef constructor does for the underlying array)

Background: undef constructors of Base arrays have been (ab)used to initialize arrays with missing values. This relies on undocumented behavior: the undef constructor does zero-initialization of the memory for union arrays, to avoid invalid element types. And due to implementation details, arrays of Union{Missing,...} usually end up with all elements initialized to missing.

Relying on undocumented behavior is not great, so an attempt was made to document it: JuliaLang/julia#31091. It turned out that this use of undef works almost always, but can fail with unions of Missing and other singleton types.

The issue was solved by addingmissing constructors for Base arrays: JuliaLang/julia#25054 .

Arguments:

  • Using undef to initialize to a particular value makes no sense from a semantics point of view. It doesn't feel like a good API when "define to xxx" is made by calling "leave undefined"!

    Please note the problem is not in the behavior of Base constructors: a "leave undefined" constructor can return anything, so why not all-missing values. But users shouldn't rely on it (at least not when there is a better way).

  • Using undef for this purpose in SentinelArrays will encourage people to do the same with Base arrays, where it can introduce subtle bugs (since it works almost always, but not always).

  • It leaves performance on the table. In Base, zero-initialization is necessary to guarantee valid union types. This constraint doesn't apply to SentinelArrays! And the whole point of SentinelArrays is to give better performance for particular use cases, so I think SentinelArrays should define undef constructor that do what it says on the tin: leave values uninitialized.

Relation to Base:

A downside of this proposal is that SentinelVector{Float64}(undef, 3) would not longer behave the same as Array{Union{Missing,Float64}}(undef, 3). But the latter is undefined behavior. Is agreeing on this undefined behavior more important than having an API that makes sense and offering the best performance?

Another way to look at it:

  1. SentinelArrays should promote patterns that also work with Base arrays, so it should not promote using undef to get missing.
  2. Base does the semantically correct thing: undef for uninitialized (where possible), and missing for initialized-to-missing. In particular, Base does not document the values returned by undef constructors. I think SentinelArrays should do the same.
@quinnj
Copy link
Member

quinnj commented Apr 14, 2021

I apologize for such a slow response here; I agree with everything written up here and I agree that we should properly define and encourage the missing constructors. @knuesel, do you have interest/ability/time to make a PR? Otherwise, I can get to it in the next little while.

@knuesel
Copy link
Author

knuesel commented Apr 15, 2021

No worries! I'm a bit short on time right now so feel free to go ahead...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants