Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert_to_unit a unyt_array may cause problem in previously created slices. #552

Open
haoenz opened this issue Dec 22, 2024 · 4 comments
Open

Comments

@haoenz
Copy link

haoenz commented Dec 22, 2024

  • unyt version: 3.0.3
  • Python version: 3.12.5
  • Operating System: Debian 6.1.119

Description

If I have a unyt_array and have created some slices from it, modifying the original array with convert_to_unit will cause those slices to produce incorrect results, and vice versa.

From a design perspective, I'm not sure whether them should update accordingly or remain unchanged, but the current behavior is definitely problematic.

What I Did

from unyt import unyt_array as YTArray

arr = YTArray([1,2,3,4],"m")
arr_slice = arr[:2]
print(arr) # [1,2,3,4] m
print(arr_slice) # [1,2] m

arr[0] = 5
print(arr) # [5,2,3,4] m
print(arr_slice) # [5,2] m

arr.convert_to_units("cm") 
print(arr) # [500,200,300,400] cm
print(arr_slice) # [random,numbers] m

arr = YTArray([1,2,3,4],"m")
arr_slice = arr[:2]

arr_slice[0] = 5
print(arr_slice) # [5,2] m
print(arr) # [5,2,3,4] m

arr_slice.convert_to_units("cm") 
print(arr_slice) # [500,200] cm
print(arr) # [random,numbers,3,4] m
@neutrinoceros
Copy link
Member

Thanks for reporting this !

From a design perspective, I'm not sure whether them should update accordingly or remain unchanged

I think they would be expected to follow the change: array slices are views into the original array's memory so any mutation on the array should be reflected in views.

but the current behavior is definitely problematic.

most definitely.

Would you like to try and fix this yourself ?

@neutrinoceros neutrinoceros added the bug Something isn't working label Dec 22, 2024
@neutrinoceros
Copy link
Member

Actually, this is another case of "mixing units with integral datatypes doesn't work".
Everything works correctly if the original array uses a floating point datatype (for instance using arr = YTArray([1. , 2. ,3. , 4.],"m")).
Let me try to explain the situation:
convert_to_units cannot retain an integral datatype and preserve accuracy at the same time. The design that was chosen is to favor accuracy, that is, the array will always be converted to a floating point datatype in place. The datatypes of existing views however are not updated, though that is very likely an intentional design choice from numpy. Let's illustrate the same problem with pure numpy

import numpy as np

def convert_to_float_inplace(arr):
    new_dtype = f"f{arr.dtype.itemsize}"
    if arr.dtype.kind in 'ui':
        # copy original array
        new_values = arr.astype(new_dtype)
        arr.dtype = new_dtype
        np.copyto(arr, new_values)

arr = np.arange(5)
arr_slice = arr[1:3]
convert_to_float_inplace(arr)
print(arr)
print(arr_slice)

outputs

[0 1 2 3 4]
[1 2]
[0. 1. 2. 3. 4.]
[4607182418800017408 4611686018427387904]

the values as printed on the last line are not random, rather, they are the result of interpreting floating point memory as integers.
So, the issue you're having is in fact not a bug, but the result of (perhaps disputable) design choices from both numpy and unyt, and I don't think this can be fixed. However there's a clear way out of it: if you need in-place conversions (as one often does), avoid integral datatypes in unyt_array instances.

@neutrinoceros neutrinoceros removed the bug Something isn't working label Dec 23, 2024
@haoenz
Copy link
Author

haoenz commented Dec 23, 2024

So, the issue you're having is in fact not a bug, but the result of (perhaps disputable) design choices from both numpy and unyt.

I see now that the issue I raised is more of a NumPy issue than a unyt issue.

However, I apologize for falling into an XY problem. The original reason that led me to bring up this issue is as follows:

from unyt import unyt_array as YTArray

arr = YTArray([1.,2.,3.,4.,5.], "m")
arr2 = arr[:2]
arr.convert_to_units("cm")
arr2  
# unyt_array([100., 200.], 'm')
# The numerical values in the slice changed, but the units remained the same.

arr = YTArray([1.,2.,3.,4.,5.], "m")
arr2 = arr[:2]
arr2.convert_to_units("cm")
arr
# unyt_array([100., 200.,   3.,   4.,   5.], 'm')
# The units remained, and only the numbers within the slice were affected.

Of course, I understand that this behavior is entirely inherited from np.ndarray. However, considering that YTArray includes units, I believe this behavior could unintentionally lead to some confusing results.

Personally, I would expect that if a YTArray is modified with convert_to_units, all YTArray instances sharing the same value should update accordingly.

I'm not sure if this expectation is reasonable or how challenging it would be to implement. However, even if it's not feasible, wouldn't it be better to raise warnings for such operations?

@neutrinoceros
Copy link
Member

neutrinoceros commented Dec 24, 2024

I apologize for falling into an XY problem.

no need to, that's a very easy mistake to make !

Personally, I would expect that if a YTArray is modified with convert_to_units, all YTArray instances sharing the same value should update accordingly.

This is likely impossible in the general case: an array doesn't hold references to its views so it cannot propagate the modification. Although views are detectable at runtime, they only have one reference to their base array, and couldn't possibly propagate modification beyond that (admitting that it is desirable in the first place).

I'm not sure if this expectation is reasonable or how challenging it would be to implement. However, even if it's not feasible, wouldn't it be better to raise warnings for such operations?

One thing that might be possible is to raise a a warning or an exception when a view is modified in place (views are detectable at runtime). However that would need to be done consistently throughout the whole API (that may be a substantial amount of work), and raising an exception would be a breaking change, so I think we'd need to go with a warning.

I would happily review a pull request implementing such a warning, but I will not invest my own free time building it.
To me, the root problem is that ndarray objects (and their subclasses) are delicate to handle because they are mutable (and they should be ! an immutable version would be almost entirely useless), making in-place operations possible but inherently not memory-safe. unyt implements a couple such in-place operations that are clearly highlighted as such, and in general have memory-safe, copy-returning counterparts (convert_to_units has to).

In short, I would recommend avoiding in-place operations if you can afford it, or being extra careful around them if you can't, but I'm not convinced that we can do much about it on our side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants