Numpy dotnet Performance Issue #56
Comments
Are the data types the same? For example, are they both doubles? Same-type data will perform much better. Could you provide a more complete example? How big are the arrays? What is the data type? What is the shape of the arrays that you are testing?
For the same data type, the difference is minimal.
One big difference between C# and C (python numpy is a C library) is that C allows easier/faster casting between data types. In C#, if you try to cast Int32 to UInt32, I think it will throw an exception, but C will allow it. This forces NumpyDotNet to follow a code path that ultimately uses the "dynamic" data type to allow different data types to be used together. It works great, but it is quite a bit slower. That is why carefully using the same data types will allow the library to run much faster: I can follow templated code paths that don't use the dynamic data type. This also applies to constant values. Something like doubleArray + 1 should be written doubleArray + 1.0 to get maximum performance (see the sketch below).

I am working on another issue that can cause slower performance. I use try/catch around most of the calculations. This allows me to catch calculation errors that throw exceptions (i.e. divide by zero, overflows, etc.) and set a default value instead, which is what python/numpy does. However, C# try/catch does add a significant CPU overhead. If that is in the middle of 1 million calculations, it can add up to a lot of time. I am working on adding a feature to disable/reroute code to not use try/catch. If you are confident your application will not cause an exception (99% probably won't), then it can speed up the calculations by about 20%. Would you be willing to demo this feature in your code?
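A minimal sketch of the same-dtype fast path described above, assuming the NumpyDotNet `np` API (exact overloads and signatures are illustrative, not confirmed against the library):

```csharp
using NumpyDotNet;

class FastPathDemo
{
    static void Main()
    {
        // Both operands are float64, so the library can stay on the
        // templated same-type code path instead of the "dynamic" fallback.
        var a = np.arange(0.0, 1_000_000.0);   // float64 array
        var b = np.arange(0.0, 1_000_000.0);   // float64 array

        var product = np.multiply(a, b);       // same dtype on both sides: fast

        // Constants should match the array's dtype too:
        var slow = a + 1;    // int constant: forces mixed-type handling
        var fast = a + 1.0;  // double constant: stays on the fast path
    }
}
```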
I will look into the np.quantile and np.var performance issues too.
We have the following interesting observations:
When I tried to debug the numpydotnet file, I found the error is in this line:
I have a bug fix coming for this today.
Okay, how can I use it?
If you send me your email address at kmckenna at baselinesw.com, I can send you a new DLL that should work for you.
I have researched why np.quantile takes much longer than the python version does. The root cause is that np.quantile ultimately calls the np.partition code to do the heavy work. This code is much slower in C# than in the python C code. The reason is that the python code uses a lot of complex C macros to do the work, which effectively inlines all of the processing. C# does not support macros, so I had to turn the macros into functions. These functions are called very frequently, which greatly adds to the processing overhead. I can't think of any way to speed this up while still keeping the code readable and debuggable.
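To illustrate the macro-to-function translation cost, here is a hypothetical `Swap` helper of the kind a partition kernel needs; this is an illustration, not the library's actual code:

```csharp
using System.Runtime.CompilerServices;

static class PartitionHelpers
{
    // In numpy's C source the equivalent of this is a macro, e.g.
    //   #define SWAP(a, b) { npy_double t = (a); (a) = (b); (b) = t; }
    // which the C compiler expands inline at every call site for free.
    // In C# it must be a function; a hot partition loop may invoke it
    // millions of times, so any call overhead accumulates.
    [MethodImpl(MethodImplOptions.AggressiveInlining)] // a hint only; inlining is not guaranteed
    public static void Swap(ref double a, ref double b)
    {
        double t = a;
        a = b;
        b = t;
    }
}
```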
I have researched why np.var takes much longer than python does. The root cause is that np.var is really composed of (at least) 7 math operations on the array. If each one takes a little bit longer than the python/C version does, it adds up to a significant difference. I have made a few small tweaks to the code to make it a little faster.
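For intuition, here is a simplified decomposition of a variance calculation into separate whole-array operations (the real np.var does more work, e.g. degrees-of-freedom and dtype handling, hence the 7+ operations above). Each step is a full pass over the data, so per-operation overhead multiplies; names follow the NumpyDotNet API but return types are an assumption:

```csharp
using NumpyDotNet;

class VarDecomposition
{
    // Hypothetical sketch: variance computed as chained array ops,
    // mirroring how np.var runs several kernels internally.
    static ndarray NaiveVar(ndarray a)
    {
        var mean = np.mean(a);        // pass 1: sum, then divide
        var dev  = a - mean;          // pass 2: elementwise subtract
        var sq   = dev * dev;         // pass 3: elementwise multiply
        return np.mean(sq);           // pass 4: sum, then divide again
    }
}
```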
I tried to multiply 2 large arrays with the numpydotnet library.
Code: np.multiply(Data1, Data2) -> 300 ms
I clearly see there is a performance degradation when compared to the numpy library -> 150 ms
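A minimal timing harness for reproducing the C# side of this comparison; the array size and dtype below are assumptions, since the original report does not state them:

```csharp
using System;
using System.Diagnostics;
using NumpyDotNet;

class MultiplyBenchmark
{
    static void Main()
    {
        // Assumed shape/dtype for illustration: 10 million float64 elements.
        var Data1 = np.arange(0.0, 10_000_000.0);
        var Data2 = np.arange(0.0, 10_000_000.0);

        var result = np.multiply(Data1, Data2); // warm-up, so JIT cost is excluded

        var sw = Stopwatch.StartNew();
        result = np.multiply(Data1, Data2);
        sw.Stop();
        Console.WriteLine($"np.multiply: {sw.ElapsedMilliseconds} ms");
    }
}
```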