Performance problem calling np.where #67

Closed
GregTheDev opened this issue Jan 10, 2025 · 10 comments

Comments

@GregTheDev

Hi

I've hit what seems to be a performance problem with np.where(). Consider the following code:

np.random random = new np.random();

ndarray sampleData = random.rand(new shape(496, 682));
ndarray filter = sampleData > 0.5;

ndarray filteredData = (ndarray)np.where(filter, 0, sampleData);
ndarray filteredData2 = (ndarray)np.where(filter, 0d, sampleData);
ndarray filteredData3 = (ndarray)np.where(filter, sampleData, sampleData);

I thought I might be experiencing something similar to #56, but all three uses of where() above take more than a minute to execute (I've also seen something similar with isfinite() previously).

Am I doing something wrong, or is this expected behavior?

Thanks
Greg

@KevinBaselinesw
Collaborator

I will look into this.

@KevinBaselinesw
Collaborator

KevinBaselinesw commented Jan 11, 2025

Please update to the latest version, 9.86.3, which includes a much faster np.where. To get the full performance benefit, use the same data type for both the x and y arguments.

Thank you.
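
For reference, the advice about matching the types of x and y can be illustrated with the original NumPy library, which the .NET port mirrors. This is only a sketch of NumPy's own behavior, with small stand-in shapes and made-up variable names; it says nothing about how the .NET implementation chooses its fast path. In NumPy, an integer scalar paired with a float64 array is promoted to float64, so passing `0` or `0.0` gives the same result; in the .NET port the thread suggests the matched form (`0d` with a double array) is the one to prefer.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.random((4, 5))            # float64 data, small stand-in for (496, 682)
mask = sample > 0.5

mixed = np.where(mask, 0, sample)      # int scalar paired with a float64 array
matched = np.where(mask, 0.0, sample)  # float scalar paired with a float64 array

# NumPy promotes both calls to float64, so the outputs are identical.
same_dtype = mixed.dtype == matched.dtype == np.float64
same_values = np.array_equal(mixed, matched)
```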

@GregTheDev
Author

Ah awesome thanks! Works great now.

@KevinBaselinesw
Collaborator

I missed the np.isfinite mention before. I improved the performance of that API in 9.86.4 if you want to try it.

Thank you.

@GregTheDev
Author

Awesome, thanks. Will try it out tomorrow and let you know.

@GregTheDev
Author

Wow. Thanks @KevinBaselinesw, isfinite() is way faster now.

@GregTheDev
Author

Sorry to be a pain @KevinBaselinesw, but I've got some weird behavior in np.where() that I can't figure out.

(Just some background: I'm working with a 3-dimensional array that contains data that will eventually be RGB data for an image.)

Consider this Python code (don't worry too much about the data; it's the shapes that are important):

import numpy as np

sampleData = np.random.rand(3, 496, 682)
filter = sampleData[0] > 0.5

filteredData = np.where(filter, sampleData, sampleData)

In this case filter has a shape of (496, 682) and sampleData has a shape of (3, 496, 682). 'filteredData' has a shape of (3, 496, 682) - so it keeps the shape of the x and y arguments.
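
The shape rule described above is NumPy's standard broadcasting: shapes are aligned from the trailing dimension, and the condition's missing leading dimension is treated as length 1 and stretched. A minimal sketch with small stand-in shapes (the sizes and variable names here are illustrative, not from the original report):

```python
import numpy as np

cond_shape = (4, 5)       # stand-in for the (496, 682) filter
data_shape = (3, 4, 5)    # stand-in for the (3, 496, 682) data

# Shape that np.where produces for (condition, x, y) under broadcasting.
result_shape = np.broadcast_shapes(cond_shape, data_shape, data_shape)

data = np.arange(60, dtype=float).reshape(data_shape)
cond = data[0] > 9.5                  # 2-D condition taken from the first layer
result = np.where(cond, data, -data)  # the same 2-D mask is applied to every layer
```

Here `result_shape` and `result.shape` are both `(3, 4, 5)`: the 2-D condition is reused for each of the three layers rather than shrinking the output.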

If I do something similar using the .NET version, then filteredData has a shape of (496, 682): the shape of the filter, not of sampleData. That's not necessarily a bug, it's just different behavior.

Here's some sample code:

[Test]
public void Where_MaintainsOriginalDimensions()
{
    // This is testing whether different shapes for x & y arguments affect the outcome (answer: they don't)
    np.random random = new np.random();

    ndarray sampleData = random.rand(new shape(3, 496, 682));
    ndarray sampleData2 = random.rand(new shape(3, 496, 682));
    ndarray filter = sampleData > 0.5;

    // scalar vs multi dimensional
    ndarray filteredData = (ndarray)np.where(filter, 0d, sampleData);
    Assert.That(filteredData.shape.iDims.Length, Is.EqualTo(3)); // filter.shape = (3, 496, 682), filteredData.shape = (3, 496, 682)

    // multi dimensional vs multi dimensional
    ndarray filteredData2 = (ndarray)np.where(filter, sampleData, sampleData2);
    Assert.That(filteredData2.shape.iDims.Length, Is.EqualTo(3)); // filter.shape = (3, 496, 682), filteredData2.shape = (3, 496, 682)

    // 2-D filter vs 3-D data (fails - shape of result drops a dimension)
    filter = np.max(sampleData, axis: 0) > 0.5; // shape = 496, 682
    ndarray filteredData3 = (ndarray)np.where(filter, sampleData, sampleData);
    Assert.That(filteredData3.shape.iDims.Length, Is.EqualTo(3)); // filter.shape = (496, 682), filteredData3.shape = (496, 682)
}
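
For comparison, the failing third case can be written in NumPy with the broadcast made explicit. This is a sketch of the reference behavior only; whether the .NET port offers an equivalent of `np.broadcast_to`, and whether it would serve as a workaround there, is an assumption that would need checking against the library. The small shapes and threshold are made up for illustration.

```python
import numpy as np

data = np.arange(24, dtype=float).reshape(3, 2, 4)
cond = data.max(axis=0) > 18.5        # shape (2, 4), like the failing case

# NumPy broadcasts the 2-D condition across the leading axis by itself.
implicit = np.where(cond, data, 0.0)

# The same thing with the broadcast spelled out: a read-only (3, 2, 4) view.
explicit_cond = np.broadcast_to(cond, data.shape)
explicit = np.where(explicit_cond, data, 0.0)

results_match = np.array_equal(implicit, explicit)
```

Both calls return an array of shape `(3, 2, 4)`, which is the result the test expects from the .NET version.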

Following that, I tried to replicate the behavior of numpy by treating each dimension as a separate array, i.e. treating each 496*682 layer as its own ndarray and then performing an np.where() on that. So I ended up with 3 np.where() calls, but each call returned the same data irrespective of the input data (it seemed like it was still operating on the full original data). Here's some simplified code:

[Test]
public void Where_DoesNotDuplicateResults()
{
    ndarray sampleData = np.array(new int[] { 1, 2, 3, 4, 5, 6, 7, 8 }).reshape(2, 2, 2);
    ndarray filter = np.array(new bool[] {true, false, true, false }).reshape(2,2);

    // 'split' the layers of sampleData into two separate arrays of 2*2
    // dimA & dimB reflect expected values (1,2,3,4) & (5,6,7,8)
    ndarray dimA = (ndarray)sampleData[0];
    ndarray dimB = (ndarray)sampleData[1];

    // Use the same filter, but on each separate array
    // In this case 'b' ends up with the same values as 'a'
    ndarray a = (ndarray)np.where(filter, dimA, dimA);
    ndarray b = (ndarray)np.where(filter, dimB, dimB);

    // Making a copy of each 'layer' returns the expected values
    ndarray dimX = np.copy((ndarray)sampleData[0]);
    ndarray dimY = np.copy((ndarray)sampleData[1]);

    ndarray xx = (ndarray)np.where(filter, dimX, dimX);
    ndarray yy = (ndarray)np.where(filter, dimY, dimY);
}
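
As a point of reference, the same experiment in NumPy gives per-layer results whether the layers are views or copies. The sketch below uses the values from the test above but passes the negated layer as the y argument (a change made here only so the selection is visible in the output); it describes NumPy's behavior, not the internals of the .NET port.

```python
import numpy as np

sample = np.arange(1, 9).reshape(2, 2, 2)
mask = np.array([True, False, True, False]).reshape(2, 2)

layer_a = sample[0]    # view onto [[1, 2], [3, 4]]
layer_b = sample[1]    # view onto [[5, 6], [7, 8]], offset into the same buffer

a = np.where(mask, layer_a, -layer_a)
b = np.where(mask, layer_b, -layer_b)

# Copies of the layers must give the same answers as the views.
b_from_copy = np.where(mask, layer_b.copy(), -layer_b.copy())
is_view = np.shares_memory(layer_b, sample)
```

With this input, `a` holds `[[1, -2], [3, -4]]` and `b` holds `[[5, -6], [7, -8]]`. A result for `b` that repeats the values of `a` would suggest the view's offset into the parent buffer is being ignored.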

Hope that all makes sense.

Cheers
Greg

@KevinBaselinesw
Collaborator

I am working on it. I will probably finish this tonight.

@KevinBaselinesw
Collaborator

0.9.86.6 will fix all of your reported bugs (hopefully :)

@GregTheDev
Author

Thanks, appreciated. Looks good! 😄
