
Add selection algorithms nthElement heapSelect and median #24414


Open. oittaa wants to merge 7 commits into master.

@oittaa
Contributor

oittaa commented Jul 11, 2025

Selection algorithms come up from time to time, for example here: #9890

This pull request adds the following public functions to std.select. Is this the correct place, or should we put them in std.sort even though they aren't exactly sorting algorithms?

  • nthElement
  • nthElementContext
  • heapSelect
  • heapSelectContext
  • median

I tried to check how the current sorting algorithms work and more or less copied their behavior, but there are still a couple of open questions.

  • How to handle empty arrays? Currently we use assert.
  • Is the argument order correct for nthElement and heapSelect, especially n?
  • Are the arguments for median good enough or what should be done to them?
  • Two private functions are exactly the same as in std.sort or std.sort.pdq: partitionEqual and siftDown. Should we share the implementations somehow?
  • A couple of functions are almost the same, but since we don't care about the exact order here, we drop some functionality: sort3, chosePivot, and partition. As above, do we implement them here with the minor modifications (and possibly minor speedups) at the cost of duplicated code?
  • Are there enough tests?
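
For readers unfamiliar with the contract that a function like nthElement provides, here is a minimal quickselect sketch. It is written in C rather than Zig and is not the code from this pull request: the name nth_element, the int element type, and the last-element pivot are all assumptions made for illustration (the pull request reuses the pdq-style pivot selection from std.sort). The empty-input case is handled with an assert, as the question above describes.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two ints in place. */
static void swap_int(int *a, int *b) {
    int t = *a;
    *a = *b;
    *b = t;
}

/* Lomuto partition of items[lo..hi] (inclusive) around items[hi].
   Returns the final index of the pivot. */
static size_t partition(int *items, size_t lo, size_t hi) {
    int pivot = items[hi];
    size_t store = lo;
    for (size_t i = lo; i < hi; i++) {
        if (items[i] < pivot) {
            swap_int(&items[i], &items[store]);
            store++;
        }
    }
    swap_int(&items[store], &items[hi]);
    return store;
}

/* Hypothetical nth_element: rearranges items so that items[n] holds the
   value it would have in sorted order, with nothing larger before it and
   nothing smaller after it. Invalid input is rejected with assert. */
void nth_element(int *items, size_t len, size_t n) {
    assert(len > 0 && n < len);
    size_t lo = 0, hi = len - 1;
    while (lo < hi) {
        size_t p = partition(items, lo, hi);
        if (p == n) return;
        if (p < n) lo = p + 1;  /* target is in the right part */
        else hi = p - 1;        /* p > n >= 0, so p - 1 cannot underflow */
    }
}
```

Only the element at index n is guaranteed to be in its sorted position; the order within the two sides is unspecified, which is why the helper functions can drop some of the work that a full sort needs.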

@adworacz

Any thoughts on also including sorting networks, particularly for finding things like the median?

Ref:

  1. https://github.com/bertdobbelaere/SorterHunter
  2. https://bertdobbelaere.github.io/sorting_networks.html
  3. https://bertdobbelaere.github.io/median_networks.html

Disclaimer: I implemented some of the above in comptime for my video processing plugin: https://github.com/adworacz/zsmooth/blob/master/src/common/sorting_networks.zig
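
To make the sorting-network idea concrete, here is a small sketch of a median-of-three built from a fixed sequence of compare-exchange steps. It is written in C for illustration and is not taken from the linked plugin or from this pull request; the names cmp_swap and median3 and the int element type are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Compare-exchange: after the call, *a <= *b. */
static void cmp_swap(int *a, int *b) {
    if (*b < *a) {
        int t = *a;
        *a = *b;
        *b = t;
    }
}

/* Median of a 3-element window using a fixed 3-comparator network.
   The values are copied first, so the caller's data is not reordered. */
int median3(const int *window) {
    assert(window != NULL);
    int a = window[0], b = window[1], c = window[2];
    cmp_swap(&a, &b);  /* a <= b */
    cmp_swap(&b, &c);  /* c is now the maximum */
    cmp_swap(&a, &b);  /* a is now the minimum, so b is the median */
    return b;
}
```

The sequence of comparisons does not depend on the data, which is what makes networks like this attractive for small fixed-size windows. Dedicated median networks for larger sizes can use fewer comparators than a full sorting network because only the middle output has to be correct.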

@alichraghi
Contributor

  • How to handle empty arrays? Currently we use assert.
    Sounds about right. It's better to catch invalid usage at the call site to prevent UB.

  • Is the argument order correct for nthElement and heapSelect, especially n?
    LGTM.

  • Are the arguments for median good enough or what should be done to them?
    LGTM.

  • Two private functions are exactly the same as in std.sort or std.sort.pdq: partitionEqual and siftDown. Should we share the implementations somehow?
    Yes.

  • A couple of functions are almost the same, but since we don't care about the exact order here, we drop some functionality: sort3, chosePivot, and partition. As above, do we implement them here with the minor modifications (and possibly minor speedups) at the cost of duplicated code?
    I think they should be public and the duplicated code must be removed.

@oittaa
Contributor Author

oittaa commented Jul 12, 2025

  • I think they should be public and the duplicated code must be removed.

EDIT:
I put them in std.sort. Better ideas are welcome. At first I put them under std.sort.utils, but I noticed that, for example, std.crypto.utils will be removed in the future.

@oittaa
Contributor Author

oittaa commented Jul 14, 2025

I didn't find anything in the math library that calculates a mean in an overflow-safe manner, so for integers the median function now uses the common bit manipulation (a&b) + ((a^b)>>1). The mean of two floats is calculated as a/2 + b/2, which might lose some precision; I don't know how the compiler will optimize that.
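
The two formulas in the comment above can be sketched as follows. This is a C illustration, not the Zig code from the pull request; the function names mean_u32 and mean_f64 are hypothetical, and the integer version is restricted to unsigned values because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Floor of (a + b) / 2 without ever computing a + b.
   a & b keeps the bits both operands share; (a ^ b) >> 1 adds half of
   the bits in which they differ. */
uint32_t mean_u32(uint32_t a, uint32_t b) {
    return (a & b) + ((a ^ b) >> 1);
}

/* Mean of two doubles, halving each operand before adding so that the
   intermediate sum cannot overflow to infinity. */
double mean_f64(double a, double b) {
    return a / 2 + b / 2;
}
```

For example, mean_u32(UINT32_MAX, UINT32_MAX - 2) yields UINT32_MAX - 1, whereas the naive (a + b) / 2 would wrap around. On the floating-point side, dividing by 2 is exact in binary floating point unless the result is subnormal, so a/2 + b/2 loses precision relative to (a + b)/2 only for inputs near the bottom of the representable range.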
