Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function to take a number of random values from a list #683

Open
sobolevn opened this issue Aug 22, 2024 · 14 comments
Open

Function to take a number of random values from a list #683

sobolevn opened this issue Aug 22, 2024 · 14 comments
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@sobolevn
Copy link
Contributor

I started looking at the state of random in Gleam and I have several ideas.

  1. int.random and float.random do not have seeds. It might be a problem for libraries. For example: if you want a fake-data library for tests: you need to have the same data based on some test seed value. Otherwise, you won't have reproducable failures. https://docs.python.org/3/library/random.html#notes-on-reproducibility
  2. There's no int.random_range(x, y) function, which would generate an int from range >=x, <y. I think that this function is essential
  3. There's no choice(List) function, which in my experience is the second most frequently used random function in Python.

Maybe we should add random module with the needed functions and a nice api for Seed(Int)?

@giacomocavalieri
Copy link
Member

For that there's the prng package https://hex.pm/packages/prng

@inoas
Copy link
Contributor

inoas commented Aug 22, 2024

Maybe prng could have random_range and choice added?

@lpil
Copy link
Member

lpil commented Aug 22, 2024

No plans for 1 and 2 at present. 3 sounds good 👍

@inoas
Copy link
Contributor

inoas commented Aug 22, 2024

hm the gleam-@alias idea reappears for me where we could document and help people find functions when they come from other stdlibs such as pythons, phps and javascripts.

@lpil lpil changed the title Consider making changes to random numbers generation Function to take a random value from a list Aug 22, 2024
@lpil lpil added good first issue Good for newcomers help wanted Extra attention is needed labels Aug 22, 2024
@lpil lpil changed the title Function to take a random value from a list Function to take a number of random values from a list Aug 22, 2024
@Varpie
Copy link

Varpie commented Aug 22, 2024

Isn't this basically list.shuffle |> list.take(n)?

I suppose this can be made more efficient than shuffling the full list, if we have a large list and only want to pick a few random elements, but it could be a good first version.

shuffle runs a fold generating random numbers for each element of a list, then sort it and iterates over all elements to remove the generated random numbers. take runs in linear time.
A possible improvement would be to iterate over only the first n elements after the sort, which would avoid one full iteration of the list.

A possible version without the sort would generate a new random int every time, that is at most the length of the list, then pop the element at that position into an accumulator and repeat until it has n elements (or the list is empty).

@lpil
Copy link
Member

lpil commented Aug 22, 2024

There are algorithms for random sampling from a linked list. I haven't done the research to say how each approach compares.

@apainintheneck
Copy link
Contributor

Reservoir sampling would be the obvious choice but the implementations I'm familiar with use indexing into arrays which wouldn't work here since it would have to work with lists instead. I guess you could use a dictionary instead of an array as the reservoir but there's probably a better approach out there.

@lpil lpil closed this as completed Sep 10, 2024
@lpil lpil reopened this Sep 10, 2024
@lpil
Copy link
Member

lpil commented Sep 10, 2024

Oops, misclick.

How about we copy whatever Elixir or Elm or some other similar language does?

@apainintheneck
Copy link
Contributor

Good point. It's worth taking a look at what other languages do in this area.

The Elixir standard library has the Enum.take_random method which now uses a modified reservoir sampling algorithm for performance reasons (relevant commit). Internally it uses a tuple as the reservoir in place of the traditional fixed length array.

I looked at the Elm and couldn't find any relevant methods in the standard library or packages.

A few Haskell libraries had implemented versions of it which used things like IntMap as the reservoir data structure.

None of this really helps us but it's still interesting.

@ethanthoma
Copy link

Are y'all wanting a multiple sample like python's random.sample(List, Int) or just a single element sample like random.choice(List)?

@lpil
Copy link
Member

lpil commented Nov 20, 2024

We want a function which takes a specified number of values from a list.

@ethanthoma
Copy link

I was looking at impls, it seems like algo L seems optimal (and i could be wrong but seems similar to what elixer does). However, it requires taking the natural log which I couldnt find impl in the stdlib, whats the ideal solution? A slower impl without needing natural log or adding natural log to gleam/float?

@lpil
Copy link
Member

lpil commented Nov 21, 2024

Sounds like a good reason to add that function to me! Unless anyone else has any other suggestions.

It looks like that algorithm wants array mutation at a random index. How would you do that in Gleam given we don't have constant time indexing or array mutation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

7 participants