Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed and tidied combinatorics.list_permutation function #9

Merged
merged 3 commits into from
Nov 27, 2023

Conversation

DanielSherlock
Copy link
Contributor

Hi, hope this is okay.


The problem

I noticed an error in the implementation of the combinatorics.list_permutation function, when provided with lists containing duplicate items. For example (printing each permutation on a new line):

// (Old behaviour)
// [1, 1, 2] |> list_permutation 
[1, 2]
[1, 2]
[2, 1]
[2, 1]

Clearly, none of those are permutations of [1, 1, 2]! It did work correctly when given lists with unique elements:

// [1, 1, 2] |> list.unique |> list_permutation 
[1, 2]
[2, 1]

The issue was the use of filter on line 357. This removed from the remaining list all duplicates of x, when it shouldn't have.


New behaviour

When fixing this, I considered two reasonable behaviours for list_permutations:

  • A: Treat duplicate elements as distinct for the purposes of permutations. For example, the list [1, 1] has two permutations... [1, 1] (the same way round) and [1, 1] (the other way round). With this behaviour all lists of the same length will have the same number of permutations, namely factorial(list.length(whichever_list)).
  • B: Treat duplicate elements as identical for the purposes of permutations. In this case, the list [1, 1] has only one permutation... itself. With this behaviour lists with duplicate elements will have fewer permutations than other lists of the same length.

Both behaviours are plausibly useful. I have decided, for this pull request, to implement behaviour A. Partly because it was a simpler change to write, but also because I think it is slightly more versatile: if you need behaviour B it is as simple as calling the function with behaviour A, and then piping the result to list.unique:

// (New behaviour)
// [1, 1, 2] |> list_permutation
[1, 1, 2]
[1, 2, 1]
[1, 1, 2]
[1, 2, 1]
[2, 1, 1]
[2, 1, 1]

// (Simulating behaviour B)
// [1, 1, 2] |> list_permutation |> list.unique
[1, 1, 2]
[1, 2, 1]
[2, 1, 1]

To implement this new behaviour, I switched from filter to pop. This always removes exactly one instance of x, leaving its duplicates in the remainder of the list. One difficulty is that pop returns a result in case the element it's searching for isn't in the list. But it always will be (as x is drawn from the list in the line above), so we can be sure to get an Ok(...) and I use a let assert expression to destructure this.

This is not exactly the most efficient way to write this function, but it has the same performance as what was there before, and is fairly readable. I would be happy to write a more efficient version if needed.


Other changes

I made a few other code changes at the same time. These do not have any effect on the behaviour of the function.

  • I noticed we were using local definitions of flat_map and concat for the implementation of list_permutation (and nothing else). As these are standard library functions (in the list module, which we already import) we don't need to duplicate them locally, so I got rid of them.
  • I rewrote the call to flat_map on line 354 to use use. Personally, I find this more readable.

To do

I have not yet:

  • Added the above examples (or similar) as tests.
  • Updated the documentation to explain the behaviour with duplicates.

@hayleigh-dot-dev hayleigh-dot-dev marked this pull request as draft October 25, 2023 10:06
@hayleigh-dot-dev
Copy link
Member

Hey Daniel thanks for this! I've converted the PR to draft, please can you address the todos (tests and docs).

I'll defer to @NicklasXYZ on whether this change is sensible but it seems reasonable to me. We could also consider making the behaviour configurable with a second parameter:

combinatorics.list_permutation([1, 1, 2], ignore_duplicates: False)
// [1, 1, 2]
// [1, 2, 1]
// [1, 1, 2]
// [1, 2, 1]
// [2, 1, 1]
// [2, 1, 1]

combinatorics.list_permutation([1, 1, 2], ignore_duplicates: True)
// [1, 1, 2]
// [1, 2, 1]
// [2, 1, 1]

This might make the implementation a bit clunkier but it would definitely be more performant than suggesting folks iterate the list again with list.unique. I don't feel super strongly about this - maybe we could wait and see if/when Gleam gets optional args.

This is not exactly the most efficient way to write this function, but it has the same performance as what was there before, and is fairly readable. I would be happy to write a more efficient version if needed.

If a more efficient version is fairly simple to implement I think we should favour it over readability: perf wins for number-crunching stuff is likely to be quite high impact!

@hayleigh-dot-dev
Copy link
Member

To implement this new behaviour, I switched from filter to pop. This always removes exactly one instance of x, leaving its duplicates in the remainder of the list. One difficulty is that pop returns a result in case the element it's searching for isn't in the list. But it always will be (as x is drawn from the list in the line above), so we can be sure to get an Ok(...) and I use a let assert expression to destructure this.

Might be good to just leave this as a comment above the assertion too so folks in the future know this assertion is safe!

@NicklasXYZ
Copy link
Member

I am sorry for keeping you waiting, I will get to this very soon 😅

@NicklasXYZ
Copy link
Member

Thank you @DanielSherlock, thank you for fixing this and @hayleigh-dot-dev for already having a look.

I think behavior A that you are describing is a reasonable default choice to go with, but in the future it could be cool to have both behaviors implemented as @hayleigh-dot-dev also points out.

If you could update the tests and documentation I'd be happy to approve this.

With respect to the performance of the implementations, of course if things can be improved in that department as well, I would not turn down such improvements ;)

@DanielSherlock
Copy link
Contributor Author

DanielSherlock commented Nov 14, 2023

No worries, I'm not in any rush (and I hope you aren't too)!

Actually, I thought after posting this pull request: Oh before I start writing tests and docs I should go and look at the standard (or standard-ish) libraries of other languages to see how they handle this case. Then I got distracted. And now it is now.

As far as I can tell, it's like this:

Language Behaviour A Behaviour B Iterator
Javascript (Deno) permutations
Ruby permutation
Scala permutations
Swift permutations uniquePermutations ✅ (Sequence)
Julia permutations multiset_permutations
Rust permutations
Nim nextPermutation ✅ (mutating)
C++ next_permutation ✅ (mutating)
Haskell permutations ✅ (everything is lazy)
Python permutations
Clojure permutations ✅ (lazy-seq)

(I had hoped to find a list-permuting function in the standard (or standard-ish) libraries of Erlang or Elixir, and prioritise similarity to that. But no luck, not even Elixir's Math library has this. Other languages that seem to have nothing (semi-)official include Java, Elm, and Lobster. Though perhaps I wasn't looking hard enough.)

Personally, I like the way Swift handles it. Plus, their documentation is clear (with examples) and also has some hints for possible efficient implementations. The Clojure implementation seem nice and clean too, and might be easy to port. Not all other documentation was similarly clear...

Another thing to note: Almost all of the languages I surveyed returned an iterator (or equivalent thing in that language), even when given a list/array/etc. If we are going to be worried about performance, we might consider doing the same. This after all exactly the sort of function where iterators make a lot of sense:

  • It does not take a particularly large input list for the output list to be very very large indeed.
  • The state needed to generate the next permutation is small.

@hayleigh-dot-dev
Copy link
Member

I think we could discuss returning an iterator in a separate PR. It'd be a breaking change and just generally beyond the scope of what's already here. Better to merge incremental improvements to the library than spend a lot of back-and-forth making a single PR cover everything all at once!

@NicklasXYZ
Copy link
Member

Thanks a lot for the Table Daniel. It's a really nice overview and you are right. Returning an iterator would be the way to go for sure :)
I think what @hayleigh-dot-dev suggests sounds very sensible. It would be a larger change and thus probably more suitable to discuss in another PR.

@DanielSherlock
Copy link
Contributor Author

Very practical suggestions, thanks.

I've added tests and docs and an explanatory comment to the current implementation.

  • I ran the tests and it seems to work fine.
  • I built the docs locally and nothing looks wrong with them.
  • I ran gleam format but it didn't do anything, so I hope that means it's all fine.

When running this, I do get loads of warnings - it seems from Gleam's recent type import syntax change. I've not gone and fixed all of those in this PR.

I'm happy to create separate issues for:

  • Another function implementing behaviour B.
  • A perhaps slightly more efficient implementation.
  • The possibility of returning an iterator.

@NicklasXYZ NicklasXYZ marked this pull request as ready for review November 27, 2023 10:36
@NicklasXYZ
Copy link
Member

Looks good to me. I'd be happy, if you could add the other issues :)!

@NicklasXYZ NicklasXYZ merged commit ad11b96 into gleam-community:main Nov 27, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants