Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Iterator#slice and Iterator#each_slice with Iterator#in_slices_of #15251

Open
BigBoyBarney opened this issue Dec 6, 2024 · 16 comments
Open

Comments

@BigBoyBarney
Copy link
Contributor

BigBoyBarney commented Dec 6, 2024

Enumerable has 2 similar methods that return an Array(Array) of the given size, namely:

snorlax = [1, 2, 3, 4]

snorlax.in_groups_of(3) # [[1, 2, 3], [4, nil, nil]]
snorlax.in_slices_of(3) # [[1, 2, 3], [4]]

For Iterator, these methods are not consistent.

snorlax = [1, 2, 3, 4]

iterator = snorlax.each.in_groups_of(3)
iterator.next # [1, 2, 3]
iterator.next # [4, nil, nil]

snorlax.each.in_slices_of(3) # [[1, 2, 3], [4]] => very odd

The actual method(s) that correspond to what #in_slices_of should be are

snorlax = [1, 2, 3, 4]

iterator = snorlax.each.slice(3)
iterator.next # [1, 2, 3]
iterator.next # [4]

iterator = snorlax.each.each_slice(3)
iterator.next # [1, 2, 3]
iterator.next # [4]

I propose deprecating both Iterator#slice and Iterator#each_slice in favour of Iterator#in_slices_of.

I'd gladly implement this (as my first actual PR to Crystal 🥳) if you agree.

@BlobCodes
Copy link
Contributor

BlobCodes commented Dec 6, 2024

Iterator(T) includes Enumerable(T), so I'd actually expect in_slices_of and in_groups_of to both return arrays, since that's what the docs suggest:

module Enumerable(T)
  # Returns an `Array` with chunks in the given size, eventually filled up with given value or nil.
  def in_groups_of(size : Int, filled_up_with : U = nil) forall U
  end

  # Returns an Array with chunks in the given size. Last chunk can be smaller depending on the number of remaining items.
  def in_slices_of(size : Int) : Array(Array(T))
  end
end

snorlax.each.in_groups_of(3) returning an Iterator is a bug - either in documentation or behaviour.


each_slice and each_group already return Iterators and don't conflict on the API.

slice() is just an alias of each_slice, maybe it can be deprecated.

@BigBoyBarney
Copy link
Contributor Author

I fully disagree, since it reads infinitely worse.

array.each.in_groups_of(3) makes it very clear what it's supposed to do.
array.each.each_group(3) does not.

@Blacksmoke16
Copy link
Member

Iterator(T) includes Enumerable(T), so I'd actually expect in_slices_of and in_groups_of to both return arrays, since that's what the docs suggest:

Iterator provides a more specialized in_groups_of implementation that returns an iterator: https://crystal-lang.org/api/Iterator.html#in_groups_of%28size%3AInt%2Cfilled_up_with%3Dnil%2Creuse%3Dfalse%29-instance-method. Whereas #in_slices_of is not, so it's just using the generic Enumerable implementation.

@BlobCodes
Copy link
Contributor

Iterator still includes Enumerable, and Enumerable guarantees that in_slices_of and in_groups_of always return an Array.

Changing this is a breaking change, which shouldn't just be done in a 1.x release.

@Blacksmoke16
Copy link
Member

Ooo right, good point!

@BigBoyBarney
Copy link
Contributor Author

BigBoyBarney commented Dec 6, 2024

Yeah thinking about it more this cannot be implemented before 2.0. It would still be nice to make note of it or add the 2.0 tag or something, because the way it currently works is inconsistent and should eventually be changed.

@BlobCodes
Copy link
Contributor

I think it's unlikely that Iterator won't be Enumerable in 2.0, so we could already:

  • Add a note to Enumerable#in_groups_of that Iterator implements the API wrongly
  • Add Iterator#each_group
  • Maybe deprecate Iterator#slice
  • Maybe add Iterator#group
  • Deprecate Iterator#in_groups_of

Then the API can be fixed in 2.0 without any issues.

@BigBoyBarney
Copy link
Contributor Author

BigBoyBarney commented Dec 6, 2024

I have a few issues with that, mainly that the each syntax is pretty bad. array.each.each_group doesn't read particularly well, and is vastly inferior to in_*something*_of.

Also, I am relatively sure there are other classes that implement a specific version of an inherited method, so I wouldn't say that Enumerable#in_groups_of implements the API wrongly.

@BlobCodes
Copy link
Contributor

I have a few issues with that, mainly that the each syntax is pretty bad. array.each.each_group doesn't read particularly well.

I think #group and #slice would be the most coherent variants since there's also no in_cons_of or with_step, for example.

Looking at other languages, the in_x_of scheme seems unconventional:

  • Kotlin uses #chunked and #windowed (which I like since it keeps me from mixing it up with Slice(T))
  • Python uses itertools.batched and has no equivalent to #slice
  • Scala uses #grouped and has no equivalent to #slice

Also, I am relatively sure there are other classes that implement a specific version of an inherited method, so I wouldn't say that Enumerable#in_groups_of implements the API wrongly.

It does implement it wrongly, because it breaks the API expectations given by the docs.

Just because other classes may also break API expectations doesn't mean it's correct.

@BigBoyBarney
Copy link
Contributor Author

Well based on that line of thought, slice is also bad because the returned object is not, in fact, a Slice, but a new Iterator.

@straight-shoota
Copy link
Member

I believe this issue is essentially an instance of the more general problem described in #12803
Enumerable implements methods in a generic fashion and uses the generic collection type Array as return value. But more specialized overrides in subtypes should ideally return a matching specialized type, thus breaking from the implicit (or in some cases explicit) return type in the generic implementation in Enumerable.

@BigBoyBarney
Copy link
Contributor Author

So you do agree that it's not an incorrect API implementation?
Iterator#in_groups_of returning an Iterator instance makes perfect sense to me, and it reads beautifully in code too.

@straight-shoota
Copy link
Member

I'm actually not sure about #in_groups_of in particular because it's a bit special.
It returns a collection of sub collections, so it's not immediately obvious what type to expect for the collections (and whether to expect it as an iterator or materialized).

Iterator(T)#in_groups_of returns Iterator(Iterator(T)).
Enumerable(T)#in_groups_of returns Array(Array(T)).
Technically, both return types implement Enumerable(Enumerable(T)) if we accept that as the generic signature.
There are however other types that would fit that as well and there might be good use cases for them, for example Array(Iterator(T)): It keeps the outer collection fixed to Array (as in Enumerable) but retains the original type in the type of the sub collection.

@BigBoyBarney
Copy link
Contributor Author

BigBoyBarney commented Dec 7, 2024

I'm not sure how that would be used in this case. Returning an array would make the following impossible:

snorlax = [1, 2, 3, 4]

iterator = snorlax.each.in_groups_of(3)
iterator.next # [1, 2, 3]
iterator.next # [4, nil, nil]

One would need to resort to calling #each on the result, which feels very odd.

@straight-shoota
Copy link
Member

straight-shoota commented Dec 9, 2024

Well of course regardless of which variant we use, every single one has the downside of not having the other possible variants available in the same way.

I'm merely saying that the fact that the original collection type is an iterator should not necessarily imply all derived collection types must be iterators as well. Expecting other types can be quite reasonable.
That's just an observation which may or may not help derive a plan for the future.

I don't mean that Iterator#in_groups_of must return Array. Just that it could be considered a feasible option.

@BigBoyBarney
Copy link
Contributor Author

BigBoyBarney commented Dec 9, 2024

Oh yeah, totally. I wasn't arguing against that at all.

My main point is that currently how Iterator#in_groups_of works is beautiful and should be kept if possible. Then, if Iterator#in_groups_of exists, Iterator#in_slices_of should return an Iterator instance as well, to properly match the Enumerable#in_groups_of, Enumerable#in_slices_of pairing's logical train of thought.

But even in a case where Iterator#in_groups_of is changed, Iterator#in_slices_of should be changed accordingly, because the currently dichotomous implementation is unintuitive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants